Active learning is widely used to select which examples from a pool should be labeled to yield the best results when learning predictive models. Sometimes, however, it is desirable to choose examples before any labeling or learning has occurred. The optimal experimental design literature offers many theoretically attractive optimality criteria for example selection, but most are intractable when working with large numbers of predictive features. We present the BaBiES criterion, an approximation of Bayesian A-optimal design for linear regression with binary predictors, which is both simple and extremely fast. Empirical evaluations demonstrate that, despite selecting all examples prior to learning, BaBiES is competitive with standard active learning methods on a variety of document classification tasks.
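The abstract does not spell out the BaBiES criterion itself, but the objective it approximates, Bayesian A-optimality, is standard: choose a design (set of examples) minimizing the trace of the posterior covariance of the regression weights, which requires no labels. The sketch below illustrates that objective with a naive greedy batch selector over a binary feature pool; the function names, the Gaussian prior, and the greedy strategy are illustrative assumptions, not the paper's method, and the repeated matrix inversions show exactly the cost a fast approximation like BaBiES would aim to avoid.

```python
import numpy as np

def a_optimality(X, noise_var=1.0, prior_var=1.0):
    """Bayesian A-optimality score of a design matrix X (n x d):
    trace of the posterior covariance of linear-regression weights
    under a N(0, prior_var*I) prior and Gaussian noise. Lower is
    better, and no labels are needed to evaluate it."""
    d = X.shape[1]
    precision = X.T @ X / noise_var + np.eye(d) / prior_var
    return np.trace(np.linalg.inv(precision))

def greedy_select(pool, k, noise_var=1.0, prior_var=1.0):
    """Greedily pick k rows of `pool` (one candidate example per row)
    that most reduce the A-optimality score. Each step inverts a
    d x d matrix per candidate, which is what makes exact optimal
    design expensive for high-dimensional feature spaces."""
    chosen, remaining = [], list(range(len(pool)))
    for _ in range(k):
        best_i, best_score = None, np.inf
        for i in remaining:
            score = a_optimality(pool[chosen + [i]], noise_var, prior_var)
            if score < best_score:
                best_i, best_score = i, score
        chosen.append(best_i)
        remaining.remove(best_i)
    return chosen
```

Because the score depends only on the feature vectors, the whole batch can be chosen up front, before any labeling, which is the setting the paper targets.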
Date Posted: 19 May 2005