## Baseline Methods for Active Learning

G.C. Cawley; JMLR W&CP 16:47–57, 2011.

### Abstract

In many potential applications of machine learning, unlabelled data are abundantly
available at low cost, but there is a paucity of labelled data, and labelling unlabelled examples is
expensive and/or time-consuming. This motivates the development of active learning methods
that seek to direct the collection of labelled examples such that the greatest performance gains
can be achieved using the smallest quantity of labelled data. In this paper, we describe some
simple pool-based active learning strategies, based on optimally regularised linear [kernel] ridge
regression, providing a set of baseline submissions for the Active Learning Challenge. A
simple random strategy, where unlabelled patterns are submitted to the oracle purely at
random, is found to be surprisingly effective, being competitive with more complex
approaches.
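
The random pool-based strategy mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes NumPy, binary labels coded as +1/-1, and a fixed regularisation parameter `lam`; the paper's procedure for choosing the regularisation optimally is not reproduced here. The names `add_bias`, `fit_ridge`, `predict`, `random_active_learning` and `oracle` are hypothetical and chosen for illustration only.

```python
import numpy as np


def add_bias(X):
    # Append a constant column so the linear model has an intercept term.
    return np.hstack([X, np.ones((X.shape[0], 1))])


def fit_ridge(X, y, lam):
    # Closed-form linear ridge regression onto +1/-1 targets.
    # Assumption: the bias weight is regularised too, purely for brevity.
    Xb = add_bias(X)
    A = Xb.T @ Xb + lam * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)


def predict(w, X):
    # Classify by the sign of the regression output.
    return np.sign(add_bias(X) @ w)


def random_active_learning(X_pool, oracle, batch_size, n_rounds, lam=1.0, seed=0):
    # Random baseline: query the oracle for pool items chosen uniformly at
    # random without replacement, then refit the model after every batch.
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X_pool))
    queried, labels, models = [], [], []
    for r in range(n_rounds):
        batch = order[r * batch_size:(r + 1) * batch_size]
        if batch.size == 0:
            break  # the pool is exhausted
        for i in batch.tolist():
            queried.append(i)
            labels.append(oracle(i))  # hypothetical oracle: pool index -> label
        models.append(fit_ridge(X_pool[queried], np.array(labels, dtype=float), lam))
    return queried, models
```

In this sketch the oracle is any callable that maps a pool index to its label, for example `lambda i: y[i]` when the true labels `y` are held back for simulation. A more elaborate strategy would replace the random permutation with a selection rule that depends on the current model.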
