Inspecting Sample Reusability for Active Learning

K. Tomanek & K. Morik; JMLR W&CP 16:169–181, 2011.

Abstract

Active Learning (AL) exploits a learning algorithm to selectively sample examples which are expected to be highly useful for model learning. The resulting sample is governed by a sampling selection bias. While a bias towards useful examples is desirable, there is also a bias towards the learner applied during AL selection. This paper addresses sample reusability, i.e., the question of whether and under which conditions samples selected by AL using one learning algorithm are well-suited as training data for another learning algorithm.

Our empirical investigation on general classification problems as well as the natural language processing subtask of Named Entity Recognition shows that many intuitive assumptions on reusability characteristics do not hold. For example, using the same algorithm during AL selection (called the selector) and for inducing the final model (called the consumer) is not always the optimal choice. We investigate several putatively explanatory factors for sample reusability. One finding is that the suitability of certain selector-consumer pairings cannot be estimated independently of the actual learning problem.
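The selector/consumer setup described above can be illustrated with a minimal sketch. It is not the paper's experimental setup: the synthetic data, the two toy classifiers (`NearestCentroid` as selector, `OneNN` as consumer), the margin-based uncertainty criterion, and all function names are assumptions chosen only to keep the example self-contained. The sample is selected with one learner and then reused as training data for a different one, with a random sample of the same size as a reference point.

```python
import random

def make_data(n, rng):
    """Synthetic data (assumption): two 2D Gaussian blobs centred at x=-1 and x=+1."""
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        cx = 1.0 if y == 1 else -1.0
        data.append(((rng.gauss(cx, 1.0), rng.gauss(0.0, 1.0)), y))
    return data

def dist2(a, b):
    """Squared Euclidean distance between two 2D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

class NearestCentroid:
    """Illustrative selector: predicts the class whose centroid is closer."""
    def fit(self, sample):
        self.centroids = {}
        for label in (0, 1):
            pts = [x for x, y in sample if y == label]
            self.centroids[label] = (sum(p[0] for p in pts) / len(pts),
                                     sum(p[1] for p in pts) / len(pts))
        return self

    def margin(self, x):
        # A small margin means both centroids are almost equally close,
        # i.e. the selector is uncertain about x.
        return abs(dist2(x, self.centroids[0]) - dist2(x, self.centroids[1]))

    def predict(self, x):
        return min(self.centroids, key=lambda c: dist2(x, self.centroids[c]))

class OneNN:
    """Illustrative consumer: 1-nearest-neighbour classifier."""
    def fit(self, sample):
        self.sample = list(sample)
        return self

    def predict(self, x):
        return min(self.sample, key=lambda s: dist2(x, s[0]))[1]

def accuracy(model, test):
    return sum(model.predict(x) == y for x, y in test) / len(test)

def active_sample(pool, selector, per_class_seed, n_queries):
    """Uncertainty sampling: repeatedly label the pool example with the smallest margin."""
    labeled, unlabeled, seen = [], [], {0: 0, 1: 0}
    for item in pool:
        if seen[item[1]] < per_class_seed:  # seed set covering both classes
            labeled.append(item)
            seen[item[1]] += 1
        else:
            unlabeled.append(item)
    for _ in range(n_queries):
        selector.fit(labeled)
        pick = min(range(len(unlabeled)),
                   key=lambda i: selector.margin(unlabeled[i][0]))
        labeled.append(unlabeled.pop(pick))
    return labeled

rng = random.Random(0)
pool, test = make_data(200, rng), make_data(200, rng)

al_sample = active_sample(pool, NearestCentroid(), per_class_seed=2, n_queries=20)
random_sample = rng.sample(pool, len(al_sample))

results = {
    "selector trained on AL sample": accuracy(NearestCentroid().fit(al_sample), test),
    "consumer trained on AL sample": accuracy(OneNN().fit(al_sample), test),
    "consumer trained on random sample": accuracy(OneNN().fit(random_sample), test),
}
for name, acc in results.items():
    print(f"{name}: {acc:.3f}")
```

Comparing the last two figures gives a rough sense of whether a sample biased towards the selector still serves the consumer on this toy problem. A single run on synthetic data says nothing general, which is in line with the abstract's point that reusability depends on the actual learning problem.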


Page last modified on Wed Mar 30 11:10:43 2011.


