Y. Chen & S. Mani; JMLR W&CP 16:113–126, 2011.
Active Learning for Unbalanced Data in the Challenge with Multiple Models and Biasing
The common uncertainty sampling approach queries the samples closest to the decision boundary of a classification task. With a poor probabilistic model, however, this approach may fail to identify the truly uncertain samples. In this work, we develop an active learning strategy called "Uncertainty Sampling with Biasing Consensus" (USBC), which predicts labels for unbalanced data with a multi-model committee and ranks the informativeness of samples by uncertainty sampling, placing a higher weight on the minority class. For prediction, USBC uses multiple Random Forests based models that generate a consensus posterior probability for each sample. To further improve initial performance in active learning, we also use a semi-supervised learning model that self-labels predicted negative samples without querying. For more stable initial performance, we use a filter to avoid querying samples with high variance. We also introduce batch size validation to find the optimal initial batch size for querying samples in active learning.
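The committee-based ranking described in the abstract might be sketched as follows. This is a minimal illustration, not the paper's implementation: the committee size, the specific minority-class weighting, and the use of scikit-learn's `RandomForestClassifier` with varied seeds are all assumptions made for the sketch.

```python
# Illustrative sketch of USBC-style ranking: a committee of Random Forests
# produces a consensus posterior, and pool samples are ranked by boundary
# uncertainty with extra weight on predicted-minority samples.
# Committee construction and weighting here are assumptions, not the paper's.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

def usbc_rank(X_labeled, y_labeled, X_pool, n_models=5, minority_weight=2.0):
    """Rank pool samples by consensus uncertainty, biased toward the minority class."""
    probs = []
    for seed in range(n_models):
        # Committee members differ only in their random seed (an assumption).
        rf = RandomForestClassifier(n_estimators=50, random_state=seed)
        rf.fit(X_labeled, y_labeled)
        probs.append(rf.predict_proba(X_pool)[:, 1])
    consensus = np.mean(probs, axis=0)                  # consensus posterior P(y=1 | x)
    uncertainty = 1.0 - 2.0 * np.abs(consensus - 0.5)   # 1 at the boundary, 0 at the extremes
    # Bias: up-weight samples the committee leans toward the minority (positive) class.
    weight = np.where(consensus >= 0.5, minority_weight, 1.0)
    score = uncertainty * weight
    return np.argsort(-score)                           # pool indices, most informative first

# Tiny unbalanced demo: 90% negative class, first 200 samples labeled.
X, y = make_classification(n_samples=300, weights=[0.9, 0.1], random_state=0)
ranking = usbc_rank(X[:200], y[:200], X[200:])
```

In this sketch the ranking would drive the query loop: the top-scored pool samples are sent to the oracle for labels, then the committee is retrained.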
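The self-labeling and variance-filtering steps could be combined as below. The thresholds and the committee configuration are illustrative assumptions; the paper's actual criteria for "predicted negative" and "high variance" are not reproduced here.

```python
# Sketch of the semi-supervised step: pool samples the committee confidently
# and consistently predicts as negative are self-labeled without querying,
# while high-variance (unstable) predictions are filtered out.
# Both thresholds below are hypothetical values chosen for illustration.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

def self_label_negatives(X_labeled, y_labeled, X_pool, n_models=5,
                         neg_threshold=0.05, var_threshold=0.01):
    """Return pool indices to self-label as negative (no oracle query needed)."""
    probs = np.stack([
        RandomForestClassifier(n_estimators=50, random_state=seed)
        .fit(X_labeled, y_labeled)
        .predict_proba(X_pool)[:, 1]
        for seed in range(n_models)
    ])
    mean = probs.mean(axis=0)   # consensus posterior
    var = probs.var(axis=0)     # disagreement across committee members
    # Confidently negative AND stable across the committee.
    mask = (mean < neg_threshold) & (var < var_threshold)
    return np.where(mask)[0]

# Demo on the same kind of unbalanced data.
X, y = make_classification(n_samples=300, weights=[0.9, 0.1], random_state=0)
idx = self_label_negatives(X[:200], y[:200], X[200:])
```

Because the data is heavily skewed toward negatives, self-labeling confident negatives grows the training set cheaply early on, which is the stated motivation for improving initial performance.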