Exploiting tree-based variable importances to selectively identify relevant variables
Vân Anh Huynh-Thu, Louis Wehenkel, Pierre Geurts
JMLR W&P 4:60-73, 2008.
Abstract
This paper proposes a novel statistical procedure based on
permutation tests for extracting a subset of truly relevant
variables from multivariate importance rankings derived from
tree-based supervised learning methods. It also shows that the
classical permutation-test approach for estimating false discovery
rates of univariate variable scoring procedures does not carry over
well to multivariate tree-based importance measures.
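
As a rough illustration of the classical univariate approach the abstract refers to (not the procedure proposed in the paper), the sketch below estimates a false discovery rate by permuting the output. It assumes absolute Pearson correlation as the univariate score and a fixed score threshold; the function names `abs_corr` and `permutation_fdr`, the threshold, and the synthetic data are all hypothetical choices made for this example.

```python
import random
from statistics import mean


def abs_corr(x, y):
    """Absolute Pearson correlation, used here as an assumed univariate score."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def permutation_fdr(X, y, threshold, n_perm=200, seed=0):
    """Estimate the FDR of selecting variables whose score is >= threshold.

    X is a list of columns (one list of values per variable); y is the output.
    Returns the observed scores and the estimated FDR.
    """
    rng = random.Random(seed)
    scores = [abs_corr(col, y) for col in X]
    n_selected = sum(s >= threshold for s in scores)
    if n_selected == 0:
        return scores, 0.0

    # After permuting y, no variable is related to the output, so every
    # score that still reaches the threshold counts as a false positive.
    false_pos = 0
    y_perm = list(y)
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        false_pos += sum(abs_corr(col, y_perm) >= threshold for col in X)

    expected_fp = false_pos / n_perm
    return scores, min(1.0, expected_fp / n_selected)


# Synthetic data (illustrative only): variable 0 drives y, the rest are noise.
data_rng = random.Random(1)
n_samples, n_noise = 60, 9
x0 = [data_rng.gauss(0, 1) for _ in range(n_samples)]
X = [x0] + [[data_rng.gauss(0, 1) for _ in range(n_samples)]
            for _ in range(n_noise)]
y = [a + 0.1 * data_rng.gauss(0, 1) for a in x0]

scores, fdr = permutation_fdr(X, y, threshold=0.5)
```

Each variable is scored independently of the others here, which is what makes the permutation null straightforward. A multivariate tree-based importance depends on all variables jointly, which is the setting where the abstract reports that this direct approach does not work well.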