## Exploiting Product Distributions to Identify Relevant Variables of Correlation Immune Functions

** Lisa Hellerstein, Bernard Rosell, Eric Bach, Soumya Ray, David Page**; 10(83):2375−2411, 2009.

### Abstract

A Boolean function *f* is *correlation immune* if each input
variable is independent of the output, under the uniform distribution
on inputs. For example, the parity function is correlation immune.
We consider the problem of identifying relevant variables of a
correlation immune function, in the presence of irrelevant variables.
We address this problem in two different contexts. First, we analyze
*Skewing*, a heuristic method that was developed to improve the
ability of greedy decision tree algorithms to identify relevant
variables of correlation immune Boolean functions, given examples
drawn from the uniform distribution (Page and Ray, 2003). We present
theoretical results revealing both the capabilities and limitations of
skewing. Second, we explore the problem of identifying relevant
variables in the *Product Distribution Choice* (PDC) learning
model, a model in which the learner can choose product distributions
and obtain examples from them. We prove a lemma establishing a
property of Boolean functions that may be of independent interest.
Using this lemma, we give two new algorithms for finding relevant
variables of correlation immune functions in the PDC model.

© JMLR 2009. (edit, beta) |