Sample Complexity Bounds for Differentially Private Learning

Kamalika Chaudhuri, Daniel Hsu
Proceedings of the 24th Annual Conference on Learning Theory, PMLR 19:155-186, 2011.

Abstract

This work studies the problem of privacy-preserving classification – namely, learning a classifier from sensitive data while preserving the privacy of individuals in the training set. In particular, the learning algorithm is required to guarantee differential privacy, a very strong notion of privacy that has gained significant attention in recent years. A natural question to ask is: what is the sample requirement of a learning algorithm that guarantees a certain level of privacy and accuracy? We address this question in the context of learning with infinite hypothesis classes when the data is drawn from a continuous distribution. We first show that, even for very simple hypothesis classes, any algorithm that uses a finite number of examples and guarantees differential privacy must fail to return an accurate classifier for at least some unlabeled data distributions. This result is unlike the case with either finite hypothesis classes or discrete data domains, in which distribution-free private learning is possible, as previously shown by Kasiviswanathan et al. (2008). We then consider two approaches to differentially private learning that get around this lower bound. The first approach is to use prior knowledge about the unlabeled data distribution in the form of a reference distribution $\mathcal{U}$ chosen independently of the sensitive data. Given such a reference $\mathcal{U}$, we provide an upper bound on the sample requirement that depends (among other things) on a measure of closeness between $\mathcal{U}$ and the unlabeled data distribution. Our upper bound applies to the non-realizable as well as the realizable case. The second approach is to relax the privacy requirement, by requiring only label privacy – namely, that only the labels (and not the unlabeled parts of the examples) be considered sensitive information. An upper bound on the sample requirement of learning with label privacy was shown by Chaudhuri et al. (2006); in this work, we show a lower bound.
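
For reference, the privacy notion discussed above is the standard $\epsilon$-differential privacy of Dwork et al. (2006); the sketch below states it in the learning setting, with the caveat that the paper's formal definitions (including the accuracy and confidence parameters of the learning guarantee) are given precisely in the full text. A randomized learning algorithm $A$ is $\epsilon$-differentially private if, for every pair of training sets $D$ and $D'$ differing in a single example and every measurable set $S$ of possible output classifiers,
\[
\Pr[A(D) \in S] \le e^{\epsilon} \cdot \Pr[A(D') \in S].
\]
Label privacy, as described above, relaxes this requirement so that it need only hold when $D$ and $D'$ differ in the label of a single example.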

Cite this Paper


BibTeX
@InProceedings{pmlr-v19-chaudhuri11a,
  title     = {Sample Complexity Bounds for Differentially Private Learning},
  author    = {Chaudhuri, Kamalika and Hsu, Daniel},
  booktitle = {Proceedings of the 24th Annual Conference on Learning Theory},
  pages     = {155--186},
  year      = {2011},
  editor    = {Kakade, Sham M. and von Luxburg, Ulrike},
  volume    = {19},
  series    = {Proceedings of Machine Learning Research},
  address   = {Budapest, Hungary},
  month     = {09--11 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v19/chaudhuri11a/chaudhuri11a.pdf},
  url       = {https://proceedings.mlr.press/v19/chaudhuri11a.html},
  abstract  = {This work studies the problem of privacy-preserving classification – namely, learning a classifier from sensitive data while preserving the privacy of individuals in the training set. In particular, the learning algorithm is required to guarantee differential privacy, a very strong notion of privacy that has gained significant attention in recent years. A natural question to ask is: what is the sample requirement of a learning algorithm that guarantees a certain level of privacy and accuracy? We address this question in the context of learning with infinite hypothesis classes when the data is drawn from a continuous distribution. We first show that, even for very simple hypothesis classes, any algorithm that uses a finite number of examples and guarantees differential privacy must fail to return an accurate classifier for at least some unlabeled data distributions. This result is unlike the case with either finite hypothesis classes or discrete data domains, in which distribution-free private learning is possible, as previously shown by Kasiviswanathan et al. (2008). We then consider two approaches to differentially private learning that get around this lower bound. The first approach is to use prior knowledge about the unlabeled data distribution in the form of a reference distribution $\mathcal{U}$ chosen independently of the sensitive data. Given such a reference $\mathcal{U}$, we provide an upper bound on the sample requirement that depends (among other things) on a measure of closeness between $\mathcal{U}$ and the unlabeled data distribution. Our upper bound applies to the non-realizable as well as the realizable case. The second approach is to relax the privacy requirement, by requiring only label privacy – namely, that only the labels (and not the unlabeled parts of the examples) be considered sensitive information. An upper bound on the sample requirement of learning with label privacy was shown by Chaudhuri et al. (2006); in this work, we show a lower bound.}
}
APA
Chaudhuri, K. & Hsu, D. (2011). Sample Complexity Bounds for Differentially Private Learning. Proceedings of the 24th Annual Conference on Learning Theory, in Proceedings of Machine Learning Research 19:155-186. Available from https://proceedings.mlr.press/v19/chaudhuri11a.html.
