## Gaussian Margin Machines

**Koby Crammer, Mehryar Mohri, Fernando Pereira**; JMLR W&CP 5:105-112, 2009.

### Abstract

We introduce Gaussian Margin Machines (GMMs), which maintain a Gaussian distribu- tion over weight vectors for binary classiﬁcation. The learning algorithm for these machines seeks the least informative distribution that will classify the training data correctly with high probability. One formulation can be expressed as a convex constrained optimization problem whose solution can be represented linearly in terms of training instances and their inner and outer products, supporting kernelization. The algorithm has a natural PAC-Bayesian generalization bound. A preliminary evaluation on handwriting recognition data shows that our algorithm improves over SVMs for the same task. methods, we maintain a distribution over alternative weight vectors, rather than committing to a single speciﬁc one. However, these distributions are not derived by Bayes? rule. Instead, they represent our knowledge of the weights given constraints imposed by the training examples. Speciﬁcally, we use a Gaussian distribution over weight vectors with mean and covariance parameters that are learned from the training data. The learning algorithm seeks for a distribu- tion with a small Kullback-Leibler (KL) divergence from a ﬁxed isotropic distribution, such that each training exam- ple is correctly classiﬁed by a strict majority of the weight vectors. Conceptually, this is a large-margin probabilistic principle, instead of the geometric large margin principle in SVMs. The learning problem for GMMs can be expressed as a convex constrained optimization, and its optimal solution