## Regularization and the small-ball method II: complexity dependent error rates

*Guillaume LecuĂ©, Shahar Mendelson*; 18(146):1−48, 2017.

### Abstract

We study estimation properties of regularized procedures of the
form $\hat f \in\arg\min_{f\in
F}\Big(\frac{1}{N}\sum_{i=1}^N\big(Y_i-f(X_i)\big)^2+\lambda
\Psi(f)\Big)$ for a convex class of functions $F$,
regularization function $\Psi(\cdot)$ and some well chosen
regularization parameter $\lambda$, where the given data is an
independent sample $(X_i, Y_i)_{i=1}^N$. We obtain bounds on the
$L_2$ estimation error rate that depend on the complexity of the
true model $F^*:=\{f\in F: \Psi(f)\leq\Psi(f^*)\}$, where
$f^*\in\arg\min_{f\in F}\mathbb{E}(Y-f(X))^2$ and the $(X_i,Y_i)$'s are
independent and distributed as $(X,Y)$. Our estimate holds under
weak stochastic assumptions -- one of which being a small-ball
condition satisfied by $F$ -- and for rather flexible choices of
regularization functions $\Psi(\cdot)$. Moreover, the result
holds in the learning theory framework: we do not assume any
a-priori connection between the output $Y$ and the input $X$. As
a proof of concept, we apply our general estimation bound to
various choices of $\Psi$, for example, the $\ell_p$ and
$S_p$-norms (for $p\geq1$), weak-$\ell_p$, atomic norms, max-
norm and SLOPE. In many cases, the estimation rate almost
coincides with the minimax rate in the class $F^*$.

[abs][pdf][bib]