Regularization and the small-ball method II: complexity dependent error rates

We study estimation properties of regularized procedures of the form

$\hat f \in\arg\min_{f\in F}\Big(\frac{1}{N}\sum_{i=1}^N\big(Y_i-f(X_i)\big)^2+\lambda \Psi(f)\Big)$ for a convex class of functions

$F$ , regularization function

$\Psi(\cdot)$ and some well chosen regularization parameter

$\lambda$ , where the given data is an independent sample

$(X_i, Y_i)_{i=1}^N$ . We obtain bounds on the

$L_2$ estimation error rate that depend on the complexity of the true model

$F^*:=\{f\in F: \Psi(f)\leq\Psi(f^*)\}$ , where

$f^*\in\arg\min_{f\in F}\mathbb{E}(Y-f(X))^2$ and the

$(X_i,Y_i)$ 's are independent and distributed as

$(X,Y)$ . Our estimate holds under weak stochastic assumptions -- one of which being a small-ball condition satisfied by

$F$ -- and for rather flexible choices of regularization functions

$\Psi(\cdot)$ . Moreover, the result holds in the learning theory framework: we do not assume any a-priori connection between the output

$Y$ and the input

$X$ . As a proof of concept, we apply our general estimation bound to various choices of

$\Psi$ , for example, the

$\ell_p$ and

$S_p$ -norms (for

$p\geq1$ ), weak-

$\ell_p$ , atomic norms, max- norm and SLOPE. In many cases, the estimation rate almost coincides with the minimax rate in the class

$F^*$ .