ERM Principle Consistency

Denote as $R_{emp}(a_l) \triangleq \inf_{a \in \Lambda}R_{emp}(a_l)=\inf_{a \in \Lambda}\frac{1}{l}\sum_{i=1}^{l}Q(z_i,a)$

Definition. ERM principle is consistent for the set of functions $$Q(z,a)$$, $$a \in \Lambda$$ and the probability distribution $$F(z)$$ if the following two sequences converge in probability: $R(a_l) \xrightarrow[l \rightarrow \infty]{P} \inf_{a \in \Lambda}R(a)$ $R_{emp}(a_l) \xrightarrow[l \rightarrow \infty]{P} \inf_{a \in \Lambda}R(a)$

Definition. ERM principle is strictly consistent for the set of functions $$Q(z,a)$$, $$a \in \Lambda$$ and the probability distribution $$F(z)$$ if for any non empty subset $$\Lambda(c)$$, $$c \in (-\infty, +\infty)$$ such that $\Lambda(c) \triangleq \bigg \{a: \int Q(z,a) dF(z) \geq c \bigg \}$ the convergence $\label{eq:strict_consistency} \inf_{a \in \Lambda(c)}R_{emp}(a) \xrightarrow[l \rightarrow \infty]{P} \inf_{a \in \Lambda(c)}R(a)$ is valid.

Lemma 1.1. If ERM principle is strictly consistent then the following convergence in probability holds: $\label{eq:cons_lemma} R(a_l) \xrightarrow[l \rightarrow \infty]{P} \inf_{a \in \Lambda}R(a)$ Proof. Denote $\label{eq:T} T \triangleq \inf_{a \in \Lambda}R(a)$ Then, due to (\ref{eq:strict_consistency}) $\label{eq:cons_prob_1} P\bigg\{ \inf_{a \in \Lambda}R_{emp}(a) - \inf_{a \in \Lambda}R(a) \geq \frac{\epsilon}{2} \bigg\} \xrightarrow[l \rightarrow \infty]{} 0$ and $\label{eq:cons_prob_0} P\bigg\{\inf_{a \in \Lambda(T+\epsilon)}R_{emp}(a) - \inf_{a \in \Lambda}R(a) \geq \frac{\epsilon}{2} \bigg\} \xrightarrow[l \rightarrow \infty]{} 1$ are valid for any $$\epsilon > 0$$. (\ref{eq:cons_prob_1}) and (\ref{eq:cons_prob_0}) imply that $$a_l \notin \Lambda(T+\epsilon)$$. Thus, from the definition of subsets $$\Lambda(c)$$ the following inequalities $T \leq \int Q(z,a_l) df(z) \leq T+\epsilon$ hold.

Convergence in (\ref{eq:strict_consistency}) always implies convergence in (\ref{eq:cons_lemma}) but not vice versa.

Key Theorem of Statistical Learning Theory

The relationship $P\bigg\{\sup_{a\in \Lambda} \bigg |\int Q(z,a)dF(z) - \frac{1}{l}\sum_{i=1}^{l}Q(z_i,a) \bigg | > \epsilon \bigg \} \xrightarrow[l \rightarrow \infty]{} 0,$ for any $$\epsilon > 0$$, is called uniform convergence of means to their mathematical expectations over a given set of functions. Similarly, the relationship $P\bigg\{\sup_{a\in \Lambda} \bigg (\int Q(z,a)dF(z) - \frac{1}{l}\sum_{i=1}^{l}Q(z_i,a) \bigg ) > \epsilon \bigg \} \xrightarrow[l \rightarrow \infty]{} 0,$ for any $$\epsilon > 0$$, is called uniform one-sided convergence of means to their mathematical expectations over a given set of functions.

Theorem 1.1 (Key Theorem). Let there exist the constants $$\beta$$ and $$B$$ such that for all functions $$Q(z,a)$$, $$a \in \Lambda$$ and for a given distribution $$F(z)$$ the inequalities $\beta \leq \int Q(z,a)dF(z) \leq B , \:\: a \in \Lambda$ are valid.

Then the following two statements are equivalent:

1. For the given distribution function $$F(z)$$, the ERM principle is strictly consistent on the set of functions $$Q(z,a)$$, $$a \in \Lambda$$.

2. For the given distribution function $$f(z)$$, the uniform one-sided convergence of the means to their mathematical expectations takes place over the set of functions $$Q(z,a)$$, $$a \in \Lambda$$.

Proof. Let the ERM principle be strictly consistent. Then, for any non empty subset $$\Lambda(c)$$ the following convergence in probability $\label{eq:cons_in_prob} P \bigg \{ \inf_{a \in \Lambda(c)} \int Q(z,a)dF(z) - \inf_{a \in \Lambda(c)} \frac{1}{l} \sum_{i=1}^{l} Q(z_i,a) > \frac{\epsilon}{2} \bigg \} \xrightarrow[l \rightarrow \infty]{} 0$ is valid for any $$\epsilon > 0$$. Then, we split the range of the functions $$Q(z,a)$$, $$a \in \Lambda$$ using a finite sequence of numbers $$\beta_1, \beta_2, \cdots \beta_n$$ such that $\label{eq:cons_betas} |\beta_{i+1} - \beta_i | < \frac{\epsilon}{2}, \:\:\: \beta_1=\beta, \beta_n = B.$ We denote as $$T_k$$ the event $\label{eq:cons_eventTk} \inf_{a \in \Lambda(\beta_k)} \int Q(z,a)dF(z) - \inf_{a \in \Lambda(\beta_k)} \frac{1}{l} \sum_{i=1}^{l} Q(z_i,a) > \frac{\epsilon}{2}$ and as $$T$$ the event $$\cup_{k=1}^nT_k$$. Then by virtue of (\ref{eq:cons_in_prob}) $P(T) = P(\cup_{k=1}^n T_k) \leq \sum_{k=1}^n P(T_k) \xrightarrow[l \rightarrow \infty]{} 0.$

In order to show that strict consistency of ERM principle implies uniform convergence of means to their mathematical expectations, the event $A \triangleq \sup_{a \in \Lambda} \bigg(\int Q(z,a)dF(z) - \frac{1}{l}\sum_{i=1}^{l}Q(z_i,a) \bigg) > \epsilon$ must converge in probability to zero. Consider that event $$A$$ takes place, then there will be $$a^* \in \Lambda$$ such that $\label{eq:cons_AtakesPlace} \int Q(z,a^*)dF(z) - \frac{1}{l}\sum_{i=1}^{l}Q(z_i,a^*) > \epsilon.$ Due to (\ref{eq:cons_betas}) we can find $$\beta_k$$ such that $$a^* \in \Lambda(\beta_k)$$ and $\int Q(z,a^*)dF(z) - \beta_k < \frac{\epsilon}{2}.$ For the chosen set $$\Lambda(\beta_k)$$ the inequality $\label{eq:cons_eventA} \int Q(z,a^*)dF(z) - \inf_{a \in \Lambda(\beta_k)} \int Q(z,a)dF(z) < \frac{\epsilon}{2}$ is valid.

Using (\ref{eq:cons_AtakesPlace}) and (\ref{eq:cons_eventA}) the inequalities $\label{eq:cons_AleqTk} \begin{split} \inf_{a \in \Lambda(\beta_k)} \int Q(z,a)dF(z) + \frac{\epsilon}{2} & > \int Q(z,a^*)dF(z) \\ > \frac{1}{l}\sum_{i=1}^{l}Q(z_i. a^*) + \epsilon & > \inf_{a \in \Lambda(\beta_k)} \frac{1}{l} \sum_{i=1}^{l} Q(z_i, a) + \epsilon \end{split}$ hold and implie that event $$T_k$$ also takes place. This means that $A \subseteq T_k \subseteq T \Longrightarrow P(A) \leq P(T_k) \leq P(T) \xrightarrow[l \rightarrow \infty]{} 0 .$ The first part of the theorem is proven.

For proving the second part of the theorem, we have to show that whenever uniform one-sided convergence of mean to their mathematical expectations takes place, the ERM principle is strictly consistent, that is, that for any $$\epsilon$$ the convergence $P\bigg \{ \bigg | \inf_{a \in \Lambda(c)} \int Q(z,a)dF(z) - \inf_{a \in \Lambda(c)} \frac{1}{l} \sum_{i=1}^{l}Q(z_i,a)\bigg | > \epsilon \bigg \} \xrightarrow[l \rightarrow \infty]{} 0$ holds. The event $A \triangleq \bigg | \inf_{a \in \Lambda(c)} \int Q(z,a)dF(z) - \inf_{a \in \Lambda(c)} \frac{1}{l} \sum_{i=1}^{l}Q(z_i,a)\bigg | > \epsilon$ is the union of events $A_1 \triangleq \inf_{a \in \Lambda(c)} \int Q(z,a)dF(z) + \epsilon < \inf_{a \in \Lambda(c)} \frac{1}{l} \sum_{i=1}^{l}Q(z_i,a)$ and $A_2 \triangleq \inf_{a \in \Lambda(c)} \int Q(z,a)dF(z) - \epsilon > \inf_{a \in \Lambda(c)} \frac{1}{l} \sum_{i=1}^{l}Q(z_i,a).$

Our goal is to show that the event $$A$$ converges in probability to zero. Towards this direction we have to bound the probabilities of events $$A_1$$ and $$A_2$$. Consider that event $$A_1$$ takes place. We fix $$a^*$$ such that $\int Q(z,a^*)dF(z) + \frac{\epsilon}{2} < \inf_{a \in \Lambda(c)} \int Q(z,a)dF(z) + \epsilon.$ Then the following inequalities $\begin{split} \int Q(z,a^*)dF(z) + \frac{\epsilon}{2} & < \inf_{a \in \Lambda(c)} \int Q(z,a)dF(z) + \epsilon \\ < \inf_{a \in \Lambda(c)} \frac{1}{l}\sum_{i=1}^{l}Q(z_1, a) & < \frac{1}{l}\sum_{i=1}^{l}Q(z_1, a^*) \end{split}$ hold. Therefore, $A_1 \subseteq \bigg \{ z: \frac{1}{l}\sum_{i=1}^{l}Q(z_1, a^*) - \int Q(z,a^*)dF(z) > \frac{\epsilon}{2} \bigg \}$ and $P(A_1) \leq P \bigg \{ \frac{1}{l}\sum_{i=1}^{l}Q(z_1, a^*) - \int Q(z,a^*)dF(z) > \frac{\epsilon}{2} \bigg \} \xrightarrow[l \rightarrow \infty]{} 0.$ The RHS convergence is valid due to the Law Of Large Numbers ($$a^*$$ is fixed). For bounding the probability of $$A_1$$ event we did not consider that uniform one-sided convergence takes place. We just used the law of large numbers.

Now consider that event $$A_2$$ takes place. We can find $$a^{**}$$ such that the inequality $\frac{1}{l} \sum_{i=1}^{l}Q(z, a^{**}) < \inf_{a \in \Lambda(c)} \frac{1}{l} \sum_{i=1}^{l}Q(z_i, a) + \frac{\epsilon}{2}$ is valid. Then, the following inequalities $\begin{split} \frac{1}{l} \sum_{i=1}^{l}Q(z, a^{**}) + \frac{\epsilon}{2} & < \inf_{a \in \Lambda(c)} \frac{1}{l} \sum_{i=1}^{l} Q(z_i, a) +\epsilon \\ < \inf_{a \in \Lambda(c)} \int Q(a,z) dF(z) & < \int Q(z, a^{**})dF(z) \begin{split}$ hold. Therefore, $\begin{split} P(A_2) & \leq P\bigg\{ \int Q(z, a^{**})dF(z) - \frac{1}{l} \sum_{i=1}^{l}Q(z, a^{**}) > \frac{\epsilon}{2} \bigg\} \\ & leq P\bigg\{sup_{a \in \Lambda} \bigg ( \int Q(z, a^{**})dF(z) - \frac{1}{l} \sum_{i=1}^{l}Q(z, a^{**}) \bigg ) > \frac{\epsilon}{2} \bigg\} \xrightarrow[l \rightarrow \infty]{} 0 \end{split}$ due to uniform one-sided convergence of means to their mathematical expectations. Because, $A = A_1 \cup A_2 \Longrightarrow P(A) \leq P(A_1) + P(A_2)$ we have that $P(A) \xrightarrow[l \rightarrow \infty]{} 0.$ The theorem is proven.