
We are now ready to give our central definition, which corresponds to a notion of “expected value” for distributional estimation problems.

\begin{definition}
\label{def:op}

Fix \(\Gamma\) a pair of growth spaces of rank 2 and \(\mathcal{E}\) an error space of rank 2. Consider \((\mu,f)\) a distributional estimation problem and \(P: \{0,1\}^* \xrightarrow{\Gamma} \mathbb{Q}\) with bounded range. \(P\) is called an \(\mathcal{E}(\Gamma)\)-optimal predictor for \((\mu,f)\) when for any \(Q: \{0,1\}^* \xrightarrow{\Gamma} \mathbb{Q}\) there is \(\varepsilon \in \mathcal{E}\) such that

\begin{equation}
\label{eqn:op}
\operatorname{E}_{\mu^k \times U^{\operatorname{r}_P(k,j)}}[(P^{kj} - f)^2] \leq \operatorname{E}_{\mu^k \times U^{\operatorname{r}_Q(k,j)}}[(Q^{kj} - f)^2] + \varepsilon(k,j)
\end{equation}
\end{definition}
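
As a quick sanity check (a remark of ours, not part of the definition): if \(P_1\) and \(P_2\) are both \(\mathcal{E}(\Gamma)\)-optimal predictors for \((\mu,f)\), applying \eqref{eqn:op} twice, once with \(Q = P_2\) and once with the roles of the two predictors exchanged, yields \(\varepsilon_1, \varepsilon_2 \in \mathcal{E}\) with

\[\left|\operatorname{E}_{\mu^k \times U^{\operatorname{r}_{P_1}(k,j)}}[(P_1^{kj} - f)^2] - \operatorname{E}_{\mu^k \times U^{\operatorname{r}_{P_2}(k,j)}}[(P_2^{kj} - f)^2]\right| \leq \varepsilon_1(k,j) + \varepsilon_2(k,j)\]

so any two optimal predictors for the same problem achieve the same expected squared error up to an error-space term.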

Distributional decision problems are the special case in which the range of \(f\) is \(\{0,1\}\). In this special case, the outputs of an optimal predictor can be thought of as probabilities\footnote{That said, \(P^{kj}(x,y)=1\) doesn't imply \(f(x) = 1\), and \(P^{kj}(x,y)=0\) doesn't imply \(f(x)=0\). We can try to fix this by using a logarithmic error function instead of the squared norm; however, this creates other difficulties and is outside the scope of the present work.}.
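
The probabilistic reading can be motivated by a standard fact about squared error (our gloss, with \(\mathcal{F}\) informally standing in for "the information available to the predictor"; this is not part of the paper's formalism): for any estimator \(Q\) that is \(\mathcal{F}\)-measurable,

\[\operatorname{E}\big[(Q - f)^2\big] = \operatorname{E}\big[(Q - \operatorname{E}[f \mid \mathcal{F}])^2\big] + \operatorname{E}\big[(f - \operatorname{E}[f \mid \mathcal{F}])^2\big]\]

since the cross term vanishes by the tower property. The expected squared error is therefore minimized by \(Q = \operatorname{E}[f \mid \mathcal{F}]\), which for \(\{0,1\}\)-valued \(f\) equals \(\Pr[f = 1 \mid \mathcal{F}]\). Heuristically, then, the defining inequality pushes an optimal predictor toward outputting such conditional probabilities.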