Authorea


Gaussian Processes

\label{sec:GaussinaProcesses} In this paper we consider a specific class of regression functions, Gaussian Processes (\(\mathcal{GP}\)). Since every finite set of values of a Gaussian Process is jointly Gaussian, any process \(f\in\mathcal{GP}\) is uniquely defined by its mean function \(\mu(\mathbf{x}) = \mathrm{E}\left[f(\mathbf{x})\right]\) and its covariance function \(k\left(\mathbf{x}, \mathbf{x}^\prime\right) = \mathrm{Cov}\left(f(\mathbf{x}), f(\mathbf{x}^\prime)\right) = \mathrm{E}\left[\left(f\left(\mathbf{x}\right) - \mu\left(\mathbf{x}\right)\right) \left(f\left(\mathbf{x}^\prime\right) - \mu\left(\mathbf{x}^\prime\right)\right)\right]\).
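To make the definition concrete, the following sketch assumes a squared-exponential covariance function (the text does not fix a particular \(k\)) and uses NumPy; the names \texttt{sq\_exp\_kernel}, \texttt{length\_scale} and \texttt{signal\_var} are illustrative. Restricted to finitely many inputs, a zero-mean Gaussian Process is a multivariate normal \(\mathcal{N}(0, K)\), so the sample covariance of many draws of \(f(X)\) should approach the matrix built from \(k\).

```python
import numpy as np


def sq_exp_kernel(A, B, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance
    k(x, x') = signal_var * exp(-||x - x'||^2 / (2 * length_scale^2)).

    A has shape (n, d) and B has shape (m, d); the result is the (n, m) matrix
    [k(a_i, b_j)]. This kernel is an illustrative assumption, not fixed by the text.
    """
    # Squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
    sq_dist = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    sq_dist = np.maximum(sq_dist, 0.0)  # clip tiny negative round-off
    return signal_var * np.exp(-0.5 * sq_dist / length_scale**2)


rng = np.random.default_rng(0)
X = np.linspace(0.0, 1.0, 5)[:, None]  # five 1-D inputs, shape (5, 1)
K = sq_exp_kernel(X, X, length_scale=0.3)

# f(X) for a zero-mean GP is distributed as N(0, K); sample it many times
samples = rng.multivariate_normal(np.zeros(len(X)), K, size=20000)
empirical_cov = np.cov(samples, rowvar=False)

# Largest entrywise gap between the sample covariance and K; shrinks as size grows
print(np.max(np.abs(empirical_cov - K)))
```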

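If the mean function is set to zero, i.e. \(\mu(\mathbf{x}) = \mathrm{E}\left[f\left(\mathbf{x}\right)\right] = 0\), and the covariance function is assumed to be known, then, given noise-free observations \(Y = f(X)\) at the \(N\) training points \(X\), the posterior mean of the Gaussian Process at the \(N_*\) points of the test set \(X_*\) has the form \cite{Rasmussen} \(\hat{f}(X_*) = K_* K^{-1} Y\), where \(K_* = K(X_*, X) = \left[k(\mathbf{x}^*_i, \mathbf{x}_j)\right]\), \(i = 1, \dots, N_*\), \(j = 1, \dots, N\), is the \((N_* \times N)\) matrix of covariances between the test points \(\mathbf{x}^*_i \in X_*\) and the training points \(\mathbf{x}_j \in X\), and \(K = K(X, X) = \left[k(\mathbf{x}_i, \mathbf{x}_j)\right]\), \(i, j = 1, \dots, N\).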
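A minimal NumPy sketch of the noise-free posterior mean \(\hat{f}(X_*) = K_* K^{-1} Y\), assuming a squared-exponential kernel with unit signal variance; the helper names and the toy data are hypothetical. Instead of forming \(K^{-1}\) explicitly, it solves the linear system \(K\boldsymbol{\alpha} = Y\), which is cheaper and numerically more stable. Evaluating the mean at the training inputs themselves (\(X_* = X\), so \(K_* = K\)) returns \(Y\) up to round-off: without noise, the posterior mean interpolates the data.

```python
import numpy as np


def sq_exp_kernel(A, B, length_scale=1.0):
    """Squared-exponential covariance with unit signal variance (assumed kernel)."""
    sq_dist = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)


def gp_mean_noise_free(X, Y, X_star, length_scale=1.0):
    """Posterior mean f_hat(X_*) = K_* K^{-1} Y of a zero-mean GP, noise-free data."""
    K = sq_exp_kernel(X, X, length_scale)            # (N, N)
    K_star = sq_exp_kernel(X_star, X, length_scale)  # (N_*, N)
    # Solve K alpha = Y rather than computing K^{-1}
    alpha = np.linalg.solve(K, Y)
    return K_star @ alpha


X = np.array([[0.0], [0.4], [1.0]])          # training inputs, shape (N, 1)
Y = np.sin(2 * np.pi * X[:, 0])              # noise-free training outputs
X_star = np.linspace(0.0, 1.0, 11)[:, None]  # test inputs, shape (N_*, 1)
f_hat = gp_mean_noise_free(X, Y, X_star, length_scale=0.3)

# At the training inputs the noise-free mean reproduces Y (up to round-off)
print(gp_mean_noise_free(X, Y, X, length_scale=0.3), Y)
```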

It is generally assumed that the data are observed with random noise: \( y(\mathbf{x}) = f(\mathbf{x}) + \varepsilon(\mathbf{x})\), where \(\varepsilon(\mathbf{x})\sim\mathcal{N}(0, \tilde{\sigma}^2)\) is independent of \(f\) and independent between different points. In that case the observations \(y(\mathbf{x})\) are generated by a Gaussian Process with zero mean and covariance function \(\mathrm{Cov}\left(y(\mathbf{x}), y(\mathbf{x}^\prime)\right) = k(\mathbf{x}, \mathbf{x}^\prime) + \tilde{\sigma}^2\delta_{\mathbf{x}\mathbf{x}^\prime}\), where \(\delta_{\mathbf{x}\mathbf{x}^\prime}\) is the Kronecker delta (equal to 1 if \(\mathbf{x} = \mathbf{x}^\prime\) and 0 otherwise), so the noise only adds \(\tilde{\sigma}^2\) to the diagonal of the covariance matrix of the observations.
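The covariance of the noisy observations can be checked by simulation. The sketch below assumes a squared-exponential kernel and an arbitrary noise variance \(\tilde{\sigma}^2 = 0.25\) (variable \texttt{noise\_var}, a hypothetical name); it draws \(f(X) \sim \mathcal{N}(0, K)\), adds independent Gaussian noise, and compares the sample covariance of \(y\) with \(K + \tilde{\sigma}^2 I\). Only the diagonal changes: the off-diagonal entries are those of \(K\), as the delta term predicts.

```python
import numpy as np


def sq_exp_kernel(A, B, length_scale=1.0):
    """Squared-exponential covariance with unit signal variance (assumed kernel)."""
    sq_dist = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)


rng = np.random.default_rng(1)
X = np.linspace(0.0, 1.0, 4)[:, None]
K = sq_exp_kernel(X, X, length_scale=0.3)
noise_var = 0.25  # plays the role of tilde-sigma^2; value chosen for illustration

n_draws = 50000
# Latent function values f(X) ~ N(0, K), one row per draw
f = rng.multivariate_normal(np.zeros(len(X)), K, size=n_draws)
# Independent noise eps ~ N(0, noise_var) at every point of every draw
eps = rng.normal(0.0, np.sqrt(noise_var), size=f.shape)
y = f + eps

empirical_cov = np.cov(y, rowvar=False)
theoretical_cov = K + noise_var * np.eye(len(X))  # noise enters only on the diagonal
print(np.round(empirical_cov - theoretical_cov, 3))
```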

Thus, the posterior mean of the Gaussian Process \(f(\mathbf{x})\) at the points of the test set \(X_*\) takes the form: \[\hat{f}(X_*) = K_* \left(K + \tilde{\sigma}^2 I \right)^{-1} Y, \label{eq:meannoise}\] where \(I\) is the identity matrix of size \((N \times N)\).
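A sketch of (\ref{eq:meannoise}) in NumPy, assuming a squared-exponential kernel with unit signal variance; \texttt{gp\_mean} and \texttt{noise\_var} (standing for \(\tilde{\sigma}^2\)) are illustrative names. The single-observation case shows the effect of the noise term directly: with one observation \(y = 1\) at \(\mathbf{x} = 0\), unit prior variance and \(\tilde{\sigma}^2 = 1\), the posterior mean at \(\mathbf{x} = 0\) is \(1/(1 + 1) = 0.5\), shrunk toward the prior mean of zero rather than reproducing the observation.

```python
import numpy as np


def sq_exp_kernel(A, B, length_scale=1.0):
    """Squared-exponential covariance with unit signal variance (assumed kernel)."""
    sq_dist = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)


def gp_mean(X, Y, X_star, noise_var, length_scale=1.0):
    """Posterior mean f_hat(X_*) = K_* (K + noise_var * I)^{-1} Y."""
    K = sq_exp_kernel(X, X, length_scale)            # (N, N)
    K_star = sq_exp_kernel(X_star, X, length_scale)  # (N_*, N)
    A = K + noise_var * np.eye(len(X))
    # A is symmetric positive definite; solve A alpha = Y instead of inverting A
    alpha = np.linalg.solve(A, Y)
    return K_star @ alpha


# One observation y = 1 at x = 0 with noise_var = 1: mean is 1 / (1 + 1) = 0.5
print(gp_mean(np.array([[0.0]]), np.array([1.0]), np.array([[0.0]]), noise_var=1.0))

# Noisy samples of sin(2 pi x), smoothed on a grid of test points
rng = np.random.default_rng(2)
X = np.sort(rng.uniform(0.0, 1.0, 20))[:, None]
Y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(0.0, 0.1, 20)
X_star = np.linspace(0.0, 1.0, 50)[:, None]
f_hat = gp_mean(X, Y, X_star, noise_var=0.01, length_scale=0.2)
```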

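Note that the noise variance \(\tilde{\sigma}^2\) in (\ref{eq:meannoise}) acts as a regularizer: adding \(\tilde{\sigma}^2 I\) to \(K\) bounds the smallest eigenvalue of the matrix being inverted from below by \(\tilde{\sigma}^2\), which improves its conditioning, and it keeps the regression from interpolating the noisy observations exactly, which tends to improve generalization. The posterior covariance of the noisy observations at the points of the test set takes the form: \[\mathrm{V} \left[X_*\right] = K(X_*, X_*) + \tilde{\sigma}^2 I_* - K_* \left(K + \tilde{\sigma}^2 I \right)^{-1} K_*^T, \label{eq:covariancenoise}\] where \(K(X_*, X_*) = \left[k(\mathbf{x}^*_i, \mathbf{x}^*_j)\right]\), \(i, j = 1, \dots, N_*\), with \(\mathbf{x}^*_i, \mathbf{x}^*_j \in X_*\), and \(I_*\) is the identity matrix of size \((N_* \times N_*)\). Without the term \(\tilde{\sigma}^2 I_*\), the same expression gives the posterior covariance of the noise-free values \(f(X_*)\).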
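The sketch below evaluates (\ref{eq:covariancenoise}) with NumPy under an assumed squared-exponential kernel with unit signal variance; \texttt{gp\_predictive\_cov} is a hypothetical helper name. It uses a Cholesky factorization \(K + \tilde{\sigma}^2 I = LL^T\), so that \(K_*(K + \tilde{\sigma}^2 I)^{-1}K_*^T = V^TV\) with \(V = L^{-1}K_*^T\). Near the training data the predictive variance drops to slightly above \(\tilde{\sigma}^2\), while far from it the variance returns to the prior value \(k(\mathbf{x}, \mathbf{x}) + \tilde{\sigma}^2\). The last lines compare the condition numbers of \(K\) and \(K + \tilde{\sigma}^2 I\) to illustrate the regularizing effect of the noise term.

```python
import numpy as np


def sq_exp_kernel(A, B, length_scale=1.0):
    """Squared-exponential covariance with unit signal variance (assumed kernel)."""
    sq_dist = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)


def gp_predictive_cov(X, X_star, noise_var, length_scale=1.0):
    """V[X_*] = K(X_*, X_*) + noise_var I_* - K_* (K + noise_var I)^{-1} K_*^T."""
    K = sq_exp_kernel(X, X, length_scale)
    K_star = sq_exp_kernel(X_star, X, length_scale)
    K_ss = sq_exp_kernel(X_star, X_star, length_scale)
    # Cholesky factor L of A = K + noise_var I, so that A = L L^T
    L = np.linalg.cholesky(K + noise_var * np.eye(len(X)))
    # With V = L^{-1} K_*^T we get K_* A^{-1} K_*^T = V^T V
    V = np.linalg.solve(L, K_star.T)
    return K_ss + noise_var * np.eye(len(X_star)) - V.T @ V


X = np.linspace(0.0, 1.0, 30)[:, None]
X_star = np.array([[0.5], [2.0]])  # one test point inside the data, one far outside
noise_var = 0.01

V_star = gp_predictive_cov(X, X_star, noise_var, length_scale=0.2)
print(np.diag(V_star))  # slightly above noise_var inside, about 1 + noise_var outside

# Regularization: adding noise_var * I improves the conditioning of K
K = sq_exp_kernel(X, X, length_scale=0.2)
cond_K = np.linalg.cond(K)
cond_reg = np.linalg.cond(K + noise_var * np.eye(len(X)))
print(cond_K, cond_reg)
```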