Pavel Erofeev edited GP.tex over 9 years ago

Commit id: cab5bdc77e490b9124bcea9d09f7527dd72f6ff1

\subsection{Gaussian Processes}
\label{sec:GaussinaProcesses}
In this paper we consider a specific class of regression functions, Gaussian processes, denoted $\mathcal{GP}$. Any process $P\in\mathcal{GP}$ is uniquely defined by its mean function $\mu(\mathbf{x}) = \mathrm{E}\left[f(\mathbf{x})\right]$ and its covariance function
\[
\mathrm{Cov}\left(y, y^\prime\right) = k\left(\mathbf{x}, \mathbf{x}^\prime\right) = \mathrm{E}\left[\left(f\left(\mathbf{x}\right) - \mu\left(\mathbf{x}\right)\right) \left(f\left(\mathbf{x}^\prime\right) - \mu\left(\mathbf{x}^\prime\right)\right)\right].
\]
If the mean function is set to zero, i.e. $\mu(\mathbf{x}) = \mathrm{E}\left[f\left(\mathbf{x}\right)\right] = 0$, and the covariance function is assumed to be known, the posterior mean of the Gaussian process at the points of the test set $X_*$ has the form \cite{Rasmussen} $\hat{f}(X_*) = K_* K^{-1} Y$, where $K_* = K(X_*, X) = \left[k(\mathbf{x}^*_i, \mathbf{x}_j),\ i = \overline{1, N_*},\ j = \overline{1,N}\right]$ with $\mathbf{x}^*_i \in X_*$ and $\mathbf{x}_j \in X$, and $K = K(X, X) = \left[k(\mathbf{x}_i, \mathbf{x}_j),\ i, j = \overline{1, N}\right]$.
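As an illustration of the posterior mean formula $\hat{f}(X_*) = K_* K^{-1} Y$, the following is a minimal NumPy sketch, not the implementation used in this paper. The squared-exponential kernel, the names \texttt{sq\_exp\_kernel} and \texttt{gp\_posterior\_mean}, the \texttt{length\_scale} and \texttt{jitter} parameters, and the toy data are all assumptions made for the example.

```python
import numpy as np

def sq_exp_kernel(A, B, length_scale=1.0):
    """Squared-exponential covariance between the rows of A and the rows of B.

    The kernel choice is an assumption for this sketch; the text only
    requires that k be known.
    """
    sq_dist = (np.sum(A**2, axis=1)[:, None]
               + np.sum(B**2, axis=1)[None, :]
               - 2.0 * A @ B.T)
    return np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)

def gp_posterior_mean(X, Y, X_star, jitter=1e-10):
    """Zero-mean GP posterior mean K_* K^{-1} Y at the test points X_star."""
    # A tiny jitter on the diagonal is a numerical safeguard (an assumption),
    # not part of the formula in the text.
    K = sq_exp_kernel(X, X) + jitter * np.eye(len(X))
    K_star = sq_exp_kernel(X_star, X)
    # Solve the linear system instead of forming K^{-1} explicitly.
    return K_star @ np.linalg.solve(K, Y)

# Toy data (hypothetical): noise-free samples of sin on a coarse grid.
X = np.linspace(0.0, 4.0, 5)[:, None]
Y = np.sin(X).ravel()

# Predicting at the training points should reproduce the observations.
mean_at_train = gp_posterior_mean(X, Y, X)
```

Solving the linear system with \texttt{np.linalg.solve} avoids an explicit matrix inverse; for larger training sets a Cholesky factorisation of $K$ would be the usual choice.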

% The variances of the Gaussian process at the points of the test set can be used as estimates of the expected approximation error at these points.
% Note that this does not require computing the whole matrix $\VV \bigl[X_*\bigr]$ by formula (\ref{covarianceNoise}); it is enough to compute only the elements of its main diagonal, which are the required variances.

Moreover, knowing the mean and covariance functions, one can obtain a posterior estimate of the mean and variance of the gradient of the Gaussian process at the points of the test set.

\begin{lemma}
Given two line segments whose lengths are $a$ and $b$ respectively there is a real number $r$ such that $b=ra$.
\end{lemma}

Indeed, if
\[
\mathbf{g}(\mathbf{x}_0) = \frac{\partial f(\mathbf{x})}{\partial \mathbf{x}} \Big|_{\mathbf{x}=\mathbf{x}_0},
\]
then
\[
\mathrm{Law}\left(\mathbf{g}(\mathbf{x}_0) \middle| (X, Y)\right) = \mathcal{N}\left(J^T \bigl(K + \tilde{\sigma}^2 I\bigr)^{-1} Y, \; B - J^T \bigl(K + \tilde{\sigma}^2 I\bigr)^{-1} J\right),
\]
where
\[
J^T = \left[ \frac{\partial k(\mathbf{x}_0, \mathbf{x}_1)}{\partial \mathbf{x}_0}, \ldots, \frac{\partial k(\mathbf{x}_0, \mathbf{x}_N)}{\partial \mathbf{x}_0} \right],
\]
\[
B = \begin{bmatrix}
\mathrm{Cov}(g_1(\mathbf{x}_0), g_1(\mathbf{x}_0)) & \cdots & \mathrm{Cov}(g_1(\mathbf{x}_0), g_m(\mathbf{x}_0)) \\
\vdots & \ddots & \vdots \\
\mathrm{Cov}(g_m(\mathbf{x}_0), g_1(\mathbf{x}_0)) & \cdots & \mathrm{Cov}(g_m(\mathbf{x}_0), g_m(\mathbf{x}_0))
\end{bmatrix},
\]
\[
\mathrm{Cov}(g_i(\mathbf{x}_0), g_j(\mathbf{x}_0)) = \frac{\partial^2 k(\mathbf{x}, \mathbf{x}^\prime)}{\partial x^i \, \partial x^{\prime j}} \Big|_{\mathbf{x} = \mathbf{x}^\prime = \mathbf{x}_0},
\]
and $g_i$ is the $i$th component of the gradient vector $\mathbf{g}$, $i = \overline{1, m}$.
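The posterior law of the gradient can be sketched numerically as follows. This is a minimal illustration, not the paper's code: it assumes a squared-exponential kernel $k(\mathbf{x}, \mathbf{x}^\prime) = \exp\bigl(-\|\mathbf{x} - \mathbf{x}^\prime\|^2 / (2\ell^2)\bigr)$, for which $\partial k(\mathbf{x}_0, \mathbf{x}_i) / \partial \mathbf{x}_0 = -(\mathbf{x}_0 - \mathbf{x}_i)\, k(\mathbf{x}_0, \mathbf{x}_i) / \ell^2$ and $B = I / \ell^2$. The function names, the \texttt{noise\_var} parameter (standing in for $\tilde{\sigma}^2$), and the toy data are hypothetical.

```python
import numpy as np

def sq_exp_kernel(A, B, length_scale=1.0):
    """Squared-exponential covariance between rows of A and rows of B (assumed kernel)."""
    sq_dist = (np.sum(A**2, axis=1)[:, None]
               + np.sum(B**2, axis=1)[None, :]
               - 2.0 * A @ B.T)
    return np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)

def gp_gradient_posterior(X, Y, x0, length_scale=1.0, noise_var=1e-6):
    """Posterior mean and covariance of the GP gradient at a single point x0.

    Implements N(J^T (K + s^2 I)^{-1} Y, B - J^T (K + s^2 I)^{-1} J)
    for the squared-exponential kernel only.
    """
    N, m = X.shape
    k0 = sq_exp_kernel(x0[None, :], X, length_scale).ravel()   # k(x0, x_i), shape (N,)
    # Row i of J is d k(x0, x_i) / d x0, so J has shape (N, m).
    J = -(x0[None, :] - X) / length_scale**2 * k0[:, None]
    # Prior covariance of the gradient at x0; equals I / l^2 for this kernel.
    B = np.eye(m) / length_scale**2
    A = sq_exp_kernel(X, X, length_scale) + noise_var * np.eye(N)
    grad_mean = J.T @ np.linalg.solve(A, Y)
    grad_cov = B - J.T @ np.linalg.solve(A, J)
    return grad_mean, grad_cov

# Toy data (hypothetical): samples of sin on a coarse one-dimensional grid.
X = np.linspace(0.0, 4.0, 5)[:, None]
Y = np.sin(X).ravel()
x0 = np.array([1.0])

grad_mean, grad_cov = gp_gradient_posterior(X, Y, x0)
```

For other kernels only the expressions for \texttt{J} and \texttt{B} change; the two linear solves stay the same.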