The Bayesian optimisation paradigm is particularly computationally efficient for the task at hand: it allows us to optimise arbitrary functions in relatively few iterations. The downside is that more costly inter-iteration work is performed. Given that the occupancy-grid problem typically involves datasets with a massive number of samples, we find that in our case the intra-iteration work dominates our runtime, and as such it is a good trade-off to make.
We also consider that the most common alternative is to perform a grid-search. This is a method that might be reasonable for models that are efficient to train or have a small number of hyperparameters. In the absence of these qualities, however, it performs quite poorly. Given our large number of hyperparameters and the range of values they can take, grid-search's exponential run-time is particularly concerning.
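To make the cost concrete (with illustrative numbers, not figures from our experiments): discretising $d$ hyperparameters to $k$ candidate values each, grid-search must train the model

$$N_{\text{grid}} = k^{d}$$

times, so even a modest $d = 6$ with $k = 10$ already demands $10^{6}$ full training runs, where the Bayesian approach aims to get by with a few dozen evaluations.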
Finally, our approach in principle allows a high degree of parallelisation. We have to do a little more bookkeeping than grid-search, as we don't want to repeat experiments, but in essence we have a large number of independent experiments and can therefore spread the work across multiple cores, as sketched below.
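A minimal sketch of this pattern (hypothetical code, not our pipeline: `train_and_score` and `evaluate_batch` are stand-in names for an arbitrary training-plus-validation routine and its batch driver):

```python
# Illustrative sketch only: a batch of independent hyperparameter
# evaluations spread across cores, with a `seen` set providing the
# small amount of extra bookkeeping that avoids repeating experiments.
from concurrent.futures import ProcessPoolExecutor

def train_and_score(params):
    # Stand-in objective; a real version would train the model with
    # these hyperparameters and return its validation score.
    x, y = params
    return -((x - 0.3) ** 2 + (y - 0.7) ** 2)

def evaluate_batch(candidates, seen):
    """Evaluate unseen candidate settings in parallel."""
    fresh = []
    for c in candidates:
        if c not in seen:      # skip anything already tried
            seen.add(c)
            fresh.append(c)
    with ProcessPoolExecutor() as pool:
        scores = list(pool.map(train_and_score, fresh))
    return dict(zip(fresh, scores))

if __name__ == "__main__":
    seen = set()
    batch = [(0.1, 0.5), (0.3, 0.7), (0.1, 0.5)]  # deliberate duplicate
    print(evaluate_batch(batch, seen))  # duplicate is run only once
```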
## Choosing the Next Hyperparameter
We adopt the formulation found in \cite{snoek2012practical}. Our hyperparameters are assumed a priori to be multivariate normal.
This results in a procedure that can find the minimum of difficult non-convex functions with relatively few evaluations, at the cost of performing more computation to determine the next point to try. When evaluations of $f(x)$ are expensive to perform (as is the case when it requires training a machine learning algorithm), it is easy to justify some extra computation to make better decisions.
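To illustrate what this extra computation looks like, here is a minimal sketch of a single step (an illustration under assumptions, not our exact implementation: it uses scikit-learn's Gaussian process and the expected-improvement acquisition function introduced below; `propose_next` and `expected_improvement` are hypothetical helper names):

```python
# Sketch of one Bayesian-optimisation step: fit a Gaussian process to
# the hyperparameter settings tried so far, then spend extra computation
# scoring untried candidates to pick the most promising next point.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def expected_improvement(gp, X_cand, y_best):
    """Expected improvement for minimisation (Snoek et al., 2012)."""
    mu, sigma = gp.predict(X_cand, return_std=True)
    sigma = np.maximum(sigma, 1e-12)   # guard against zero predictive std
    gamma = (y_best - mu) / sigma
    return sigma * (gamma * norm.cdf(gamma) + norm.pdf(gamma))

def propose_next(X_tried, y_tried, X_cand):
    """Fit the GP posterior, then pick the candidate maximising EI."""
    gp = GaussianProcessRegressor(normalize_y=True).fit(X_tried, y_tried)
    ei = expected_improvement(gp, X_cand, np.min(y_tried))
    return X_cand[np.argmax(ei)]

# Toy usage: three settings tried so far, 100 untried candidates.
rng = np.random.default_rng(0)
X_tried = rng.uniform(size=(3, 2))   # 3 settings of 2 hyperparameters
y_tried = rng.uniform(size=3)        # their validation losses
X_cand = rng.uniform(size=(100, 2))
print(propose_next(X_tried, y_tried, X_cand))
```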
This prior and these data induce a posterior over functions; the acquisition function, which we denote by $a : \mathcal{X} \to \mathbb{R}^{+}$, determines what point in $\mathcal{X}$ should be evaluated next.
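One standard choice, discussed in \cite{snoek2012practical}, is the expected improvement. Writing $\mu(x)$ and $\sigma(x)$ for the GP posterior predictive mean and standard deviation, and $x_{\text{best}}$ for the best point observed so far, it has the closed form

$$a_{\mathrm{EI}}(x) = \sigma(x)\left(\gamma(x)\,\Phi(\gamma(x)) + \phi(\gamma(x))\right), \qquad \gamma(x) = \frac{f(x_{\text{best}}) - \mu(x)}{\sigma(x)},$$

where $\Phi$ and $\phi$ are the standard normal CDF and density. The next point to evaluate is then the maximiser $x_{\text{next}} = \operatorname{argmax}_{x} a_{\mathrm{EI}}(x)$.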