ROUGH DRAFT authorea.com/91183

# A Solution to the Hyperparameter Dependency of Hilbert Maps

Abstract

Kernel-approximation methods in tandem with efficient online classifiers have proven to be a highly effective solution to the occupancy map problem. In this paper we seek to expand upon the work done in this area. Our contributions are twofold. We demonstrate that a Bayesian logistic regression classifier fails to perform better than the more spartan point-estimator/subgradient descent method. Contrastingly, we show that Bayesian optimisation over the hyperparameters of the model is an incredibly powerful and useful tool for this application.

## Introduction

The occupancy map problem learns a probabilistic model of some space. We map every point within to an estimation of the probability that it is occupied. For a primer on the problem, see (Ramos 2015).

Recently, a novel approach for solving this problem in an effective and on-line manner was proposed. The method, called hilbert-maps, transform a low-dimensional feature vector to a more dense space. This grants a simple linear classifier strong expressibility and power. The mapping process is further optimised through the use of kernel-approximator methods.

Such a model has been demonstrated to perform very well in terms of accuracy/time tradeoff (Ramos 2015). It does however suffer from a dependence on a number of hyperparameters. Solving this problem would allow for better results and a wider application.

To this end, in Section 2 we explore a model-shift from frequentist to Bayesian logistic regression formulation. Our goal is to firstly determine whether we can get more accurate models. Also of consideration is whether the paradigmatic shift in our characterisation of uncertainty results in less hyper-parameter dependency.

In Section 3 of this paper, our focus is on Bayesian parameter optimisation. We attempt to tackle the hyper-parameter problem more directly through the use of intelligent automatic selection algorithms.

# Bayesian Logistic Regression Formulation

Define the logistic sigmoid function $$\sigma(\alpha) = \frac{1}{1 + \exp(-\alpha)}$$. A logistic regression models a binary-response variable with the following density function:

\begin{align} p(y_{i}=1\mid\mathbf{x_{i},w}) & =\sigma(\mathbf{w^{T}x_{i}}) \\ p(y_{i}=0\mid\mathbf{x_{i},w}) & =1-\sigma(\mathbf{w^{T}x_{i}}) \\ & =1-\frac{1}{1+\exp(-\mathbf{w^{T}x_{i}}}) \\ & =\frac{1}{1+\exp(\mathbf{w^{T}x_{i}})} \\ & =\sigma(\mathbf{-w^{T}x_{i}})\\ \end{align}

Where $$\mathbf{w^T}$$ is a vector of weights, and $$\mathbf{x_i}$$ is our original data point with the addition of an identity element to allow for a intercept term. That is, if our original data was of the form $$\tilde{\mathbf{x}}_{i} = (x_{i1}, x_{i2}, \dots, x_{id-1})^T \in \mathbb{R}^{d-1}$$, then our transformed data point would be represented as $$(1, x_{i1}, x_{i2}, \dots, x_{id-1})^T := \mathbf{x_i} \in \mathbb{R}^{d}$$. The addition of this dimension is used widely, for example in simple linear regression.

Our learning task is then to find some estimation of this weight vector $$\mathbf{w^T}$$, given our dataset $$\mathcal{D} = (\mathbf{Y_n, X_n} )$$.

The traditional approach is to find a point-estimator through a maximum-likelihood approach. This is equivalent to formulating an objective function of the likelihood of our estimator given the data, and optimising through any appropriate method. Alternatively and equivalently, we could optimise over the log-likelihood. Either way, it turns out that the objective function is strictly convex (Rennie 2005). This means we can get a reasonable point-estimate quite efficiently, for examp