Descending the Alternate Sparse Coding Objective

\label{descend} The original SAILnet paper \cite{Zylberberg_2011} maintains the reconstruction part of the sparse coding objective but replaces the sparse prior with two constraints: homeostasis of each neuron's firing rate and decorrelation of the pairwise firing statistics: \[\label{eq:1} E(X, a; \Phi, W, \theta) = \frac{1}{2}\sum_i(X_i-\sum_j\Phi_{ij}a_j)^2 + \sum_i\theta_i(a_i-p) + \sum_{ij}W_{ij}(a_ia_j-p^2)\] where \(p\) is the target firing rate.
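As a concrete reference for the notation, the following NumPy sketch evaluates the constrained objective in \ref{eq:1}. The array shapes (a single input \(X\) over pixels, a code \(a\) over neurons, \(\Phi\) of shape pixels \(\times\) neurons) are illustrative assumptions rather than anything fixed by the text.
\begin{verbatim}
import numpy as np

def sailnet_objective(X, a, Phi, W, theta, p):
    """E(X, a; Phi, W, theta) from Eq. (1) for one input X and code a."""
    recon_err = X - Phi @ a                        # X_i - sum_j Phi_ij a_j
    reconstruction = 0.5 * np.sum(recon_err ** 2)  # squared reconstruction error
    homeostasis = np.sum(theta * (a - p))          # firing-rate constraint
    decorrelation = np.sum(W * (np.outer(a, a) - p ** 2))  # pairwise constraint
    return reconstruction + homeostasis + decorrelation
\end{verbatim}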

The original sparse coding reconstruction term in \ref{eq:1} leads to both non-local learning rules for \(\Phi\) and a non-local inference circuit. It is therefore approximated by an objective that leads to Oja's learning rule: \[\label{local} E(X, a; \Phi, W, \theta) = \frac{1}{2}\sum_{ij}(X_i-\Phi_{ij}a_j)^2 + \sum_i\theta_i(a_i-p) + \sum_{ij}W_{ij}(a_ia_j-p^2).\] Note that in \ref{local}, the sum on \(j\) has moved outside of the squaring operation in the first term. As shown in \cite{Zylberberg_2011}, gradient descent with respect to \(\Phi\), \(W\), and \(\theta\) gives local learning rules for the parameters of the model.
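For concreteness, the gradients of \ref{local} with respect to \(\Phi\), \(W\), and \(\theta\) can be written out directly; the sketch below shows their local form. The learning rates and the ascent-versus-descent sign convention for the constraint variables \(W\) and \(\theta\) are assumptions of this sketch, not values taken from the text.
\begin{verbatim}
import numpy as np

def local_updates(X, a, Phi, W, theta, p,
                  eta_phi=0.01, eta_w=0.01, eta_theta=0.01):
    # -dE/dPhi_ij = a_j (X_i - Phi_ij a_j): an Oja-like, purely local rule.
    dPhi = eta_phi * a[None, :] * (X[:, None] - Phi * a[None, :])
    # dE/dW_ij = a_i a_j - p^2: strengthen inhibition when a pair co-fires
    # above the target rate (gradient ascent on the constraint, assumed).
    dW = eta_w * (np.outer(a, a) - p ** 2)
    np.fill_diagonal(dW, 0.0)              # no self-inhibition
    # dE/dtheta_i = a_i - p: raise the threshold of neurons firing above p.
    dtheta = eta_theta * (a - p)
    return Phi + dPhi, W + dW, theta + dtheta
\end{verbatim}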

The negative gradient of the local objective function \ref{local} is \[-\frac{\partial E(a| X; \Phi, W, \theta)}{\partial a_i} = \sum_jX_j\Phi_{ji}-\sum_j \Phi_{ji}^2a_i - \theta_i - 2\sum_{j\ne i}W_{ji}a_j.\] The first term is the familiar linear filtering term. The second is a leakiness term with an additional scaling by the squared norm of the dictionary element. The dictionary is commonly normalized to unit norm to prevent it from growing without bound, although Oja's rule does not require this; empirically, the mean norm is of order 1, but it can differ by a small integer factor and has non-zero variance. It is an interesting prediction that the leakiness of a neuron's membrane should scale with the overall strength of its synapses. The term \(-\theta_i\) would be converted into a spike threshold in a leaky integrate-and-fire (LIF) version of this analog equation. Finally, the last term, involving \(W\), accounts for within-layer inhibition.
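The closed-form negative gradient can be evaluated term by term as in the sketch below; a symmetric \(W\) with zero diagonal is assumed, and the array shapes follow the earlier sketch.
\begin{verbatim}
import numpy as np

def neg_gradient_a(X, a, Phi, W, theta):
    """-dE/da_i of the local objective, term by term."""
    drive = Phi.T @ X                      # sum_j X_j Phi_ji (linear filtering)
    leak = np.sum(Phi ** 2, axis=0) * a    # sum_j Phi_ji^2 a_i (norm-scaled leak)
    inhibition = 2.0 * (W @ a)             # 2 sum_{j != i} W_ij a_j (zero diagonal)
    return drive - leak - theta - inhibition
\end{verbatim}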

Without a LIF circuit, the analog optimization problem is solved by

\[\frac{da_i}{dt} = -\frac{\partial E(a|X; \Phi, W, \theta)}{\partial a_i}.\]
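A minimal sketch of this analog dynamics is a forward-Euler loop on the closed-form gradient; it reuses \texttt{neg\_gradient\_a} from the previous sketch, and the step size and number of iterations are illustrative choices only.
\begin{verbatim}
import numpy as np

def analog_inference(X, Phi, W, theta, dt=0.01, n_steps=500):
    """Euler-integrate da/dt = -dE/da from zero initial activity."""
    a = np.zeros(Phi.shape[1])
    for _ in range(n_steps):
        a = a + dt * neg_gradient_a(X, a, Phi, W, theta)  # from previous sketch
    return a
\end{verbatim}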

In \cite{Zylberberg_2011}, a LIF circuit was instantiated with three terms in the integrator (leakiness, linear filtering, and inhibition): \[\begin{split} \tau \frac{du_i}{dt} &= -u_i + \sum_j X_j\Phi_{ji} - \sum_{j\ne i}W_{ij}y_j,\\ y_i &= 1 \text{ if } u_i > \theta_i \text{ else } 0. \end{split}\] Here \(y_i\) is the instantaneous spiking variable for neuron \(i\). This circuit may not always descend the objective function, as we can show.
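The sketch below simulates that LIF circuit with a forward-Euler step. The reset of \(u_i\) to zero after a spike, the simulation constants, and reading out spike counts as the analog of \(a_i\) are assumptions of this sketch, not details fixed by the equations above.
\begin{verbatim}
import numpy as np

def lif_inference(X, Phi, W, theta, tau=10.0, dt=0.1, n_steps=500):
    n_neurons = Phi.shape[1]
    u = np.zeros(n_neurons)                # membrane potentials
    y = np.zeros(n_neurons)                # instantaneous spike indicators
    spike_count = np.zeros(n_neurons)
    drive = Phi.T @ X                      # sum_j X_j Phi_ji, fixed per input
    for _ in range(n_steps):
        du = (-u + drive - W @ y) / tau    # leak + filtering - inhibition
        u = u + dt * du
        y = (u > theta).astype(float)      # y_i = 1 if u_i > theta_i else 0
        u[y == 1] = 0.0                    # reset after a spike (assumption)
        spike_count += y
    return spike_count / n_steps           # firing rate as the analog of a_i
\end{verbatim}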

Given the analog optimization problem above, there are a number of ways to instantiate a LIF circuit to approximate inference. The negative gradient can be interpreted as the driving force in the circuit \[\frac{du_i}{dt} = -\frac{\partial E(a|X; \Phi, W, \theta)}{\partial u_i} = -\frac{\partial E(a|X; \Phi, W, \theta)}{\partial a_i} \frac{\partial a_i}{\partial u_i}\]
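As an illustration of this chain-rule form, the sketch below assumes a smooth sigmoidal activation \(a_i = \sigma(u_i)\), so that \(\partial a_i/\partial u_i = \sigma'(u_i)\); the particular choice of nonlinearity is an assumption of the sketch, not something fixed by the derivation above.
\begin{verbatim}
import numpy as np

def u_dynamics(u, X, Phi, W, theta):
    """du/dt = -(dE/da_i)(da_i/du_i), with an assumed sigmoidal a_i = sigma(u_i)."""
    a = 1.0 / (1.0 + np.exp(-u))           # a_i = sigma(u_i), assumed
    da_du = a * (1.0 - a)                   # sigma'(u_i)
    neg_dE_da = (Phi.T @ X
                 - np.sum(Phi ** 2, axis=0) * a
                 - theta
                 - 2.0 * (W @ a))           # -dE/da_i from the local objective
    return neg_dE_da * da_du                # chain-rule driving force on u_i
\end{verbatim}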