Our CNN model for these neurons consists of two layers (cf. Fig. 1b). The first layer is convolutional with a single \(17\times17\) kernel (zero-padded, stride 1); this single spatial filter must be learned such that it describes all neurons optimally. The second layer takes the inner product of the convolution output with \(N\) location masks, splitting it into spatially localized responses, one per neuron. The network contains no non-linearity. We regularize the second layer with an L1 penalty to enforce sparse location masks and add batch normalization […] after the first layer to keep the kernel weights from growing in compensation.
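To make the architecture concrete, the following is a minimal sketch in PyTorch (our choice of framework here; the text does not name one). The class name SharedFilterModel, the input shape, the absence of a convolution bias, and the mask initialization scale are illustrative assumptions rather than details taken from the text.

```python
import torch
import torch.nn as nn

class SharedFilterModel(nn.Module):
    """Two-layer model: one shared 17x17 convolutional filter followed by
    per-neuron location masks; no non-linearity anywhere in the network."""

    def __init__(self, n_neurons, height, width):
        super().__init__()
        # Single spatial filter shared by all neurons; zero padding of 8
        # with stride 1 keeps the output the same size as the input.
        self.conv = nn.Conv2d(1, 1, kernel_size=17, stride=1,
                              padding=8, bias=False)
        # Batch normalization after the first layer keeps the kernel
        # weights from growing to compensate for the L1-shrunk masks.
        self.bn = nn.BatchNorm2d(1)
        # One location mask per neuron, flattened to the conv-output size
        # (initialization scale is an illustrative choice).
        self.masks = nn.Parameter(1e-2 * torch.randn(n_neurons, height * width))

    def forward(self, x):
        # x: (batch, 1, height, width) stimulus frames
        feat = self.bn(self.conv(x))        # shared filter + batch norm
        feat = feat.flatten(start_dim=1)    # (batch, height * width)
        return feat @ self.masks.t()        # (batch, n_neurons) responses

    def mask_l1(self):
        # L1 penalty on the second layer, added to the training loss to
        # enforce sparse, spatially localized masks.
        return self.masks.abs().sum()
```

In training, model.mask_l1() would be scaled by a regularization weight and added to the loss; this is the term that enforces the sparse location masks described above.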
We conducted a grid search over the hyperparameters and found optimal performance with (learning rate, initialization scales ... [mention here?]).