t-SNE parameters and accuracy measurements

For spike detection we used high and low detection thresholds of 6.5 and 2 standard deviations, respectively. We explored a large range of t-SNE parameters (perplexity, learning rate, theta and number of iterations) in order to find a set that gave good results without excessive run time. For all of the results presented here, the parameters used were perplexity 100, learning rate 200 and theta 0.2. The theta parameter defines the angle of the cone within which all points are treated as a single averaged point by the Barnes-Hut algorithm. Smaller values mean that the algorithm averages fewer points together, i.e. only those that are far away from the point under consideration, so a value of 0.2 is considered an approximation close to the exact solution. For the PD128 and HD1 sets we ran the algorithm for 2000 iterations, while for the PD32 and HD2 sets we ran it for 5000.

Perplexities lower than 100 were found to compromise the results (insofar as the separation of clusters defined by visual inspection was concerned), while higher values (we tried up to 1000) made no obvious difference other than adding to the run time of the algorithm. We found that the appropriate perplexity depends on the number of samples, but that for tens to hundreds of thousands of samples (as in all our data sets) the chosen value of 100 offered the best balance between embedding quality and run time. The fact that, above a certain value, perplexity did not seem to change the quality of the embedding supports the view in the t-SNE literature that this is a stable parameter that can vary widely without substantially influencing the results.
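The parameter settings above can be sketched as follows. This is an illustrative example only, using scikit-learn's Barnes-Hut t-SNE (the parameter names `perplexity`, `learning_rate` and `angle` are scikit-learn's; the original analysis may have used a different implementation), with random data standing in for spike feature vectors:

```python
# Illustrative sketch of the t-SNE settings described in the text,
# using scikit-learn's Barnes-Hut implementation. The spike feature
# matrix here is a random placeholder, not real data.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
spikes = rng.normal(size=(2000, 50))  # placeholder: 2000 spikes, 50 features

tsne = TSNE(
    n_components=2,
    perplexity=100,       # values below ~100 degraded cluster separation
    learning_rate=200,
    angle=0.2,            # Barnes-Hut theta: smaller = closer to exact
    method="barnes_hut",
    init="random",
    random_state=0,
)
embedding = tsne.fit_transform(spikes)
print(embedding.shape)  # one 2D point per spike: (2000, 2)
```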
Having labeled data allowed us to measure the quality of the t-SNE clustering visualization as a tool for separating single units. Since the t-SNE algorithm itself does not cluster (i.e. label) the data, but only offers a 2D embedding, we needed a way to label the spikes according to their position in that embedding. We chose the density-based spatial clustering of applications with noise \cite{ester_density-based_1996} (DBSCAN) algorithm, since it provides a non-parametric way to label the embedded spikes by clustering together samples that form denser groupings than their immediate surroundings. We found DBSCAN's approach to clustering to match most closely the human intuition that neural units correspond to separate groups of spikes in the 2D visualization of the t-SNE data.
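The labeling step could look like the following sketch. The `eps` and `min_samples` values are illustrative assumptions (the text does not state the settings used), and the synthetic Gaussian blobs stand in for a real t-SNE embedding:

```python
# Hedged sketch: assigning unit labels to a 2D embedding with DBSCAN.
# eps and min_samples are illustrative and would need tuning per data set.
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(1)
# two synthetic "units" plus scattered noise, standing in for an embedding
unit_a = rng.normal(loc=(0, 0), scale=0.3, size=(300, 2))
unit_b = rng.normal(loc=(5, 5), scale=0.3, size=(300, 2))
noise = rng.uniform(low=-2, high=7, size=(30, 2))
embedding = np.vstack([unit_a, unit_b, noise])

labels = DBSCAN(eps=0.5, min_samples=10).fit_predict(embedding)
n_units = len(set(labels) - {-1})  # -1 marks points DBSCAN calls noise
print(n_units)
```

DBSCAN's noise label (-1) is convenient here: spikes in sparse regions of the embedding are left unassigned rather than forced into a unit.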
Having established a method for labeling the t-SNE results, we then compared the generated labels with the ground truth information from the juxtacellular recordings or the hybrid spike groups. We report here the results of three commonly used measures for such comparisons. The first is Precision (or Confidence, or True Positive Accuracy): the ratio of the true positive samples (i.e. spikes labeled by DBSCAN as part of a unit that also had either a juxtacellular spike correspondence or the correct hybrid label) over all positively labeled samples (all spikes assigned by DBSCAN to the specific single unit). The second is Recall (or Sensitivity, or True Positive Rate): the ratio of the true positive samples over all true samples (all spikes with a juxtacellular spike correspondence or a specific hybrid spike label). The third is the F-score, defined as the harmonic mean of Precision and Recall (i.e. 2*Precision*Recall/(Precision+Recall)). We also calculated the Receiver Operating Characteristic (ROC) values for each label (either hybrid spike set or juxtacellular corresponding set) as a point on the plot of the True Positive Rate versus the False Positive Rate (see Supplementary Figure 2). The False Positive Rate is defined as the ratio of the false positive samples (spikes assigned by DBSCAN to the label but not having a corresponding juxtacellular spike or hybrid set label) over all the negative samples (all spikes not carrying the specific juxtacellular or hybrid label).
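The measures above reduce to simple counts of true/false positives and negatives per unit. A minimal sketch, with small hand-made boolean masks standing in for the DBSCAN assignment and the ground truth of one unit (the helper name `unit_scores` is ours, not from the original analysis):

```python
# Hedged sketch: Precision, Recall, F-score and False Positive Rate for
# one putative unit against ground truth, as defined in the text.
import numpy as np

def unit_scores(predicted, truth):
    """Compare boolean masks over all detected spikes for one unit."""
    tp = np.sum(predicted & truth)    # correctly assigned spikes
    fp = np.sum(predicted & ~truth)   # spikes wrongly assigned to the unit
    fn = np.sum(~predicted & truth)   # ground-truth spikes that were missed
    tn = np.sum(~predicted & ~truth)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                          # true positive rate
    f = 2 * precision * recall / (precision + recall)
    fpr = fp / (fp + tn)                             # x-axis of the ROC plot
    return precision, recall, f, fpr

predicted = np.array([1, 1, 1, 0, 0, 0, 1, 0], dtype=bool)  # DBSCAN unit
truth     = np.array([1, 1, 0, 0, 1, 0, 1, 0], dtype=bool)  # ground truth
print(unit_scores(predicted, truth))  # (0.75, 0.75, 0.75, 0.25)
```

Each (False Positive Rate, True Positive Rate) pair obtained this way gives one point on the ROC plot per labeled unit.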