T-SNE visualization of large-scale neural recordings

George Dimitriadis1,*, Joana P. Neto1, Adam R. Kampff1

1Sainsbury Wellcome Centre, UCL, London, UK
*g.dimitriadis@ucl.ac.uk

Abstract

Electrophysiology is entering the era of ‘Big Data’. Multiple probes, each with hundreds to thousands of individual electrodes, are now capable of simultaneously recording from many brain regions. The major challenge confronting these new technologies is transforming the raw data into physiologically meaningful signals, i.e. single unit spikes. Sorting the spike events of individual neurons from a spatiotemporally dense sampling of the extracellular electric field is a problem that has attracted much attention (Rey 2015, Rossant 2016), but is still far from solved. Current methods still rely on human input and thus become infeasible as the size of the data sets grows exponentially.
Here we introduce the t-distributed stochastic neighbor embedding (t-SNE) dimensionality reduction method (Van der Maaten 2008) as a visualization tool in the spike sorting process. t-SNE embeds the n-dimensional extracellular spikes (n = number of features into which each spike is decomposed) into a low-dimensional (usually two-dimensional) space. We show that such embeddings, even starting from different feature spaces, form obvious clusters of spikes that can be easily visualized and manually delineated with a high degree of precision. We propose that these clusters represent single units and test this assertion by applying our algorithm to labeled data sets from both hybrid (Rossant 2016) and paired juxtacellular/extracellular recordings (Neto 2016). We have released a graphical user interface (GUI) written in Python as a tool for the manual clustering of the t-SNE embedded spikes and as a tool for an informed overview and fast manual curation of results from other clustering algorithms. Furthermore, the generated visualizations offer evidence in favor of the use of probes with higher density and smaller electrodes. They also graphically demonstrate the diverse nature of the sorting problem when spikes are recorded with different methods and arise from regions with different background spiking statistics.
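As an illustration of the embedding step described above, the following minimal sketch projects a synthetic spike feature matrix into two dimensions. It is not the released GUI or the pipeline used in this work: it assumes the scikit-learn implementation of t-SNE, and the feature matrix, the number of hypothetical units, and the perplexity value are all made up for the example.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Synthetic stand-in for real spike features (an assumption for illustration):
# 3 hypothetical "units", each a Gaussian cloud in a 12-dimensional feature space.
n_units, spikes_per_unit, n_features = 3, 60, 12
centers = rng.normal(scale=8.0, size=(n_units, n_features))
features = np.vstack([
    center + rng.normal(size=(spikes_per_unit, n_features))
    for center in centers
])
# Ground-truth unit label of every synthetic spike, useful for coloring a plot.
unit_labels = np.repeat(np.arange(n_units), spikes_per_unit)

# Embed the n-dimensional spikes into two dimensions.
# The perplexity is an arbitrary choice here; it must be smaller than the
# number of spikes.
tsne = TSNE(n_components=2, perplexity=20.0, init="pca", random_state=0)
embedding = tsne.fit_transform(features)

print(embedding.shape)  # one (x, y) pair per spike
```

Each row of `embedding` is the two-dimensional position of one spike. Plotting these points, colored by `unit_labels` when ground truth is available, gives the kind of scatter plot in which clusters can be delineated by hand.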

Introduction

It is neuroscience dogma that the brain’s computational mechanics are implemented by the complex dynamics of its spiking neural networks. As a consequence, detailed knowledge of the spiking activity for “as-many-neurons-as-possible” during behavior is seen as essential to understanding how the brain receives and transforms information. Electrophysiological methods that record spiking activity extracellularly have been one of the most significant tools for exploring the correlations between behavior and neural activity, and there has been a constant drive to record from more neurons, for longer times, from a host of neural regions, in diverse physiological conditions, and from many different species. This trend was recently accelerated by new microfabricated recording probes that extend the standard single electrode and tetrode devices (Recce 1989) with integrated electronics to produce devices with thousands of recording sites (Ruther 2015, Alivisatos 2013).
The new generation of recording tools brings with it the challenge of extracting meaningful physiological signals from the resulting (big) data sets. In the case of extracellular probe recordings, that usually means transforming the voltages measured at the electrode sites into spiking activity of the nearby neurons. The importance of accurate spike sorting stems from a number of ideas on how cell spiking contributes to brain functions. For example, competent sorting is required to test for sparse coding in memory function (Chaudhuri 2016) or to assess the diverse responses of neighboring cells, important in theories of concept (Rey 2015) and place cells (Redish 2001).
The original attempts to spike sort greatly benefited from the development of the tetrode and its ability to simultaneously monitor the spiking signal of nearby neurons from multiple locations (i.e. four) (Gray 1995, Fee 1996, Wehr 1999). It has since become clear that dense electrode configurations, in which the same neuron is detected by multiple electrodes, generally improve sorting (Lewicki 1998, Buzsáki 2004), hence the push for an increase in the electrode density of modern probe designs. Today new methodologies have evolved to work with the next generation of multi-electrode probes and to try to address the problem of the exploding size and complexity of the data sets (Rossant 2016, Rey 2015a). However, the basic idea of the spike sorting pipeline remains the same (Fig 1A). The (filtered) data go through a process of spike detection that has traditionally relied on thresholding the raw signal. The multi-unit activity generated is then passed through a dimensionality reduction method that transforms the space-time spike matrices into a smaller set of features. The most commonly used dimensionality reduction techniques are principal component analysis (PCA) (Harris 2000) and wavelet decomposition (Hulata 2002, Quiroga 2004,