We performed two additional comparisons. To measure the number of extra classes that a method creates for a given simulation, we defined a metric called 'Additional Classes' as the total number of classes in the solution minus the number of clusters labeled as the best match of a neuron in the simulation. Counting best-match clusters, rather than simulated neurons, avoids negative values if an algorithm merges two neurons. A large value of this quantity could be an important limitation for long-term recordings, because the final number of classes to analyze, curate and compare will grow with the duration of the recording. To measure the general agreement of the sorting with the ground truth, we used the Adjusted Rand Index \cite{Steinley_2004}. Figure \ref{129092} shows the results: Spikes\_link\_WC always obtained better agreement with the ground truth (higher Adjusted Rand Index, $p < 7 \times 10^{-5}$) and a significantly lower number of additional classes ($p < 4 \times 10^{-3}$). These results support the idea that the design choices of the Spikes\_link framework can improve spike sorting algorithms during long-term recordings with possible nonstationarities (e.g., under drifting conditions).
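
For reference, the two metrics can be written compactly. The notation below is introduced here only for illustration and is an assumption, not the authors' own: $N_{\mathrm{classes}}$ is the total number of classes in the sorting solution and $N_{\mathrm{matched}}$ is the number of distinct clusters labeled as the best match of at least one simulated neuron, so that
\begin{equation}
\mathrm{Additional\ Classes} = N_{\mathrm{classes}} - N_{\mathrm{matched}} \geq 0 .
\end{equation}
The Adjusted Rand Index is shown in its standard Hubert--Arabie form, assuming a contingency table in which $n_{ij}$ is the number of spikes assigned to sorted class $i$ that belong to ground-truth neuron $j$, with row sums $a_i = \sum_j n_{ij}$, column sums $b_j = \sum_i n_{ij}$, and $n$ spikes in total:
\begin{equation}
\mathrm{ARI} = \frac{\sum_{ij} \binom{n_{ij}}{2} - \left[ \sum_i \binom{a_i}{2} \sum_j \binom{b_j}{2} \right] \Big/ \binom{n}{2}}{\frac{1}{2} \left[ \sum_i \binom{a_i}{2} + \sum_j \binom{b_j}{2} \right] - \left[ \sum_i \binom{a_i}{2} \sum_j \binom{b_j}{2} \right] \Big/ \binom{n}{2}} .
\end{equation}
An index of 1 indicates identical partitions, and values near 0 indicate agreement no better than expected by chance.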