# The B5 Superhighway

Let's use Authorea to keep track of B5 materials...

• FITS cubes: ready for Glue volume rendering.
• The blue/green is $$C^{18}O$$ (2-1), and red/orange is $$NH_3$$ (1, 1).
• The three slides have been uploaded.
• The clustering is done with a code I wrote following the explanation in Alvaro's paper. The friends-of-friends (FoF) threshold in Alvaro's paper is 3 km s$$^{-1}$$ pc$$^{-1}$$ (~ 1 $$c_s$$ per half beam). With the same threshold, the (extended) B5 would be grouped into a single component. The clustering in the movies below instead uses a threshold of 1 km s$$^{-1}$$ pc$$^{-1}$$ (one third of Alvaro's threshold), so that the data points separate into multiple components. This is not too bad a choice, since Alvaro was using CO (1-0), which has a broader line width. In the movies below, the clustering is run on the combined Gaussian fit: we keep one peak from the one-component fit where its residual is smaller, and two peaks from the two-component fit where that residual is smaller.
• movie0_opaque_linewidth: the movie made from 3D visualization of Gaussian fitted peaks without friends-in-velocity (FIVE) clustering. Brighter/white circles are where the (Gaussian fitted) emission is higher. The size is scaled with the (Gaussian fitted) line width.
• movie0_transparent_linewidth: the same movie, with alpha.
• movie_opaque: the movie made from 3D visualization of Gaussian fitted peaks with friends-in-velocity (FIVE) clustering. The size is NOT scaled with the line width.
• movie_transparent: the same movie, with alpha.
• movie_opaque_linewidth: the movie made from 3D visualization of Gaussian fitted peaks with friends-in-velocity (FIVE) clustering. The size is scaled with the line width.
• movie_transparent_linewidth: the same movie, with alpha.
• The clustering is done using the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) method in scikit-learn. DBSCAN should perform better than the FoF method (which is similar to K-Means). The silhouette coefficient measures how well the clustering performs: it ranges from -1 to 1 for each data point, with -1 meaning that the clustering is not appropriate for that data point and 1 meaning that the clustering is good. In practice, the mean silhouette coefficient shows that the DBSCAN result (mean silhouette score ~ 0.06) is better than the FoF result (mean silhouette score ~ -0.26; to be fair, the FoF score is calculated on the same standardized dataset used in the DBSCAN analysis). DBSCAN also identifies a number of data points that cannot be clustered (the "noisy samples"). See the scikit-learn clustering page for an overview of various clustering algorithms.
• To implement the DBSCAN method, the ppv positions of the fitted Gaussian peaks (of $$C^{18}O$$ (2-1)) are first standardized. No other scaling is applied. The best parameters for setting up DBSCAN are found by measuring the mean silhouette coefficient within a reasonable range. DBSCAN (set up with the best parameters) finds 12 components, compared to 10 components found by FoF.
• movie_DBScan and movie_DBScan_linewidth in the Google Drive folder show the result in the original RA-Dec-velocity space. The smaller, black data points are those categorized by DBSCAN as "noisy samples".
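
The DBSCAN setup described above (standardize the ppv positions, then pick the parameters with the highest mean silhouette coefficient) can be sketched roughly as follows. This is a minimal illustration, not the code used for the movies: the function name `best_dbscan`, the parameter grids, and the synthetic three-blob data are all made up for the example, and it assumes that the "noisy samples" (label -1) are excluded before the silhouette score is computed, which may differ from how the scores quoted above were obtained.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def best_dbscan(ppv, eps_grid, min_samples_grid):
    """Standardize (x, y, v) points and grid-search DBSCAN by mean silhouette.

    Hypothetical helper for illustration; noise points (label -1) are
    left out of the silhouette score (an assumption of this sketch).
    """
    X = StandardScaler().fit_transform(ppv)  # zero mean, unit variance per axis
    best = None
    for eps in eps_grid:
        for min_samples in min_samples_grid:
            labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
            clustered = labels != -1
            n_clusters = np.unique(labels[clustered]).size
            # silhouette_score needs 2 <= n_clusters <= n_points - 1
            if n_clusters < 2 or n_clusters >= clustered.sum():
                continue
            score = silhouette_score(X[clustered], labels[clustered])
            if best is None or score > best["score"]:
                best = {"eps": eps, "min_samples": min_samples,
                        "score": float(score), "n_clusters": int(n_clusters),
                        "labels": labels}
    return best


# Synthetic stand-in for the fitted peaks: three blobs plus scattered points.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0, 0.0], [5.0, 5.0, 5.0], [10.0, 0.0, 10.0]])
blobs = [c + 0.3 * rng.standard_normal((100, 3)) for c in centers]
scatter = rng.uniform(-2.0, 12.0, size=(20, 3))
ppv = np.vstack(blobs + [scatter])

result = best_dbscan(ppv, eps_grid=[0.2, 0.3, 0.5], min_samples_grid=[5, 10])
print(result["eps"], result["min_samples"], result["n_clusters"], result["score"])
```

For the real data, `ppv` would be the array of (RA, Dec, velocity) positions of the fitted peaks, and the grids would span whatever range is considered reasonable for `eps` and `min_samples`.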

Another way to check the performance of clustering algorithms is to look at the clusters in a parameter space made up of parameters that were not used in the clustering. Interestingly, DBSCAN is more successful at identifying at least one distinct population in the width-height space (around width ~ 0.5; green in the DBSCAN plot), which can also be recognized by eye. There are fewer points in the DBSCAN plot because the "noisy samples" (data points that cannot be clustered; the smaller, black points in the movies) are removed.
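
A quick numerical companion to this visual check is to summarize each cluster in the width-height space after dropping the noisy samples. The sketch below is only an illustration: the helper name `summarize_unused_params` and the toy values are invented for the example, and it assumes cluster labels follow the scikit-learn convention where -1 marks noise.

```python
import numpy as np


def summarize_unused_params(labels, width, height):
    """Per-cluster count and medians of parameters not used in the clustering.

    Hypothetical helper; points labelled -1 (noise) are skipped.
    """
    labels = np.asarray(labels)
    width = np.asarray(width, dtype=float)
    height = np.asarray(height, dtype=float)
    summary = {}
    for k in np.unique(labels[labels != -1]):
        members = labels == k
        summary[int(k)] = {
            "n": int(members.sum()),
            "median_width": float(np.median(width[members])),
            "median_height": float(np.median(height[members])),
        }
    return summary


# Toy values only; real input would be the fitted widths and heights.
labels = [0, 0, 0, 1, 1, -1]
width = [0.50, 0.52, 0.48, 1.0, 1.2, 3.0]
height = [2.0, 2.2, 1.8, 0.5, 0.7, 0.1]

summary = summarize_unused_params(labels, width, height)
for k, stats in summary.items():
    print(k, stats)
```

A cluster whose members sit in a narrow range of width and height, well separated from the other clusters' medians, is the kind of distinct population described above.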