# The B5 Superhighway

Let's use Authorea to keep track of B5 materials...

• FITS cubes: ready for Glue volume rendering.
• Blue/green is $$\mathrm{C^{18}O}$$ (2-1), and red/orange is $$\mathrm{NH_3}$$ (1,1).
• The three slides have been uploaded.
• The clustering is done using a code I wrote, following the explanation in Alvaro's paper. The friends-of-friends (FoF) threshold in Alvaro's paper is 3 km s$$^{-1}$$ pc$$^{-1}$$ (~ 1 $$c_s$$ per half beam). With the same threshold, the (extended) B5 would be grouped into a single component. The clustering in the movies below instead uses a threshold of 1 km s$$^{-1}$$ pc$$^{-1}$$ (one third of Alvaro's threshold), so that the data points separate into multiple components. This is not too bad a choice, since Alvaro was using CO (1-0), which has a broader line width. In the movies below, the clustering is run on the combined Gaussian fit: a position contributes one peak from the one-component Gaussian fit if the residual of the one-component fit is smaller, and two peaks from the two-component fit if the residual of the two-component fit is smaller.
• movie0_opaque_linewidth: a movie of the 3D visualization of the Gaussian-fitted peaks, without friends-in-velocity (FIVE) clustering. Brighter/whiter circles mark where the (Gaussian-fitted) emission is stronger. The circle size scales with the (Gaussian-fitted) line width.
• movie0_transparent_linewidth: the same movie, rendered with alpha transparency.
• movie_opaque: a movie of the 3D visualization of the Gaussian-fitted peaks, with friends-in-velocity (FIVE) clustering. The circle size is NOT scaled with the line width.
• movie_transparent: the same movie, rendered with alpha transparency.
• movie_opaque_linewidth: a movie of the 3D visualization of the Gaussian-fitted peaks, with friends-in-velocity (FIVE) clustering. The circle size scales with the line width.
• movie_transparent_linewidth: the same movie, rendered with alpha transparency.
• The clustering is done using the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) method in scikit-learn. The DBSCAN method should perform better than the FoF method (which is similar to K-Means). In practice, this is checked with the mean silhouette coefficient, which measures how well the clustering performs. The coefficient ranges from -1 to 1 for each data point, with -1 meaning that the clustering is not appropriate for that data point and 1 meaning that the clustering is good. The DBSCAN result (mean silhouette score ~ 0.06) is better than the FoF result (mean silhouette score ~ -0.26). To be fair, the score for the FoF method is calculated on the same standardized dataset used in the DBSCAN analysis. The DBSCAN method also identifies a number of data points that cannot be clustered (the "noisy samples"). See the scikit-learn clustering page for an overview of various clustering algorithms.
• To implement the DBSCAN method, the position-position-velocity (PPV) coordinates of the fitted Gaussian peaks (of C$$^{18}$$O 2-1) are first standardized. No other scaling is applied. The best DBSCAN parameters are found by comparing the mean silhouette coefficient over a reasonable range of parameter values. DBSCAN (set up with the best parameters) finds 12 components, compared to 10 components found by FoF.
• movie_DBScan and movie_DBScan_linewidth in the Google Drive folder show the result in the original RA-Dec-velocity space. The smaller, black data points are those categorized by DBSCAN as "noisy samples".
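For reference, the friends-of-friends grouping with a velocity-gradient threshold described in the list above can be sketched roughly as follows. This is not the code used for the movies. The function name `friends_of_friends`, the spatial linking length `max_sep_pc`, and the synthetic two-strand data are all assumptions made for illustration; only the 3 and 1 km s$$^{-1}$$ pc$$^{-1}$$ thresholds come from the notes.

```python
import numpy as np

def friends_of_friends(xy_pc, v_kms, max_sep_pc, max_grad):
    """Group points whose pairwise velocity gradient is below a threshold.

    Two points are linked when their projected separation is at most
    max_sep_pc (a hypothetical linking length) and |dv| / separation is at
    most max_grad (km/s/pc). Groups are the connected sets of linked points.
    """
    xy_pc = np.asarray(xy_pc, dtype=float)
    v_kms = np.asarray(v_kms, dtype=float)
    n = len(v_kms)
    parent = np.arange(n)  # union-find forest, one tree per group

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        sep = np.hypot(*(xy_pc[i + 1:] - xy_pc[i]).T)
        dv = np.abs(v_kms[i + 1:] - v_kms[i])
        # Writing the test as dv <= max_grad * sep avoids dividing by zero;
        # two different peaks at the same position (sep == 0) never link directly.
        linked = (sep <= max_sep_pc) & (dv <= max_grad * sep)
        for j in np.nonzero(linked)[0] + i + 1:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[rj] = ri

    roots = np.array([find(i) for i in range(n)])
    _, labels = np.unique(roots, return_inverse=True)
    return labels

# Synthetic test: two overlapping strands, 0.2 km/s apart, sampled every 0.1 pc.
x = np.arange(10) * 0.1
xy = np.column_stack([np.concatenate([x, x]), np.zeros(20)])
v = np.concatenate([np.full(10, 10.0), np.full(10, 10.2)])

labels_3 = friends_of_friends(xy, v, max_sep_pc=0.15, max_grad=3.0)
labels_1 = friends_of_friends(xy, v, max_sep_pc=0.15, max_grad=1.0)
print(np.unique(labels_3).size, np.unique(labels_1).size)
```

In this toy setup the neighbouring points of the two strands differ by 2 km s$$^{-1}$$ pc$$^{-1}$$, so the looser threshold merges everything into one group while the tighter one keeps the strands apart, which mirrors the behaviour noted for B5.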
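The DBSCAN steps in the list above (standardize the PPV positions, scan a range of parameters, keep the setup with the best mean silhouette coefficient) might look something like the sketch below. The helper `best_dbscan`, the parameter grids, and the synthetic three-blob data are illustrative assumptions, not the values used for B5. The sketch also assumes the silhouette score is computed over all points, with the noisy samples (label -1) treated as one more label; the notes do not say how they were handled.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

def best_dbscan(ppv, eps_grid, min_samples_grid):
    """Return the DBSCAN setup with the highest mean silhouette coefficient."""
    X = StandardScaler().fit_transform(ppv)  # zero mean, unit variance per axis
    best = None
    for eps in eps_grid:
        for min_samples in min_samples_grid:
            labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
            n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
            if n_clusters < 2:
                continue  # skip setups that do not split the data
            score = silhouette_score(X, labels)  # noise (-1) counted as a label
            if best is None or score > best["score"]:
                best = {"score": score, "eps": eps, "min_samples": min_samples,
                        "labels": labels, "n_clusters": n_clusters}
    return best

# Synthetic stand-in for (RA, Dec, velocity) peak positions: three compact blobs.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0, 0.0], [10.0, 10.0, 10.0], [20.0, 0.0, 5.0]])
ppv = np.vstack([c + 0.3 * rng.standard_normal((100, 3)) for c in centers])

result = best_dbscan(ppv, eps_grid=(0.2, 0.3, 0.5), min_samples_grid=(5, 10))
print(result["n_clusters"], result["eps"], result["min_samples"])
```

Points labelled -1 in `result["labels"]` correspond to the "noisy samples" drawn as smaller, black points in the movies.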