Automated identification of characteristic droplet size distributions in stratocumulus clouds utilizing a data clustering algorithm
  • Nithin Allwayin,
  • Michael Larsen,
  • Alexander Shaw,
  • Raymond Shaw
Nithin Allwayin
Michigan Technological University
Michael Larsen
Michigan Technological University, College of Charleston
Alexander Shaw
Brigham Young University
Raymond Shaw
Michigan Technological University

Corresponding Author: rashaw@mtu.edu



Droplet-level interactions in clouds are often parameterized by a modified gamma distribution fitted to a “global” droplet size distribution. Do the “local” droplet size distributions relevant to microphysical processes look like these average distributions? This paper describes an algorithm to search for and classify characteristic size distributions within a cloud. The approach combines hypothesis testing, specifically the Kolmogorov-Smirnov (KS) test, with a widely used machine-learning algorithm for identifying clusters of samples with similar properties: density-based spatial clustering of applications with noise (DBSCAN). The two-sample KS test does not presume any specific distribution, is parameter free, and avoids biases from binning. Importantly, the number of clusters is not an input parameter of the DBSCAN algorithm but is determined independently in an unsupervised fashion. As implemented, the clustering operates on an abstract space constructed from the KS test results, so the members of a cluster need not be spatially correlated. The method is explored using data obtained from the Holographic Detector for Clouds (HOLODEC) deployed during the Aerosol and Cloud Experiments in the Eastern North Atlantic (ACE-ENA) field campaign. The algorithm finds evidence of clusters of nearly identical local size distributions. Cloud segments are found to have as few as one and as many as seven characteristic size distributions. To validate the algorithm’s robustness, it is tested on a synthetic dataset, where it successfully identifies the predefined distributions at plausible noise levels. The algorithm is general and is expected to be useful in other applications, such as remote sensing of cloud and rain properties.
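The combination of a two-sample KS test with DBSCAN can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation. Several choices are assumptions made purely for illustration: the KS statistic D is used directly as the pairwise "distance" between local samples (the abstract says only that clustering runs on an abstract space built from the KS test results), the thresholds `eps` and `min_samples` are arbitrary, and the synthetic droplet diameters are drawn from two hypothetical gamma populations.

```python
import random
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in a + b:  # the supremum is attained at one of the data points
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d


def dbscan_precomputed(dist, eps, min_samples):
    """Minimal DBSCAN on a precomputed distance matrix.

    Returns one label per sample: 0, 1, ... for clusters, -1 for noise.
    The number of clusters is discovered, not supplied.
    """
    n = len(dist)
    labels = [None] * n  # None means "not visited yet"
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbors = [j for j in range(n) if dist[i][j] <= eps]
        if len(neighbors) < min_samples:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = [k for k in range(n) if dist[j][k] <= eps]
            if len(nb) >= min_samples:  # core point: keep growing the cluster
                queue.extend(nb)
    return labels


# Hypothetical synthetic data: 20 "local" samples of 300 droplet diameters,
# the first 10 from one gamma population and the last 10 from another.
random.seed(0)
samples = [[random.gammavariate(4.0, 2.0) for _ in range(300)] for _ in range(10)]
samples += [[random.gammavariate(20.0, 1.0) for _ in range(300)] for _ in range(10)]

# Abstract "distance" space: pairwise KS statistics (an illustrative assumption).
dist = [[ks_statistic(s, t) for t in samples] for s in samples]

labels = dbscan_precomputed(dist, eps=0.2, min_samples=3)
print(labels)
```

Because the clustering sees only the matrix of KS statistics, samples can share a cluster regardless of where in the cloud they were measured. In practice one would more likely rely on library routines such as `scipy.stats.ks_2samp` and `sklearn.cluster.DBSCAN` with `metric="precomputed"`; the hand-written versions here only keep the sketch self-contained.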