
A Bivariate Anomaly Detection Strategy for Data Cleaning of Hourly Rainfall Rates at Santa Catarina State (preprint)
  • Renzo A. Viloche Morales

Corresponding Author: [email protected]


Abstract

This work presents a new approach for detecting and removing inconsistencies and errors in a local hourly rainfall dataset from Santa Catarina state, Brazil. The method employs a distance-based unsupervised bivariate anomaly detection (UBAD) algorithm that takes as input a pair of concurrent rainfall-rate measurements. To assess the performance of this technique quantitatively, a simple Monte Carlo random error-seeding procedure is applied to the data of a specific “target” gauge. A two-dimensional probability model over two binary outcomes, error introduced (true/false) and error detected (true/false), is used to estimate the conditional probabilities of false alarms and of correct seed detection for different parameters of the UBAD algorithm. These conditional probabilities are used to generate a receiver operating characteristic (ROC) curve over different detection threshold levels. For comparison, the common unilateral Gamma distribution model is also included in the analysis. Experimental results show that, in terms of true positive and false positive fractions, the UBAD method and the Gamma technique yield similar ROC curves. However, the developed algorithm is better suited to data cleaning because it reaches higher overall accuracy in detecting seeded errors.

Keywords: data cleaning, anomaly detection, hourly rainfall, outliers
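
The evaluation procedure described in the abstract (seed random errors into a target gauge, run a detector, then estimate the conditional detection probabilities per threshold) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the additive error model, and in particular the detector, which is a plain absolute-difference rule between two gauges and not the UBAD algorithm, whose details the abstract does not give.

```python
import random


def seed_errors(series, fraction, rng, scale=20.0):
    """Return a corrupted copy of `series` and a boolean mask of seeded hours.

    Hypothetical error model: add a uniform offset in [scale/2, scale].
    """
    corrupted = list(series)
    seeded = [False] * len(series)
    n_seed = int(round(fraction * len(series)))
    for i in rng.sample(range(len(series)), n_seed):
        corrupted[i] = series[i] + rng.uniform(0.5 * scale, scale)
        seeded[i] = True
    return corrupted, seeded


def detect(target, neighbour, threshold):
    """Stand-in detector (not UBAD): flag hours where the gauges differ by more than `threshold`."""
    return [abs(t - n) > threshold for t, n in zip(target, neighbour)]


def rates(seeded, flagged):
    """Estimate P(flagged | seeded), P(flagged | not seeded) and overall accuracy."""
    tp = sum(1 for s, f in zip(seeded, flagged) if s and f)
    fn = sum(1 for s, f in zip(seeded, flagged) if s and not f)
    fp = sum(1 for s, f in zip(seeded, flagged) if not s and f)
    tn = sum(1 for s, f in zip(seeded, flagged) if not s and not f)
    tpf = tp / (tp + fn) if (tp + fn) else 0.0
    fpf = fp / (fp + tn) if (fp + tn) else 0.0
    accuracy = (tp + tn) / len(seeded)
    return tpf, fpf, accuracy


def roc_points(target, neighbour, thresholds, fraction=0.05, trials=50, seed=0):
    """Average the rates over Monte Carlo seeding trials, one ROC point per threshold."""
    rng = random.Random(seed)
    points = []
    for thr in thresholds:
        totals = [0.0, 0.0, 0.0]
        for _ in range(trials):
            corrupted, seeded = seed_errors(target, fraction, rng)
            result = rates(seeded, detect(corrupted, neighbour, thr))
            totals = [a + b for a, b in zip(totals, result)]
        points.append((thr,) + tuple(t / trials for t in totals))
    return points


if __name__ == "__main__":
    # Synthetic, made-up data: two gauges that agree up to small noise.
    data_rng = random.Random(42)
    neighbour = [data_rng.expovariate(0.5) for _ in range(500)]
    target = [x + data_rng.gauss(0.0, 0.5) for x in neighbour]
    for thr, tpf, fpf, acc in roc_points(target, neighbour, [1.0, 5.0, 15.0]):
        print(f"threshold={thr:5.1f}  TPF={tpf:.3f}  FPF={fpf:.3f}  accuracy={acc:.3f}")
```

Plotting the true positive fraction against the false positive fraction across thresholds gives the ROC curve, while the accuracy column corresponds to the overall seeded-error detection accuracy that the abstract uses to compare methods.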