This work presents a new approach for detecting and removing inconsistencies and errors in a local
hourly rainfall dataset from Santa Catarina state, Brazil. The method employs a distance-based
unsupervised bivariate anomaly detection (UBAD) algorithm that takes as input a pair of
concurrent rainfall rate measurements. To assess quantitatively the performance of
this technique, a simple Monte Carlo random error-seeding procedure is applied to the data of a specific
“target” gauge. A two-dimensional probability model of introduced/detected error (true/false)
is used to estimate the conditional probabilities of false alarms and correct seed detections for different
parameters of the UBAD algorithm. These conditional probabilities are used to generate a receiver
operating characteristic (ROC) curve for different detection threshold levels. For comparison, the common
unilateral Gamma distribution model is also included in the analysis. Experimental results show
that, in terms of true positive and false positive fractions, the UBAD method and the Gamma
technique yield similar ROC curves. However, the developed algorithm is better suited
to data cleaning because it reaches higher overall accuracy in detecting seeded errors.
Keywords: data cleaning, anomaly detection, hourly rainfall, outliers
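
The evaluation loop summarized above (seed errors into the target gauge, run a detector, estimate the conditional probabilities of correct detection and false alarm at each threshold) can be illustrated with a minimal sketch. This is not the authors' implementation: the seeding rule, the simple difference-based detector standing in for UBAD, and all function and parameter names are assumptions made for illustration only.

```python
import random

def seed_errors(values, fraction, rng, scale=5.0, offset=10.0):
    """Corrupt a random fraction of values (hypothetical seeding rule).

    Returns the corrupted copy and a list of booleans marking seeded hours.
    """
    corrupted = list(values)
    flags = [False] * len(values)
    n_seed = int(round(fraction * len(values)))
    for i in rng.sample(range(len(values)), n_seed):
        corrupted[i] = values[i] * scale + offset
        flags[i] = True
    return corrupted, flags

def detect(target, neighbor, threshold):
    """Placeholder detector (NOT UBAD): flag hours where the two
    concurrent measurements differ by more than `threshold`."""
    return [abs(t - n) > threshold for t, n in zip(target, neighbor)]

def rates(seeded, detected):
    """Estimate P(detected | seeded) and P(detected | not seeded)."""
    tp = sum(s and d for s, d in zip(seeded, detected))
    fp = sum((not s) and d for s, d in zip(seeded, detected))
    n_pos = sum(seeded)
    n_neg = len(seeded) - n_pos
    tpf = tp / n_pos if n_pos else 0.0
    fpf = fp / n_neg if n_neg else 0.0
    return tpf, fpf

def roc_points(target, neighbor, thresholds, fraction=0.05, trials=50, seed=0):
    """Average the two fractions over Monte Carlo trials for each threshold."""
    rng = random.Random(seed)
    points = []
    for th in thresholds:
        tpf_sum = fpf_sum = 0.0
        for _ in range(trials):
            corrupted, flags = seed_errors(target, fraction, rng)
            tpf, fpf = rates(flags, detect(corrupted, neighbor, th))
            tpf_sum += tpf
            fpf_sum += fpf
        points.append((th, tpf_sum / trials, fpf_sum / trials))
    return points

if __name__ == "__main__":
    # Synthetic demo data only: two correlated "gauges", not real rainfall.
    rng = random.Random(1)
    neighbor = [rng.expovariate(0.5) for _ in range(500)]
    target = [max(0.0, n + rng.gauss(0.0, 0.5)) for n in neighbor]
    for th, tpf, fpf in roc_points(target, neighbor, [0.5, 1.0, 2.0, 5.0]):
        print(f"threshold={th:4.1f}  TPF={tpf:.3f}  FPF={fpf:.3f}")
```

Plotting the (FPF, TPF) pairs across thresholds gives an empirical ROC curve; replacing `detect` with a different detector would allow two methods to be compared on the same seeded data.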