Solving Data Imbalance in Landslide Susceptibility Zonation

Sharad Gupta; Dericks Shukla

doi:10.1002/essoar.10501437.1

Abstract

Landslides cause billions of dollars in property damage and thousands of deaths worldwide every year. More than 15% of India's land area is prone to landslides, so mapping these areas for the presence of landslides is of utmost importance. Landslide susceptibility zonation maps give approximate information about the occurrence of landslides. Various factors are responsible for slope instability. In this work, 11 causative factors are considered: aspect, elevation, geology, distance from thrusts, distance from streams, plan curvature, profile curvature, slope, stream power index, tangential curvature, and topographic wetness index. Machine learning methods such as artificial neural networks and support vector machines require a large amount of training data; however, the number of landslide occurrences in a study area is limited. This limited number of landslides leads to a small number of positive-class pixels in the training data. In contrast, the number of non-landslide pixels (negative-class pixels) is very large. This under-representation and severe skew in the class distribution create a data imbalance for learning algorithms and lead to sub-optimal models, which are biased towards the majority class (non-landslide pixels) and perform poorly on the minority class (landslide pixels). Data are generally regarded as imbalanced when the class ratio is of the order of 100:1, 1000:1 or 10000:1 (i.e., one class has 100, 1000 or 10000 times as many points as the other). In our work, the class ratio is more than 300:1 (i.e., for each landslide pixel there are more than 300 non-landslide pixels), so our data are clearly imbalanced. There are two major data balancing techniques: oversampling of the minority class and under-sampling of the majority class. Minority oversampling cannot be applied, as it would create false landslide pixels. We have therefore performed under-sampling of the non-landslide pixels using various techniques. We will discuss landslide susceptibility zonation with and without data balancing and show major improvements in accuracy over imbalanced learning.
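
The abstract does not name the specific under-sampling techniques used, so the following is only a minimal sketch of the simplest variant, random under-sampling of the majority class, and should not be read as the authors' method. The function name `random_undersample`, the `ratio` parameter, and the synthetic pixel table (11 random feature values per pixel, 100 landslide and 30,000 non-landslide pixels, i.e. a 300:1 ratio) are all assumptions made for illustration.

```python
import random


def random_undersample(rows, labels, ratio=1.0, seed=0):
    """Keep every minority (label 1) row and a random subset of majority (label 0) rows.

    `ratio` is the desired number of majority rows per minority row
    (hypothetical parameter, chosen for this sketch).
    """
    rng = random.Random(seed)
    pos_idx = [i for i, lab in enumerate(labels) if lab == 1]
    neg_idx = [i for i, lab in enumerate(labels) if lab == 0]
    # Never ask for more majority rows than actually exist.
    n_keep = min(len(neg_idx), round(ratio * len(pos_idx)))
    kept_neg = rng.sample(neg_idx, n_keep)  # sampling without replacement
    keep = sorted(pos_idx + kept_neg)       # preserve the original row order
    return [rows[i] for i in keep], [labels[i] for i in keep]


# Synthetic stand-in for a pixel table: 11 causative-factor values per pixel.
data_rng = random.Random(42)
n_neg, n_pos = 30_000, 100  # assumed counts giving a 300:1 class ratio
rows = [[data_rng.gauss(0.0, 1.0) for _ in range(11)] for _ in range(n_neg + n_pos)]
labels = [0] * n_neg + [1] * n_pos

rows_bal, labels_bal = random_undersample(rows, labels, ratio=1.0, seed=0)

ratio_before = labels.count(0) / labels.count(1)
ratio_after = labels_bal.count(0) / labels_bal.count(1)
print(f"class ratio before: {ratio_before:.0f}:1")
print(f"class ratio after:  {ratio_after:.0f}:1")
```

All landslide pixels are retained and only non-landslide pixels are discarded, so no artificial landslide samples are created. Setting `ratio` above 1.0 keeps more of the majority class when a fully balanced training set would throw away too much information.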