Abstract
The bag-of-words (BoW) model is one of the most popular representation
methods for image classification. However, the lack of spatial
information, the intra-class diversity, and the inter-class similarity
among scene categories impair its performance in the remote-sensing
domain. To alleviate these issues, this paper proposes to explore the
spatial dependencies between different image regions and introduces
patch-based discriminative learning (PBDL) for remote-sensing scene
classification. Particularly, the proposed method employs multi-level
feature learning based on small, medium, and large neighborhood regions
to enhance the discriminative power of image representation. To achieve
this, image patches are selected with a fixed-size sliding window, and a
novel concept, sampling redundancy, is developed to minimize redundant
features while retaining those relevant to the model.
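As an illustration of fixed-size sliding-window patch extraction, the sketch below collects every window at a chosen stride; the window size, stride, and function name are illustrative assumptions, not the paper's implementation (a larger stride is one simple way to reduce overlap among sampled patches):

```python
import numpy as np

def extract_patches(image, patch_size, stride):
    """Slide a fixed-size window over a 2-D (H, W) image and collect
    every patch; increasing the stride skips overlapping windows,
    which reduces redundancy among the sampled patches."""
    h, w = image.shape[:2]
    patches = []
    for y in range(0, h - patch_size + 1, stride):
        for x in range(0, w - patch_size + 1, stride):
            patches.append(image[y:y + patch_size, x:x + patch_size])
    return np.stack(patches)

# Toy example: a 64x64 single-channel image, 16x16 windows, stride 16
img = np.random.rand(64, 64)
patches = extract_patches(img, patch_size=16, stride=16)
print(patches.shape)  # (16, 16, 16): a 4x4 grid of 16x16 patches
```

Smaller, medium, and larger values of `patch_size` would correspond to the small, medium, and large neighborhood regions used for multi-level feature learning.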
Apart from multi-level learning, we explicitly apply image pyramids to
magnify the visual information of the scene images and optimize their
position and scale parameters locally. A local descriptor is then
exploited to extract multi-level and multi-scale features, which we
represent as codeword histograms obtained through k-means clustering.
Finally, a simple fusion strategy is proposed to balance the
contribution of individual features, and the fused features are
incorporated into a Bidirectional Long Short-Term Memory (BiLSTM)
network for classification. Experimental results on NWPU-RESISC45, AID,
UC-Merced, and WHU-RS datasets demonstrate that the proposed approach
not only surpasses conventional bag-of-words approaches but also yields
significantly higher classification performance than existing
state-of-the-art deep learning methods.
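As a hedged sketch of the codeword-histogram step described above (not the authors' code), local descriptors can be quantized with k-means and each image summarized as a normalized histogram over the learned vocabulary; the descriptor dimensionality, vocabulary size, and helper names here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(X, k, iters=20):
    """Minimal k-means: returns a (k, d) codebook of cluster centers."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each descriptor to its nearest center, then recompute means
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def bow_histogram(descriptors, codebook):
    """Quantize one image's descriptors against the codebook and return
    an L1-normalized codeword histogram (the BoW representation)."""
    dists = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    hist = np.bincount(labels, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

# Toy data: 200 random 8-D local descriptors, a 10-word vocabulary
train_desc = rng.random((200, 8))
codebook = kmeans(train_desc, k=10)
hist = bow_histogram(rng.random((50, 8)), codebook)
print(hist.shape)  # (10,), and the bins sum to 1
```

In the proposed pipeline, such histograms from multiple levels and scales would be fused and fed to a BiLSTM classifier; that step is omitted here for brevity.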