Content-Based Search of Large Image Archives at the PDS Imaging Node
Motivation of Content-Based Search
The Planetary Data System (PDS) maintains archives of data collected by NASA missions that explore our solar system. The PDS Cartography and Imaging Sciences Node (Imaging Node) provides access to millions of images of planets, moons, and other bodies. Given the large and continually growing volume of data, there is a need for tools that enable users to quickly search for images of interest. Each image archived at the PDS Imaging Node is described by a rich set of searchable metadata properties, such as the time it was collected and the instrument used. However, users often wish to search on the content of the image to find those images most relevant to their scientific investigation or individual curiosity.
To enable content-based search of the large image archives, we used machine learning techniques to create convolutional neural network (CNN) classification models. The initial CNN classification results for rover missions (i.e., Mars Science Laboratory and Mars Exploration Rover) and orbiter missions (i.e., Mars Reconnaissance Orbiter, Cassini, and Galileo) were deployed at the PDS Image Atlas (https://pds-imaging.jpl.nasa.gov/search) in 2017. With the content-based search capability, users of the PDS Image Atlas can search using a list of pre-defined classes and quickly find relevant images. For example, users can search for “Impact ejecta” and find the images containing impact ejecta in the archive of the Mars Reconnaissance Orbiter mission.
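As a rough illustration of how stored class predictions can support this kind of search, the following minimal sketch filters images by predicted class. The record layout, the search_by_class helper, the image identifiers, the “Crater” label, and the confidence threshold are all hypothetical; none of them is taken from the PDS Image Atlas implementation.

```python
from typing import NamedTuple


class Prediction(NamedTuple):
    """One classifier output for one archived image (hypothetical layout)."""
    image_id: str
    label: str
    confidence: float


def search_by_class(predictions, label, min_confidence=0.9):
    """Return IDs of images predicted as `label`, most confident first."""
    wanted = label.lower()
    hits = [
        p for p in predictions
        if p.label.lower() == wanted and p.confidence >= min_confidence
    ]
    hits.sort(key=lambda p: p.confidence, reverse=True)
    return [p.image_id for p in hits]


# Made-up records; real identifiers and scores would come from the archive.
PREDICTIONS = [
    Prediction("img_001", "Impact ejecta", 0.97),
    Prediction("img_002", "Crater", 0.97),
    Prediction("img_003", "Impact ejecta", 0.62),
    Prediction("img_004", "Impact ejecta", 0.99),
]

print(search_by_class(PREDICTIONS, "impact ejecta"))
```

Lowering min_confidence in this sketch returns more images at the cost of more misclassified results, which is the usual trade-off when search is driven by classifier output.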
All of the CNN classification models were trained using the transfer learning approach, in which we adapted a CNN model pretrained on Earth images to classify planetary images. Over the past several years, we employed the following three techniques to improve the efficiency of collecting labeled data sets, the accuracy of the models, and the interpretability of the classification results:
· First, we used the marginal-probability based active learning (MP-AL) algorithm to improve the efficiency of collecting labeled data sets.
· Second, we used the classifier chain and ensemble approaches to improve the accuracy of the classification results.
· Third, we incorporated the prototypical part network (ProtoPNet) architecture to improve the interpretability of the classification results.
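To make the ensemble idea in the second item concrete, here is a minimal sketch of one common ensemble scheme, soft voting, in which the per-class probabilities from several models are averaged before the top class is chosen. This is an assumption for illustration only: the text above does not say which ensemble scheme was used, and the function names, class names, and probability values below are hypothetical.

```python
def average_probabilities(member_outputs):
    """Average per-class probabilities across ensemble members (soft voting)."""
    if not member_outputs:
        raise ValueError("need at least one ensemble member")
    n_classes = len(member_outputs[0])
    if any(len(probs) != n_classes for probs in member_outputs):
        raise ValueError("all members must score the same classes")
    n_members = len(member_outputs)
    return [
        sum(probs[i] for probs in member_outputs) / n_members
        for i in range(n_classes)
    ]


def ensemble_predict(member_outputs, class_names):
    """Return the class with the highest averaged probability and that value."""
    averaged = average_probabilities(member_outputs)
    best = max(range(len(averaged)), key=averaged.__getitem__)
    return class_names[best], averaged[best]


# Hypothetical class list and softmax outputs from three models for one image.
CLASS_NAMES = ["Crater", "Impact ejecta", "Other"]
MEMBER_OUTPUTS = [
    [0.5, 0.4, 0.1],  # model A alone would pick "Crater"
    [0.2, 0.7, 0.1],  # model B picks "Impact ejecta"
    [0.3, 0.6, 0.1],  # model C picks "Impact ejecta"
]

label, confidence = ensemble_predict(MEMBER_OUTPUTS, CLASS_NAMES)
print(label, round(confidence, 3))
```

In this made-up case, one member on its own favors a different class, but the averaged scores favor the class that the other two members agree on. Averaging also tends to temper an overconfident individual model.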