INTRODUCTION

Over the past decades, significant progress has been made on extracting features for similarity search and object recognition from feature-rich data such as audio, image, video, and other sensor datasets. Since feature-rich data objects are typically represented as high-dimensional feature vectors, similarity search is generally implemented as K-Nearest Neighbor (KNN) or Approximate Nearest Neighbor (ANN) search in a high-dimensional feature-vector space. A similarity search method should be accurate, time-efficient, and space-efficient, and it should work well on high-dimensional data. In addition, construction of the index data structure should be fast, and the index should handle arbitrary sequences of insertions and deletions conveniently.
An efficient content-based search system for feature-rich data needs a search mechanism that meets four requirements. First, it should return search results efficiently on large datasets without using much CPU time or memory. For instance, it should be able to search millions of data objects in seconds. Second, it should achieve high search quality by using advanced feature extraction techniques. For example, it should be able to handle multi-feature vector representations and the Earth Mover's Distance (EMD) similarity measure used in a region-based image retrieval (RBIR) system. Third, it should search data with multiple modalities effectively. For example, when searching the continuous archived data recorded from multiple medical devices in an intensive care unit, a user should be able to express and search for patterns across several data sources. Fourth, it should integrate with a keyword-based search engine. For example, a user should be able to perform content-based similarity search together with attribute-based search, such as time-range or annotation-based search.
Content-Aware Similarity Search (CASS) has four current research topics.
The first is sketch construction techniques. A sketch here is a compact representation of a feature vector. The goal is to produce practical algorithms that construct sketches which substantially reduce the dimensionality and size of feature vectors while still supporting high-quality similarity search. The second is efficient filtering and indexing methods. This topic is challenging because filtering and indexing large feature-rich datasets must support similarity matching, so indexing data structures designed for exact match do not apply. The goal is to devise novel data structures and algorithms that filter and index large datasets for similarity search. The third is similarity search of multiple data types. The goal is to gain a deeper understanding of similarity search over various data types, including audio, images, documents, and others. The last is a toolkit for similarity search, which differs from most existing search toolkits. The goal is to develop a toolkit that can be used to construct search engines for various data types by plugging in data-type-specific segmentation, feature extraction, and distance computation modules.
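The first two topics, sketch construction and filtering, can be illustrated with a minimal Python sketch. It is not the CASS method itself, which this introduction does not specify. It assumes one common construction, sign bits of random projections, and uses plain Euclidean distance for the final ranking in place of a measure such as EMD. All function names and parameters (`make_projections`, `sketch`, `search`, `n_bits`, `n_candidates`) are hypothetical and chosen only for illustration.

```python
import random


def make_projections(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian directions in a dim-dimensional space."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def sketch(vec, projections):
    """Reduce a feature vector to an n_bits-bit integer (assumed construction).

    Each bit records on which side of one random hyperplane the vector lies.
    """
    bits = 0
    for row in projections:
        dot = sum(r * v for r, v in zip(row, vec))
        bits = (bits << 1) | (1 if dot >= 0.0 else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two sketches."""
    return bin(a ^ b).count("1")


def l2(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def search(query, data, sketches, projections, k=3, n_candidates=20):
    """Return indices of k approximate nearest neighbors of query.

    Filtering: keep the n_candidates items whose sketches are closest to the
    query sketch in Hamming distance. Ranking: order only those candidates by
    the exact (and more expensive) distance on the full feature vectors.
    """
    q = sketch(query, projections)
    order = sorted(range(len(data)), key=lambda i: hamming(q, sketches[i]))
    candidates = order[:n_candidates]
    candidates.sort(key=lambda i: l2(query, data[i]))
    return candidates[:k]
```

In use, the sketches would be computed once for the whole dataset, for example `sketches = [sketch(v, projections) for v in data]`, so that each query touches the full feature vectors of only `n_candidates` objects. Larger `n_bits` and `n_candidates` trade space and time for accuracy.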