Content Aware Similarity Search (CASS)

Abstract

Content Aware Similarity Search (CASS) is designed to examine the entire contents of data packets, covering both header and payload information. The network device comprises a physical interface that transforms analog network signals into bit streams and vice versa. The bit stream coming from the physical interface is forwarded to a traffic flow scanning processor that may be, but is not necessarily, divided into a header processor and a payload analyzer. The header processor examines the header information of each data packet, which is used to determine routing information and session identification. The payload analyzer scans the data packet's payload and matches it against a database of known strings. The payload analyzer is able to scan across packet boundaries and to scan for strings of variable and arbitrary length. Once the payload has been scanned, the network device can act on the data packet based on the results of the payload analyzer. The scanned data packets and the associated conclusions pass through a quality of service processor, which modifies the data packets if necessary and performs traffic management and traffic shaping on the flow of data packets based on their contents.
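
As an illustration of how a payload analyzer might match known strings of arbitrary length across packet boundaries, the following is a minimal sketch and not the implementation described in this work. It assumes an Aho-Corasick automaton as the matching technique and a per-session table that remembers the automaton state between packets; the names StreamMatcher, scan_flow, and flow_states are hypothetical and chosen only for this example.

```python
from collections import deque


class StreamMatcher:
    """Aho-Corasick automaton over bytes (an assumed technique for this sketch)."""

    def __init__(self, patterns):
        # Trie stored as parallel lists: transitions, failure links, outputs.
        self.goto = [{}]
        self.fail = [0]
        self.out = [[]]
        for pat in patterns:
            node = 0
            for byte in pat:
                nxt = self.goto[node].get(byte)
                if nxt is None:
                    nxt = len(self.goto)
                    self.goto[node][byte] = nxt
                    self.goto.append({})
                    self.fail.append(0)
                    self.out.append([])
                node = nxt
            self.out[node].append(pat)
        # Breadth-first pass: fill failure links and inherit shorter matches.
        queue = deque(self.goto[0].values())
        while queue:
            node = queue.popleft()
            for byte, child in self.goto[node].items():
                f = self.fail[node]
                while f and byte not in self.goto[f]:
                    f = self.fail[f]
                self.fail[child] = self.goto[f].get(byte, 0)
                self.out[child] = self.out[child] + self.out[self.fail[child]]
                queue.append(child)

    def scan(self, state, payload):
        """Scan one payload starting from `state`; return (new_state, hits)."""
        hits = []
        for byte in payload:
            while state and byte not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(byte, 0)
            hits.extend(self.out[state])
        return state, hits


def scan_flow(matcher, flow_states, flow_id, payload):
    """Resume scanning a session where its previous packet left off."""
    state, hits = matcher.scan(flow_states.get(flow_id, 0), payload)
    flow_states[flow_id] = state  # carried over to the session's next packet
    return hits


if __name__ == "__main__":
    matcher = StreamMatcher([b"attack", b"tack", b"virus"])
    flow_states = {}
    # The string "attack" is split across two packets of session "A".
    print(scan_flow(matcher, flow_states, "A", b"xxatt"))
    print(scan_flow(matcher, flow_states, "A", b"ackyy"))
```

Because only one integer of state is stored per session, packets from different sessions can be interleaved without buffering earlier payloads. The session identifier here stands in for the session identification that the header processor is said to derive.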

Keywords

Data packets, Analog network signal, Routing information, Session identification, Payload analyzer, Database

INTRODUCTION

Over the past decades, significant progress has been made on extracting features for similarity search and object recognition from feature-rich data such as audio, images, video, and other sensor datasets. Since feature-rich data objects are typically represented as high-dimensional feature vectors, similarity search is generally implemented as K-Nearest Neighbor (KNN) or Approximate Nearest Neighbor (ANN) search in a high-dimensional feature-vector space. A similarity search should be accurate, time-efficient, space-efficient, and able to handle high-dimensional data. In addition, the index data structure should be quick to construct and should handle arbitrary sequences of insertions and deletions gracefully. A good search mechanism is essential to an efficient content-based search system for feature-rich data. To start with, it should return search results efficiently on large datasets without using much CPU and memory ass