
Parsers, Data Structures, and Algorithms for Macromolecular Analysis Toolkit (MAT): Design and Implementation
  • Gazal Kalyan,
  • Vivek Junghare,
  • S. John S,
  • Pralay Mitra,
  • Anupam Chattopadhyay,
  • Saugata Hazra
Gazal Kalyan
University of North Dakota
Vivek Junghare
Indian Institute of Technology Roorkee
S. John S
Indian Institute of Technology Roorkee
Pralay Mitra
Indian Institute of Technology Kharagpur
Anupam Chattopadhyay
Nanyang Technological University
Saugata Hazra
Indian Institute of Technology Roorkee

Corresponding Author: [email protected]


Abstract

Working with the structural information of biological macromolecules, which is stored in .pdb, .cif and, more recently, .mmtf files, requires an accurate and efficient tool that serves a range of utilities. Here, we present the Macromolecular Analysis Toolkit (MAT), which parses all three file formats and builds a hierarchical data structure from the given input. The program is original and is written in C++ for flexibility and performance. Its novelty lies in the addition of new structure-based biological algorithms and applications. The package also stands out from similar libraries in speed and accuracy, and we provide a detailed comparison of the available parsers across the whole PDB database. The same data structure is extended to accommodate information from the mmCIF and MMTF parsers. Tokenization of the data allows information to be extracted from disordered text, which supports accurate identification of the entities present in a .pdb file. The MAT parser is designed for quick extraction and efficient loading of the core data structure. Additionally, we introduce a few derived data structures, namely a kD-tree, an octree, and graphs, which can be constructed subsequently for applications that need spatial coordinate calculations.
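To make the idea of tokenizing records into a hierarchical structure concrete, the following is a minimal sketch in C++, not MAT's actual code. It assumes the fixed column layout of ATOM/HETATM records from the PDB file format, and it uses hypothetical names (`Atom`, `Residue`, `Chain`, `Structure`, `addAtomRecord`, `parsePdb`) that do not come from the toolkit. As simplifications, it ignores models, alternate locations, and insertion codes, and keys residues by sequence number only.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical types for illustration; these are not MAT's actual classes.
struct Atom      { int serial; std::string name; double x, y, z; };
struct Residue   { std::string name; int seq; std::vector<Atom> atoms; };
struct Chain     { char id; std::map<int, Residue> residues; };
struct Structure { std::map<char, Chain> chains; };

// Return the space-trimmed fixed-width field at 0-based offset `start`.
// Yields an empty string if the line is too short or the field is blank.
static std::string field(const std::string& line, std::size_t start,
                         std::size_t len) {
    assert(len > 0);
    if (start >= line.size()) return "";
    const std::string raw = line.substr(start, len);
    const std::size_t b = raw.find_first_not_of(' ');
    if (b == std::string::npos) return "";
    const std::size_t e = raw.find_last_not_of(' ');
    return raw.substr(b, e - b + 1);
}

// Tokenize one ATOM/HETATM record by column position and file the atom
// under its chain and residue. Returns false for any other record type
// or for a record whose numeric fields cannot be parsed.
bool addAtomRecord(Structure& st, const std::string& line) {
    if (line.size() < 54) return false;            // coordinates end at column 54
    const std::string rec = line.substr(0, 6);     // columns 1-6: record name
    if (rec != "ATOM  " && rec != "HETATM") return false;
    try {
        // Parse every field before touching the structure, so a malformed
        // record leaves no half-built chain or residue behind.
        Atom a;
        a.serial = std::stoi(field(line, 6, 5));   // columns 7-11
        a.name   = field(line, 12, 4);             // columns 13-16
        a.x      = std::stod(field(line, 30, 8));  // columns 31-38
        a.y      = std::stod(field(line, 38, 8));  // columns 39-46
        a.z      = std::stod(field(line, 46, 8));  // columns 47-54
        const char chainId = line[21];             // column 22
        const int resSeq = std::stoi(field(line, 22, 4));  // columns 23-26

        Chain& chain = st.chains[chainId];
        chain.id = chainId;
        Residue& res = chain.residues[resSeq];
        res.name = field(line, 17, 3);             // columns 18-20
        res.seq = resSeq;
        res.atoms.push_back(a);
        return true;
    } catch (const std::logic_error&) {            // invalid_argument, out_of_range
        return false;
    }
}

// Build the chain -> residue -> atom hierarchy from a stream of PDB lines.
Structure parsePdb(std::istream& in) {
    Structure st;
    std::string line;
    while (std::getline(in, line)) addAtomRecord(st, line);
    return st;
}
```

A caller would pass any input stream (for example a `std::ifstream` opened on a .pdb file) to `parsePdb` and then walk `chains`, `residues`, and `atoms`. Under this design, a derived spatial index such as a kD-tree could later be built from the stored coordinates without re-reading the file.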