David Koes edited paragraph_Descriptor_Calculators_4D_Flexible__.tex  about 8 years ago

Commit id: 7fa376b8c7e02e2aef1222d710ab1036484fadc5


\paragraph{Descriptor Calculators}  4D Flexible Atom-Pair Kernel (4D FAP) computes a `4D' similarity measure from the molecular graphs of an ensemble of conformations; this measure can be incorporated into QSAR models.  The BlueDesc descriptor calculator is a command-line tool that converts an MDL SD file into ARFF and LIBSVM formats for machine learning and data mining. It uses the Chemistry Development Kit (CDK) and JOELib2 to compute 174 descriptors drawn from both libraries.  MolSig \cite{Carbonell_2013} computes molecular graph descriptors that include stereochemistry information.  PaDEL-Descriptor \cite{Yap_2010} is software that calculates molecular descriptors and fingerprints. It computes 1875 descriptors (1444 1D and 2D descriptors and 431 3D descriptors) and 12 types of fingerprints (16092 bits in total). The descriptors and fingerprints are calculated mainly with CDK, supplemented by additional descriptors and fingerprints such as atom type electrotopological state descriptors, Crippen's logP and MR, extended topochemical atom (ETA) descriptors, McGowan volume, molecular linear free energy relation descriptors, ring counts, counts of chemical substructures identified by Laggner, and binary fingerprints and counts of chemical substructures \cite{Yap_2010}.  Topological maximum cross correlation (TMACC) \cite{Melville_2007} generates 2D autocorrelation descriptors that are low-dimensional, interpretable, and well suited to QSAR modeling.

\paragraph{Model Building}  AZOrange \cite{St_lring_2011} is a machine learning package that supports QSAR model building as a full workflow, from descriptor computation to automated model building, validation, and selection. It promotes model accuracy by offering several high-performance machine learning algorithms, enabling efficient, data-set-specific selection of the statistical approach \cite{St_lring_2011}.  Chemistry aware model builder (camb) \cite{Murrell_2015} is an R package for the generation of quantitative models.
Its features include descriptor calculation (905 two-dimensional descriptors and 14 types of fingerprint descriptors for small molecules, 13 whole-protein sequence descriptors, and 8 types of amino acid descriptors), model generation, ensemble modeling, and visualization (with ggplot2).  eTOXLab \cite{Carri__2015} provides a portable modeling framework embedded in a self-contained virtual machine for easy deployment.  Open3DALIGN \cite{Tosco_2011}, Open3DGrid, and Open3DQSAR \cite{Tosco_2010} form a suite of related tools for developing 3D QSAR models. Open3DALIGN performs unsupervised rigid-body molecular alignment, Open3DGrid generates molecular interaction fields (MIFs) in a variety of formats, and Open3DQSAR builds predictive models from the MIFs of aligned molecules. Calculations can be visualized in real time in PyMOL.  QSAR-tools is a set of Python scripts that use RDKit to model quantitative structure--activity relationships from 2D chemical data. It takes a SMILES file of a training set as input and computes a set of SMARTS descriptors unique to that set. Using a fingerprint file, it trains a linear model to predict a numerical quantity of interest and can map the model onto a compound to produce an image.

\paragraph{Model Application}  SMARTCyp \cite{Rydberg_2010} is a QSAR model that predicts the sites of cytochrome P450-mediated metabolism of drug-like molecules directly from their 2D structure using fragment-based energy rules.  Toxtree \cite{Patlewicz_2008} is a Java GUI application for estimating the ``toxic hazard'' of molecules using a variety of toxicity prediction modules, including modules for oral toxicity, skin and eye irritation, covalent protein binding, DNA binding, and cytochrome P450-mediated drug metabolism (using SMARTCyp).  UG-RNN/AquaSol \cite{Lusci_2013} is an undirected graph recursive neural network that has been trained to predict aqueous solubility from molecular graphs.
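The linear fingerprint model that QSAR-tools trains, and the way such a model is then applied to and mapped onto a new compound, can be illustrated with a minimal sketch. This is not QSAR-tools' code: it assumes ridge-regularized least squares as the linear model, uses toy bit vectors in place of RDKit-computed SMARTS fingerprints, and introduces a hypothetical `bit_atoms` mapping from fingerprint bits to atom indices. All function names (`fit_linear_model`, `predict`, `atom_contributions`) are illustrative, and only the Python standard library is used.

```python
from typing import Dict, List, Sequence


def solve_linear(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_linear_model(X: Sequence[Sequence[int]], y: Sequence[float],
                     lam: float = 1e-6) -> List[float]:
    """Assumed model: ridge-regularized least squares on fingerprint bits.

    Solves (X^T X + lam*I) w = X^T y and returns one weight per bit,
    followed by an unpenalized intercept as the last entry."""
    rows = [[float(v) for v in r] + [1.0] for r in X]  # constant column = intercept
    n = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    for i in range(n - 1):  # penalize bit weights only, not the intercept
        A[i][i] += lam
    b = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(n)]
    return solve_linear(A, b)


def predict(w: Sequence[float], bits: Sequence[int]) -> float:
    """Apply the fitted model to a new compound's fingerprint."""
    # zip stops at len(bits), so the intercept w[-1] is added separately.
    return sum(wi * bi for wi, bi in zip(w, bits)) + w[-1]


def atom_contributions(w: Sequence[float], bits: Sequence[int],
                       bit_atoms: Dict[int, List[int]], n_atoms: int) -> List[float]:
    """Hypothetical mapping of the model onto a compound: each atom receives
    the summed weights of the present bits whose substructure covers it."""
    contrib = [0.0] * n_atoms
    for bit, on in enumerate(bits):
        if on:
            for atom in bit_atoms.get(bit, []):
                contrib[atom] += w[bit]
    return contrib


# Synthetic toy data (not from any real data set): y = 3 + 2*b0 - b1 + 0.5*b2
X = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [1, 0, 1], [0, 1, 1], [0, 0, 0]]
y = [5.0, 2.0, 3.5, 4.0, 5.5, 2.5, 3.0]
w = fit_linear_model(X, y)
print([round(v, 3) for v in w])  # approximately [2.0, -1.0, 0.5, 3.0]
print(round(predict(w, [1, 1, 1]), 3))  # approximately 4.5
# Hypothetical bit -> atom-index mapping for a 4-atom query compound.
print(atom_contributions(w, [1, 1, 0], {0: [0, 1], 1: [1, 2], 2: [3]}, 4))
```

Summing bit weights onto the atoms each bit covers is one simple way to turn a linear fingerprint model into a per-atom picture; a real tool could instead split each weight across its atoms or use a different attribution scheme.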
\paragraph{Visualization}  CheS-Mapper (chemical space mapper) \cite{G_tlein_2012,G_tlein_2014} is a 3D viewer for small compounds in chemical datasets. It embeds a dataset into 3D space so that the relationships between the structure of chemical compounds, their physico-chemical properties, and their biological or toxic effects can be analyzed. In this chemical space, compounds with similar descriptor values are placed close to each other. CheS-Mapper can calculate a variety of descriptors and supports clustering and 3D alignment \cite{G_tlein_2012,G_tlein_2014}.  DataWarrior \cite{Sander_2015} is a data visualization and analysis tool for chemical data with a rich set of property calculations, similarity metrics, modeling capabilities, and data set representations.  VIDEAN (visual and interactive descriptor analysis) \cite{Mart_nez_2015} is a visual tool for choosing, with the aid of statistical methods, a subset of descriptors appropriate for predicting a target property. Descriptor selection and model building are performed iteratively under the user's direction.
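VIDEAN's specific statistical methods are not detailed here. As one hedged illustration of statistically guided descriptor selection, the sketch below implements a common greedy correlation filter: descriptors are ranked by the absolute Pearson correlation with the target property, and each is kept only if it is not strongly correlated with a descriptor already chosen. The function names, the redundancy threshold of 0.9, and the toy descriptor columns are illustrative assumptions, not VIDEAN's interface or data.

```python
import math
from typing import List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either vector is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((z - mb) ** 2 for z in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def select_descriptors(columns: Sequence[Sequence[float]], target: Sequence[float],
                       k: int = 3, max_redundancy: float = 0.9) -> List[int]:
    """Greedy correlation filter (an assumed, generic method):
    rank descriptors by |r| with the target, then keep each one only if its
    |r| with every already kept descriptor is below max_redundancy."""
    ranked = sorted(range(len(columns)),
                    key=lambda i: abs(pearson(columns[i], target)), reverse=True)
    chosen: List[int] = []
    for i in ranked:
        if all(abs(pearson(columns[i], columns[j])) < max_redundancy for j in chosen):
            chosen.append(i)
            if len(chosen) == k:
                break
    return chosen


# Toy descriptor columns (illustrative only), one value per compound.
target = [1, 2, 3, 4, 5]
descriptors = [
    [1, 2, 3, 4, 5],      # perfectly correlated with the target
    [2, 4, 6, 8, 10.5],   # nearly a rescaled copy of descriptor 0 (redundant)
    [5, 3, 4, 1, 2],      # r = -0.8 with the target
    [1, 1, 1, 1, 1],      # constant, so it carries no information
]
print(select_descriptors(descriptors, target, k=2))  # [0, 2]
```

A filter like this only considers pairwise linear correlation, so it is a starting point; fitting a model on the chosen subset is still needed to judge whether the selection is actually predictive.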