David Koes edited Introduction.tex  over 8 years ago

Commit id: 80de5769bdb021b6bd3ca3e66ecfe571e6a89748


Feature vector methods reduce molecular shapes to a simple vector of Boolean or numerical features. Shape similarity is then determined by comparing these vectors using a measure such as the Tanimoto coefficient or Euclidean distance. The numerical features can be computed using geometric moments\cite{Ballester2007,Schreyer_2012}, ray-tracing histograms\cite{Zauhar2003}, or a small set of reference shapes.\cite{Haigh2005,Putta2002} Feature vectors enable computationally efficient screening (millions of shape comparisons per second),\cite{Ballester2007} but lack the accuracy and interpretability of alignment methods.\cite{Nicholls2010} Critically, a feature vector similarity does not generate a molecular overlay suitable for visual inspection and analysis.

In our approach, fragment-oriented molecular shapes (FOMS), we eliminate the computational burden of alignment by requiring the presence of a common anchor fragment. Molecules are trivially aligned by a direct superposition of anchor fragments, and the fragment defines a standard coordinate system for describing the shape of the molecule. Prepositioned molecular fragments are a common component of \textit{de novo} drug design,\cite{Schneider2005} where ligands are `grown' from a prepositioned fragment to fit the binding site. Prepositioned fragments have also been successfully used in structure-based design to identify high-affinity inhibitors for a multitude of targets.\cite{Kick1997,Murray1997,Li1998,Liebeschuetz2002,Koes_2012} Fragment-based drug discovery workflows\cite{Rees2004,Congreve2008} can provide a physical basis for the selection and positioning of an appropriate anchor fragment.
Alternatively, virtual docking methods may be used.\cite{Brenke2009}

Anchor fragments present a different modality for shape-based screening: the user is required to identify a fragment structure with a meaningful binding mode, and the search space is limited to compounds that contain the specified fragment. These requirements enable a new type of search language that supports explicit shape constraints. In essence, a \textit{partial similarity search}\cite{Bronstein2009} can be performed, where instead of optimizing similarity with the entirety of a query shape, the shape constraints specify only part of the shape in detail (e.g., within the binding site) while leaving other parts unspecified (e.g., interactions with solvent). In addition, the use of anchor fragments enables a new mechanism of search. Instead of evaluating the query against every molecule in the virtual screening library, the molecular shapes of the library can be \textit{indexed} so that searches need only evaluate a fraction of the library. This allows large libraries of millions of shapes to be searched on an interactive time scale of a few seconds. Here we describe the retrospective virtual screening performance of FOMS and explore the potential for our explicit shape constraints, when coupled with expert insight, to define highly specific filters for the creation of highly enriched subsets of large virtual libraries.