David Koes edited Introduction.tex  over 8 years ago

Commit id: 018b019fc84686c2097dc5211feb34d85540d159


% introduction and background
\hyphenation{mol-e-cules}

Molecular shape is a fundamental concept in medicinal chemistry \cite{Nicholls2010}, and shape-based virtual screens have successfully identified novel inhibitors \cite{RushIII2005,McMasters2009,Muchmore2006,Naylor_2009}. Shape is used both to assess the similarity of a candidate molecule to a set of known actives and to evaluate the complementarity of a molecular shape to the shape of the binding site on the target receptor. Here we describe a novel method for \textit{exactly} specifying shape constraints for querying a large library of molecular shapes. The shape constraints are derived both from the receptor, which describes where a molecule cannot be, and from the shape of a known binder, which describes where the molecule should be. Our method is unique in relying on a manually positioned ligand \textit{anchor fragment}. This anchor fragment requirement makes our approach particularly applicable to a fragment-based drug discovery workflow \cite{Rees2004,Congreve2008}, as shape constraints are a natural way to search for compounds that extend an identified fragment structure while remaining complementary to the receptor. Additionally, the anchor fragment, by defining a fixed coordinate system, enables the indexing of large libraries of molecular shapes. This indexing allows search times to scale sub-linearly with the size of the library, resulting in search performance that is on an interactive time scale.

Shape-based virtual screening typically attempts to identify the molecules in a virtual library that are most similar to known active molecules or to a pseudo-ligand that is derived from the desired binding site \cite{Ebalunode2008}.
Shape similarity is usually assessed either through alignment methods, which seek to maximize the three-dimensional overlap of two shapes, or through feature vector methods, which transform shapes into a low-dimensional vector of features that can be efficiently compared. As part of the similarity calculation, molecular shapes may be further annotated with electrostatic or pharmacophore features \cite{Vainio2009,Cheeseright2006,Thorner1996,Tervo2005,Marin2008,Sastry2011}.

Alignment methods try to find the optimal overlay of two molecules to maximize either the overlapping volume or the correspondence between feature points, such as molecular field extrema \cite{Vainio2009,Cheeseright2006}. The predominant method of maximizing volume overlap is to represent the molecular shapes as a collection of Gaussians \cite{Good1993,Grant1996}, sample several starting points, and use numerical optimization to find a local maximum \cite{Hawkins_2007}. An alternative method is to represent the molecular shape by discrete features and use point correspondence algorithms to generate the alignment. Potential features include pharmacophore features \cite{Sastry2011}, field points \cite{Thorner1996,Vainio2009,Cheeseright2006}, or hyperbolic paraboloid representations of patches of molecular surface \cite{Proschak2008}.
A number of performance optimizations to the alignment process have been described \cite{Grant1996,RushIII2005,Sastry2011,Fontaine2007}, but alignment methods remain computationally expensive relative to feature vector methods, unless shapes can be pre-aligned to a canonical coordinate system, as is done with Volumetric Aligned Molecular Shapes (VAMS) \cite{VAMS}.

Feature vector methods reduce molecular shapes to a simple vector of Boolean or numerical features. Shape similarity is then determined by comparing these vectors using a metric such as Tanimoto or Euclidean distance. The numerical features can be computed using geometric moments \cite{Ballester2007,Schreyer_2012}, ray-tracing histograms \cite{Zauhar2003}, or a small set of reference shapes \cite{Haigh2005,Putta2002}. Feature vectors enable computationally efficient screening (millions of shape comparisons per second) \cite{Ballester2007}, but lack the accuracy and interpretability of alignment methods \cite{Nicholls2010}. Critically, a feature vector similarity does not generate a molecular overlay suitable for visual inspection and analysis.

In our approach, fragment oriented molecular shapes (FOMS), we eliminate the computational burden of alignment by requiring the presence of a common anchor fragment. Molecules are trivially aligned by a direct superposition of anchor fragments, and the fragment defines a standard coordinate system for describing the shape of the molecule. Prepositioned molecular fragments are a common component of \textit{de novo} drug design \cite{Schneider2005}, where ligands are `grown' from a prepositioned fragment to fit the binding site.
Prepositioned fragments have also been successfully used in structure-based design to identify high-affinity inhibitors for a multitude of targets \cite{Kick1997,Murray1997,Li1998,Liebeschuetz2002,Koes_2012}. Fragment-based drug discovery workflows \cite{Rees2004,Congreve2008} can provide a physical basis for the selection and positioning of an appropriate anchor fragment. Alternatively, virtual docking methods may be used \cite{Brenke2009}.

Anchor fragments present a different modality for shape-based screening: the user is required to identify a fragment structure with a meaningful binding mode, and the search space is limited to compounds that contain the specified fragment. These requirements enable a new type of search language that supports explicit shape constraints. In essence, a \textit{partial similarity search} \cite{Bronstein2009} can be performed, where instead of optimizing similarity with the entirety of a query shape, the shape constraints specify only part of the shape in detail (e.g., within the binding site) while leaving other parts unspecified (e.g., interactions with solvent). In addition, the use of anchor fragments enables a new mechanism of search. Instead of evaluating the query against every molecule in the virtual screening library, the molecular shapes of the library can be \textit{indexed} so that searches need only evaluate a fraction of the library. This allows large libraries of millions of shapes to be searched on an interactive time scale of a few seconds.

Here we describe the retrospective virtual screening performance of FOMS and explore the potential for our explicit shape constraints, when coupled with expert insight, to define highly specific filters for the creation of highly enriched subsets of large virtual libraries.
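
As an illustrative formalization of explicit shape constraints (the set notation and the symbols $I$, $E$, and $S$ are assumptions introduced here for exposition, not definitions taken from the method itself), suppose each shape is represented as a set of occupied points in the coordinate system defined by the anchor fragment. Let $I$ denote the region where the molecule should be, derived from the shape of a known binder, and let $E$ denote the region where a molecule cannot be, derived from the receptor. Under these assumptions, a library shape $S$ satisfies the constraints when
\[
I \subseteq S \quad \mbox{and} \quad S \cap E = \emptyset ,
\]
so that any region outside $I \cup E$, such as solvent-exposed space, is left unspecified. This is one plausible reading of a partial similarity search expressed as hard constraints rather than as an optimized similarity score.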