David Koes edited Introduction.tex over 8 years ago

Commit id: a1757361c3e3cfa7765968c003cf69d09a8f6be6

Shape-based virtual screening typically attempts to identify the molecules in a virtual library that are most similar to a known active molecule or to a pseudo-ligand derived from the desired binding site\cite{Ebalunode2008}. Shape similarity is usually assessed either through alignment methods, which seek to maximize the three-dimensional overlap of two shapes, or through feature vector methods, which transform shapes into a low-dimensional vector of features that can be efficiently compared. As part of the similarity calculation, molecular shapes may be further annotated with electrostatic or pharmacophore features.\cite{Vainio2009,Cheeseright2006,Thorner1996,Tervo2005,Marin2008,Sastry2011} Alignment methods try to find the optimal overlay of two molecules to maximize either the overlapping volume or the correspondence between feature points, such as molecular field extrema\cite{Vainio2009,Cheeseright2006}. The predominant method of maximizing volume overlap is to represent the molecular shapes as a collection of Gaussians,\cite{Good1993,Grant1996} sample several starting points, and use numerical optimization to find a local maximum.\cite{Hawkins_2007} Instead of Gaussians, the molecular shape can be represented by features, and point correspondence algorithms can be used to generate the alignment. Potential features include pharmacophore features\cite{Sastry2011}, field points\cite{Thorner1996,Vainio2009,Cheeseright2006}, or hyperbolical paraboloid representations of patches of molecular surface\cite{Proschak2008}.
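
As a brief illustration of the Gaussian representation (a sketch in assumed notation, not necessarily that of the cited works), suppose each atom $i$ is modeled by a spherical Gaussian density $\rho_i(\mathbf{r}) = p_i \exp\left(-\alpha_i |\mathbf{r}-\mathbf{R}_i|^2\right)$, where the amplitude $p_i$, width $\alpha_i$, and center $\mathbf{R}_i$ are symbols introduced here for illustration. The overlap integral of two such Gaussians has a closed form,
\begin{equation}
V_{ij} = \int \rho_i(\mathbf{r})\,\rho_j(\mathbf{r})\,d\mathbf{r} = p_i p_j \exp\left(-\frac{\alpha_i \alpha_j}{\alpha_i+\alpha_j}\, d_{ij}^2\right) \left(\frac{\pi}{\alpha_i+\alpha_j}\right)^{3/2}, \qquad d_{ij} = |\mathbf{R}_i-\mathbf{R}_j| ,
\end{equation}
so that, neglecting higher-order intersection terms, the overlap volume of molecules $A$ and $B$ can be approximated as $V_{AB} \approx \sum_{i \in A} \sum_{j \in B} V_{ij}$. Because this expression varies smoothly with the rotation and translation of $B$, it lends itself to numerical optimization from several starting orientations. A commonly used normalized score is the shape Tanimoto,
\begin{equation}
T_{AB} = \frac{V_{AB}}{V_{AA} + V_{BB} - V_{AB}} ,
\end{equation}
which equals one for identical, perfectly superimposed shapes.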
A number of optimizations to the alignment process have been described\cite{Grant1996,RushIII2005,Sastry2011,Fontaine2007}, but alignment methods remain computationally expensive relative to feature vector methods. Feature vector methods reduce molecular shapes to a simple vector of Boolean or numerical features. Shape similarity is determined by comparing these vectors using a metric such as Tanimoto or Euclidean distance. The numerical features can be computed using geometric moments\cite{Ballester2007,Schreyer_2012}, ray-tracing histograms\cite{Zauhar2003}, or a small set of reference shapes.\cite{Haigh2005,Putta2002} Feature vectors enable computationally efficient screening (millions of shape comparisons per second),\cite{Ballester2007} but lack the accuracy and interpretability of alignment methods.\cite{Nicholls2010} Critically, a feature vector similarity does not generate a molecular overlay suitable for visual inspection and analysis.
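
For concreteness, the two metrics named above can be written as follows; the vector symbols are assumptions introduced for this sketch. For Boolean fingerprints $\mathbf{a}, \mathbf{b} \in \{0,1\}^n$ and numerical feature vectors $\mathbf{x}, \mathbf{y}$ with $n$ real components,
\begin{equation}
T(\mathbf{a},\mathbf{b}) = \frac{\sum_k a_k b_k}{\sum_k a_k + \sum_k b_k - \sum_k a_k b_k}, \qquad d(\mathbf{x},\mathbf{y}) = \sqrt{\sum_{k=1}^{n} (x_k - y_k)^2} .
\end{equation}
As a small worked example, $\mathbf{a} = (1,1,0,1)$ and $\mathbf{b} = (1,0,0,1)$ share two set bits, so $T = 2/(3+2-2) = 2/3$. Either comparison requires only $O(n)$ arithmetic operations and no search over orientations, which is the source of the speed advantage, but the resulting score carries no information about how the two molecules would be overlaid.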
In our approach we eliminate the computational burden of alignment by requiring the presence of a common anchor fragment. Molecules are trivially aligned by a direct superposition of anchor fragments, and the fragment defines a standard coordinate system for describing the shape of the molecule. Prepositioned molecular fragments are a common component of \textit{de novo} drug design\cite{Schneider2005}, where ligands are `grown' from a prepositioned fragment to fit the binding site. Prepositioned fragments have also been successfully used in structure-based design to identify high-affinity inhibitors for a multitude of targets.\cite{Kick1997,Murray1997,Li1998,Liebeschuetz2002} Fragment-based drug discovery workflows\cite{Rees2004,Congreve2008} can provide a physical basis for the selection and positioning of an appropriate anchor fragment. Alternatively, virtual docking methods may be used.\cite{Brenke2009} Anchor fragments present a different modality for shape-based screening: the user is required to identify a fragment structure with a meaningful binding mode, and the search space is limited to compounds that contain the specified fragment. These requirements enable a new type of search language that supports explicit shape constraints. In essence, a \textit{partial similarity search}\cite{Bronstein2009} can be performed, where instead of optimizing similarity with the entirety of a query shape, the shape constraints specify only part of the shape in detail (e.g., within the binding site) while leaving other parts unspecified (e.g., interactions with solvent). In addition, the use of anchor fragments enables a new mechanism of search. Instead of evaluating the query against every molecule in the virtual screening library, the molecular shapes of the library can be indexed so that searches need only evaluate a fraction of the library. This allows large libraries of millions of shapes to be searched on an interactive time scale of a few seconds.
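
One way to formalize the anchor-based search is sketched below; this is an illustrative formulation with assumed symbols, not a definition taken from the text above. Let $\mathbf{a}_k$ be the coordinates of the anchor fragment atoms in a library molecule and $\mathbf{b}_k$ the coordinates of the corresponding atoms in the prepositioned query fragment. The superposition is the rigid transform
\begin{equation}
(\mathbf{Q}, \mathbf{t}) = \arg\min_{\mathbf{Q} \in \mathrm{SO}(3),\, \mathbf{t}} \sum_k \left\| \mathbf{Q}\mathbf{a}_k + \mathbf{t} - \mathbf{b}_k \right\|^2 ,
\end{equation}
which has a closed-form least-squares solution and therefore requires no sampling of starting orientations. If $S$ denotes the region of space occupied by the transformed molecule, a partial similarity query could be expressed with two hypothetical regions, $I$ that must be occupied and $E$ that must remain empty, so that a molecule matches when
\begin{equation}
I \subseteq S \quad \mbox{and} \quad S \cap E = \emptyset .
\end{equation}
Space outside $I \cup E$ is left unconstrained, which corresponds to the unspecified parts of the query shape.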