David Koes edited Introduction.tex over 8 years ago

Commit id: 8a43ce45484dde8d03c898f5d4846553540d073a

% introduction and background
Molecular shape is a useful component in the identification of small molecules for therapeutic intervention,\cite{Nicholls2010} and shape-based virtual screens have successfully identified novel inhibitors.\cite{RushIII2005,McMasters2009,Muchmore2006,Naylor_2009} Shape is used both to assess the similarity of a candidate molecule to a set of known actives and to evaluate the complementarity of a molecular shape to the shape of the binding site on the target receptor. Here we describe a novel method for \textit{exactly} specifying shape constraints for querying a large library of molecular shapes. The shape constraints are derived both from the receptor, which describes where a molecule cannot be, and from the shape of a known binder, which describes where the molecule should be. Our method is unique in relying on a manually positioned ligand \textit{anchor fragment}. This anchor fragment requirement makes our approach particularly applicable to a fragment-based drug discovery workflow\cite{Rees2004,Congreve2008} as shape constraints are a natural way to search for compounds that extend an identified fragment structure while remaining complementary to the receptor.
%Additionally, the anchor fragment, by defining a fixed coordinate system, enables the indexing of large libraries of molecular shapes. This indexing allows search times to scale sub-linearly with the size of the library, resulting in search performance that is on an interactive time scale.

Shape-based virtual screening typically attempts to identify the molecules in a virtual library that are most similar to a given set of one or more known active molecules. Alternatively, if the receptor structure is available, a pseudo-ligand can be derived from the desired binding site\cite{Ebalunode2008}.
Shape similarity can be determined either through alignment methods, which construct a three-dimensional overlay of two shapes, or through feature vector methods, which reduce shapes to a lower-dimensional vector of features that are compared numerically. In addition to steric volume, the electrostatic or pharmacophore features of the shape may be taken into account when assessing similarity.\cite{Vainio2009,Cheeseright2006,Thorner1996,Tervo2005,Marin2008,Sastry2011}

Alignment methods attempt to maximize either the volume overlap of two molecules or the correspondence between identified feature points, such as molecular field extrema\cite{Vainio2009,Cheeseright2006}. Volume overlap is usually maximized by representing the molecular shape as a collection of Gaussians,\cite{Good1993,Grant1996} sampling several starting points, and using numerical optimization to find a local maximum. Alternatively, the molecule may be decomposed into a set of features, such as pharmacophore features\cite{Sastry2011}, field points\cite{Thorner1996,Vainio2009,Cheeseright2006}, or hyperbolic paraboloid representations of patches of molecular surface\cite{Proschak2008}, and various point correspondence algorithms may be used to generate an alignment. Although a number of performance improvements to alignment methods have been described,\cite{Grant1996,RushIII2005,Sastry2011,Fontaine2007} the task remains computationally intensive.

An alternative, computationally less demanding approach is to reduce molecular shapes to a simple vector of Boolean or numerical features. For example, a small set of reference shapes may be used to define a Boolean shape fingerprint,\cite{Haigh2005,Putta2002} or translation- and rotation-invariant properties such as geometric moments\cite{Ballester2007} or ray-tracing histograms\cite{Zauhar2003} may be used to create a numeric vector. Shape similarity is then computed by comparing these feature vectors with an appropriate metric, such as Euclidean distance.
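As a minimal illustration (the notation is introduced here for exposition and is not taken from the cited methods), if two shapes are reduced to $n$-dimensional feature vectors $\mathbf{u}$ and $\mathbf{v}$, their Euclidean distance is
\[
d(\mathbf{u},\mathbf{v}) = \sqrt{\sum_{i=1}^{n} (u_i - v_i)^2},
\]
and a smaller distance is read as greater shape similarity. Each comparison costs $O(n)$ arithmetic operations and involves no alignment step.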
The simplicity of the feature vector representation results in very fast screening (millions of shape comparisons per second\cite{Ballester2007}) but comes at the cost of accuracy and interpretability. These approaches have been shown to correlate poorly with more rigorous shape similarity computations and have been deemed to be too fragile or blunt to be useful for virtual screening.\cite{Nicholls2010} Additionally, these methods are fundamentally less informative than alignment methods since they do not provide a molecular overlay.

In our approach we eliminate the computational burden of alignment by requiring the presence of a common anchor fragment. Molecules are trivially aligned by a direct superposition of anchor fragments, and the fragment defines a standard coordinate system for describing the shape of the molecule. Prepositioned molecular fragments are a common component of \textit{de novo} drug design\cite{Schneider2005} where ligands are `grown' from a prepositioned fragment to fit the binding site. Prepositioned fragments have also been successfully used in structure-based design to identify high-affinity inhibitors for a multitude of targets.\cite{Kick1997,Murray1997,Li1998,Liebeschuetz2002} Fragment-based drug discovery workflows\cite{Rees2004,Congreve2008} can provide a physical basis for the selection and positioning of an appropriate anchor fragment. Alternatively, virtual docking methods may be used.\cite{Brenke2009}

Anchor fragments present a different modality for shape-based screening: the user is required to identify a fragment structure with a meaningful binding mode, and the search space is limited to compounds that contain the specified fragment. These requirements enable a new type of search language that supports explicit shape constraints.
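One way to state such constraints precisely, as a sketch in notation introduced here for illustration only: let $M$ denote the volume occupied by a candidate molecule once its anchor fragment is superimposed on the query anchor, let $I$ be an inclusive region derived from the known binder, and let $E$ be an exclusive region derived from the receptor. Under these assumptions, the candidate satisfies the query when
\[
I \subseteq M \quad \mbox{and} \quad M \cap E = \emptyset,
\]
with space outside $I \cup E$ left unconstrained.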
In essence, a \textit{partial similarity search}\cite{Bronstein2009} can be performed, where instead of optimizing similarity with the entirety of a query shape, the shape constraints specify only part of the shape in detail (e.g., within the binding site) while leaving other parts unspecified (e.g., interactions with solvent). In addition, the use of anchor fragments enables a new mechanism of search. Instead of evaluating the query against every molecule in the virtual screening library, the molecular shapes of the library can be indexed so that searches need only evaluate a fraction of the library. This allows large libraries of millions of shapes to be searched on an interactive time scale of a few seconds.

Here we describe the retrospective virtual screening performance of anchor-oriented shape matching and explore the potential for our explicit shape constraints, when coupled with expert insight, to define highly specific filters for the creation of highly enriched subsets of large virtual libraries.