Fragment Oriented Molecular Shapes
Molecular shape is an important concept in drug design and virtual screening. Shape similarity typically uses either alignment methods, which dynamically optimize molecular poses with respect to the query molecular shape, or feature vector methods, which are computationally less demanding but less accurate. The computational cost of alignment can be reduced by pre-aligning shapes, as is done with the Volumetric-Aligned Molecular Shapes (VAMS) method. Here we introduce and evaluate Fragment Oriented Molecular Shapes (FOMS), where shapes are aligned based on molecular fragments. FOMS enables the use of shape constraints, a novel method for precisely specifying molecular shape queries that provides the ability to perform partial shape matching and supports search algorithms that function on an interactive time scale. When evaluated using the challenging Maximum Unbiased Validation dataset, shape constraints were able to extract significantly enriched subsets of compounds for the majority of targets, and FOMS matched or exceeded the performance of both VAMS and an optimizing alignment method of shape similarity search.
Molecular shape is a fundamental concept in medicinal chemistry (Nicholls 2010), and shape-based virtual screens have successfully identified novel inhibitors (Rush III 2005, McMasters 2009, Muchmore 2006, Naylor 2009). Shape is used both to assess the similarity of a candidate molecule to a set of known actives and to evaluate the complementarity of a molecular shape to the shape of the binding site on the target receptor. Here we describe a novel method for exactly specifying shape constraints for querying a large library of molecular shapes. The shape constraints are derived from both the receptor, which describes where a molecule cannot be, and from the shape of a known binder, which describes where the molecule should be. Our method is unique in relying on a manually positioned ligand anchor fragment. This anchor fragment requirement makes our approach particularly applicable to a fragment-based drug discovery workflow (Rees 2004, Congreve 2008) as shape constraints are a natural way to search for compounds that extend an identified fragment structure while remaining complementary to the receptor. Additionally, the anchor fragment, by defining a fixed coordinate system, enables the indexing of large libraries of molecular shapes. This indexing allows search times to scale sub-linearly with the size of the library, resulting in search performance that is on an interactive time scale.
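To make the idea of an exact shape constraint concrete, the following is a minimal sketch, not the FOMS implementation. It assumes that shapes are stored as sets of occupied voxel coordinates in the coordinate system fixed by the anchor fragment; the names `satisfies_constraints`, `search`, `must_include`, and `must_avoid` are hypothetical. It also uses a plain linear scan, so it does not illustrate the indexing that gives sub-linear search times.

```python
from typing import Dict, List, Set, Tuple

# Assumption: a shape is the set of grid cells (voxels) it occupies,
# expressed in the coordinate system defined by the anchor fragment.
Voxel = Tuple[int, int, int]


def satisfies_constraints(shape: Set[Voxel],
                          must_include: Set[Voxel],
                          must_avoid: Set[Voxel]) -> bool:
    """Return True if the shape covers every required voxel (derived from
    a known binder) and touches no forbidden voxel (derived from the
    receptor)."""
    return must_include <= shape and shape.isdisjoint(must_avoid)


def search(library: Dict[str, Set[Voxel]],
           must_include: Set[Voxel],
           must_avoid: Set[Voxel]) -> List[str]:
    """Linear scan over a library of pre-aligned shapes (illustration only)."""
    return sorted(name for name, shape in library.items()
                  if satisfies_constraints(shape, must_include, must_avoid))


# Toy data: two voxels the ligand should fill, two occupied by the receptor.
binder_core = {(0, 0, 0), (1, 0, 0)}
receptor = {(0, 2, 0), (1, 2, 0)}
library = {
    "mol_a": {(0, 0, 0), (1, 0, 0), (2, 0, 0)},   # fills core, avoids receptor
    "mol_b": {(0, 0, 0), (1, 0, 0), (1, 2, 0)},   # clashes with receptor
    "mol_c": {(0, 0, 0)},                          # misses part of the core
}
hits = search(library, binder_core, receptor)
print(hits)
```

Because the test is a pair of set operations rather than a similarity score, a molecule either meets the constraints or does not, which is what allows partial shape matching: only the voxels named in the query are examined.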
Shape-based virtual screening typically attempts to identify the molecules in a virtual library that are most similar to known active molecules or to a pseudo-ligand derived from the desired binding site (Ebalunode 2008). Shape similarity is usually assessed either through alignment methods, which seek to maximize the three-dimensional overlap of two shapes, or through feature vector methods, which transform shapes into a low-dimensional vector of features that can be efficiently compared. As part of the similarity calculation, molecular shapes may be further annotated with electrostatic or pharmacophore features (Vainio 2009, Cheeseright 2006, Thorner 1996, Tervo 2005, Marín 2008, Sastry 2011).
Alignment methods try to find the optimal overlay of two molecules, maximizing either the overlapping volume or the correspondence between feature points, such as molecular field extrema (Vainio 2009, Cheeseright 2006). The predominant method of maximizing volume overlap is to represent the molecular shapes as a collection of Gaussians (Good 1993, Grant 1996), sample several starting points, and use numerical optimization to find a local maximum (Hawkins 2007). An alternative method is to represent the molecular shape by discrete features and use point correspondence algorithms to generate the alignment. Potential features include pharmacophore features (Sastry 2011), field points (Thorner 1996, Vainio 2009, Cheeseright 2006), or hyperbolic paraboloid representations of patches of molecular surface (Proschak 2008). A number of performance optimizations to the alignment process have been described (Grant 1996, Rush III 2005, Sastry 2011, Fontaine 2007), but alignment methods remain computationally expensive relative to feature vector methods, unless shapes can be pre-aligned to a canonical coordinate system, as is done with Volumetric Aligned Molecular Shapes (VAMS) (Koes 2014).
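The Gaussian representation is attractive because the overlap of two spherical Gaussians has a closed form. The sketch below illustrates the quantity that an optimizing alignment method would maximize; it is a simplified illustration, not the procedure of any cited method. It assumes one spherical Gaussian per atom, keeps only pairwise (first-order) overlap terms, and evaluates a single fixed pose with no optimization. The amplitude `P` and the width values are illustrative assumptions.

```python
import math
from typing import List, Tuple

# Assumption: each atom is (x, y, z, alpha), where alpha is the width
# parameter of a spherical Gaussian p * exp(-alpha * |r - center|^2).
Atom = Tuple[float, float, float, float]

P = 2.7  # Gaussian amplitude; an illustrative value, not taken from the text


def pair_overlap(a: Atom, b: Atom, p: float = P) -> float:
    """Closed-form integral of the product of two spherical Gaussians."""
    d2 = (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2 + (a[2] - b[2]) ** 2
    s = a[3] + b[3]
    return p * p * math.exp(-a[3] * b[3] * d2 / s) * (math.pi / s) ** 1.5


def overlap_volume(m1: List[Atom], m2: List[Atom]) -> float:
    """First-order approximation: sum of all pairwise atom overlaps."""
    return sum(pair_overlap(a, b) for a in m1 for b in m2)


def shape_tanimoto(m1: List[Atom], m2: List[Atom]) -> float:
    """Overlap-based Tanimoto score for one fixed relative pose."""
    v12 = overlap_volume(m1, m2)
    return v12 / (overlap_volume(m1, m1) + overlap_volume(m2, m2) - v12)


def translate(mol: List[Atom], dx: float) -> List[Atom]:
    """Shift a molecule along x (stands in for one candidate pose)."""
    return [(x + dx, y, z, alpha) for x, y, z, alpha in mol]


mol = [(0.0, 0.0, 0.0, 0.8), (1.5, 0.0, 0.0, 0.8)]
same = shape_tanimoto(mol, mol)
near = shape_tanimoto(mol, translate(mol, 1.0))
far = shape_tanimoto(mol, translate(mol, 50.0))
```

An alignment method would wrap `shape_tanimoto` (or the raw overlap) in a numerical optimizer over rotations and translations, restarted from several initial poses. That repeated evaluation is the cost that pre-alignment avoids.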
Feature vector methods reduce molecular shapes to a simple vector of Boolean or numerical features. Shape similarity is then determined by comparing these vectors using a metric such as Tanimoto or Euclidean distance. The numerical features can be computed using geometric moments (Ballester 2007, Schreyer 2012), ray-tracing histograms (Zauhar 2003), or a small set of reference shapes (Haigh 2005, Putta 2002). Feature vectors enable computationally efficient screening (millions of shape comparisons per second) (Ballester 2007), but lack the accuracy and interpretability of alignment methods (Nicholls 2010). Critically, a feature vector similarity does not generate a molecular overlay suitable for visual inspection and analysis.
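The two comparison metrics mentioned above can be sketched in a few lines. This is a generic illustration under the assumption that each shape has already been reduced to a fixed-length vector; the feature values are made up, and the function names are hypothetical rather than taken from any cited method.

```python
import math
from typing import Sequence


def tanimoto(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Tanimoto coefficient of two Boolean feature vectors:
    features set in both, divided by features set in either."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Convention chosen here: two all-False vectors count as identical.
    return both / either if either else 1.0


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two numerical feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Made-up Boolean fingerprints: 2 shared features out of 3 set overall.
fp_query = [True, True, False, True]
fp_candidate = [True, False, False, True]
similarity = tanimoto(fp_query, fp_candidate)

# Made-up numerical descriptors (for example, geometric moments).
distance = euclidean([0.0, 3.0, 4.0], [0.0, 0.0, 0.0])
```

Each comparison is a single pass over a short vector, which is why this style of screening is so fast. The result is only a number, however: nothing in it says how the two molecules would sit on top of one another.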
In our approach, fragment oriented molecular shapes (FOMS), we eliminate the computational burden of alignment by requiring the presence of a common anchor fragment. Molecules are trivially aligned by a direct superposition of anchor fragments, and the fragment defines a standard coordinate system for describing the shape of the molecule. Prepositioned molecular fragments are a common component of de novo drug design (Schneider 2005) where ligands are ‘grown’ from a prepositioned fragment to fit the binding site. Prepositioned fragments have also been successfully used in structure-based design to identify high-affinity inhibitors for a multitude of targets (Kick 1997, Murray 1997, Li 1998, Liebeschuetz 2002). Notably, our AnchorQuery web service (Koes 2012) provides interactive pharmacophore virtual screening of billions of custom compounds for protein-protein interaction (PPI) inhibitors by pre-aligning compounds to amino acid side-chain motifs corresponding to anchor residues (Rajamani 2004,