Discussion

The MUV dataset, with its focus on eliminating analogue bias, is particularly resistant to single-query shape-based virtual screens \cite{Tiikkainen_2009}. This is reflected in our overall results, shown in Figure \ref{aucs}, where only two targets (Rho and PKA) achieve AUCs whose 95\% confidence interval does not overlap 0.5 (random performance). The remaining targets likely lack meaningful whole-molecule shape complementarity between the query ligand and the active compounds of the benchmark. One exception may be HIV-rt, where there is clear early enrichment, which indicates that a subset of the actives may be compatible with the query molecule. FOMS dramatically outperformed the other methods for the Rho and PKA targets due to the correct positioning of a fragment with key, conserved interactions. For comparison, Figure \ref{aucs} also shows the performance of a 2D fingerprint, the OpenBabel FP2 \cite{O_Boyle_2011} path-based fingerprint. 2D information is more successful for three targets (ER\(\beta\), FXIa, HSP90) and has comparable or worse performance for the remaining targets, illustrating the orthogonality of 2D and 3D approaches.
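
The criterion used above, an AUC whose 95\% confidence interval excludes 0.5, can be illustrated with a minimal sketch. The percentile bootstrap used here is an assumption made for illustration, not necessarily the procedure behind Figure \ref{aucs}, and the function names are hypothetical.

```python
import random


def auc(active_scores, decoy_scores):
    """Fraction of (active, decoy) pairs ranked correctly; ties count as half."""
    wins = 0.0
    for a in active_scores:
        for d in decoy_scores:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(active_scores) * len(decoy_scores))


def bootstrap_auc_ci(active_scores, decoy_scores, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed CI method)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_boot):
        # Resample actives and decoys separately, with replacement.
        a = rng.choices(active_scores, k=len(active_scores))
        d = rng.choices(decoy_scores, k=len(decoy_scores))
        samples.append(auc(a, d))
    samples.sort()
    tail = (1.0 - level) / 2.0
    lower = samples[int(tail * n_boot)]
    upper = samples[min(n_boot - 1, int((1.0 - tail) * n_boot))]
    return lower, upper


def excludes_random(ci, random_auc=0.5):
    """True when the interval lies entirely above or below random performance."""
    lower, upper = ci
    return lower > random_auc or upper < random_auc
```

A target would count as better than random in this sketch only when \texttt{excludes\_random(bootstrap\_auc\_ci(actives, decoys))} is true, where the two arguments are lists of similarity scores for the active and decoy compounds.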

Our assumption in the design of this study was that, since the evaluated methods utilize rigid conformers, it would be necessary to generate a large sample of conformers to ensure that the biologically relevant conformer was screened and correctly registered as a hit. Surprisingly, when we tested this assumption with the Rho Kinase target (results were similar for PKA), we found that the virtual screening performance of all three shape methods was relatively insensitive to the number of conformers used (Figure \ref{confs}). Interestingly, this is not because the top-ranked conformer is the most representative conformer. Instead, as demonstrated in Figure \ref{confs}, reducing the number of conformers does reduce the similarity scores of active compounds. However, a compensating reduction in scores is observed in the decoy set as the number of conformers sampled is decreased, resulting in similar virtual screening results. This suggests that sampling a large number of conformations will be beneficial in terms of producing meaningful poses with high similarity to the query ligand, a key advantage of shape-based methods, but, counter-intuitively, may not be strictly necessary for virtual screening purposes.
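
The compensating effect described above can be sketched in a few lines, assuming each molecule is scored by its best-scoring conformer and that reducing the conformer count simply keeps the first few conformers. The scores below are invented toy values, not data from this study, and all names are hypothetical.

```python
def molecule_score(conformer_scores, n_conformers=None):
    """Best similarity among the first n_conformers conformers (all if None)."""
    kept = conformer_scores if n_conformers is None else conformer_scores[:n_conformers]
    return max(kept)


def mean_score(molecules, n_conformers=None):
    """Average best-conformer score over a set of molecules."""
    scores = [molecule_score(s, n_conformers) for s in molecules.values()]
    return sum(scores) / len(scores)


def actives_outrank_decoys(actives, decoys, n_conformers=None):
    """True when every active scores above every decoy."""
    worst_active = min(molecule_score(s, n_conformers) for s in actives.values())
    best_decoy = max(molecule_score(s, n_conformers) for s in decoys.values())
    return worst_active > best_decoy


# Toy similarity scores, invented for illustration only (not data from this study).
ACTIVES = {"active_1": [0.55, 0.70, 0.62], "active_2": [0.50, 0.52, 0.66]}
DECOYS = {"decoy_1": [0.40, 0.45, 0.58], "decoy_2": [0.35, 0.48, 0.41]}
```

In this toy example, keeping a single conformer lowers the mean score of both the actives and the decoys, yet every active still outranks every decoy, so a rank-based measure such as the AUC is unchanged.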

FOMS essentially provides a rapid means of template docking \cite{Ruiz_Carmona_2014,abagyan2015icm,Koes_2012} using shape-based scoring. The disadvantage of fragment-oriented approaches is that they are critically dependent on the choice of fragment and its proper positioning in defining the query. Provided these requirements can be met, there are several advantages to shape-based fragment alignment search. By enforcing the fragment alignment, key interactions are guaranteed to be conserved. Previous studies have demonstrated the importance of adding pharmacophoric properties (or ‘color’) to shape similarity \cite{Hawkins_2007}. Fragment alignment introduces a hard bias toward matching a key portion of the query molecule without the additional computation that more general methods require. In fact, as we have shown, pre-alignment substantially reduces the computational overhead.

Pre-alignment, whether to fragments (FOMS) or canonical internal coordinates (VAMS), is orders of magnitude faster than methods that dynamically optimize the alignment. This holds true even if the cost of creating the search database is taken into account. The time to create a database scales with the number of molecular shapes (about 10 shapes a second on our system) and compares favorably with RDKit search (2 molecules a second). As the common case is for a fragment-oriented database to be reused for queries by multiple users investigating multiple targets, in practice the cost of database creation is amortized into insignificance.
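
As a rough illustration of this amortization, suppose (hypothetically) that a library contains \(M\) molecules with an average of \(c\) stored shapes each, that the database is built at the roughly 10 shapes per second quoted above, and that an alignment-optimizing search processes 2 molecules per second regardless of conformer count. The symbols \(M\), \(c\), \(Q\), and \(t_q\) are introduced only for this sketch, with \(t_q\) the time of one pre-aligned search and \(Q\) the number of queries run against the database. Then
\[
T_{\mathrm{build}} = \frac{cM}{10}\ \mathrm{s}, \qquad T_{\mathrm{opt}} = \frac{M}{2}\ \mathrm{s},
\]
and the pre-aligned approach is cheaper per query once
\[
\frac{T_{\mathrm{build}}}{Q} + t_q < T_{\mathrm{opt}}
\iff
Q > \frac{cM/10}{M/2 - t_q} \approx \frac{c}{5} \qquad (t_q \ll M/2).
\]
Under these assumptions, a database storing \(c\) shapes per molecule pays for itself after about \(c/5\) queries, independent of the size of the library.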

A major advantage of fragment alignment is that it enables the use of shape constraints. Shape constraint search generally tracked or improved upon the performance of FOMS similarity ranking (e.g., Figure \ref{cathg}). As shown in Table \ref{pvaltable}, shape constraints were able to generate statistically significant (\(p < 0.01\)) enriched subsets for six of the ten targets. Unlike whole-molecule shape similarity, shape constraints can select for a subshape of the query ligand (i.e., perform partial shape matching) and specifically filter out potential clashes with the receptor. Furthermore, shape constraints support an indexed search algorithm that scales sub-linearly. In our evaluation, shape constraint searches were orders of magnitude faster than even VAMS similarity search and generally took well under a second.
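
The two roles of a shape constraint described above, selecting for a subshape of the ligand and rejecting receptor clashes, can be sketched with voxelized shapes. This sketch assumes pre-aligned shapes stored as sets of occupied grid cells, an inclusive region that a hit must fully contain, and an exclusive region that a hit must avoid. All names and the toy database are hypothetical, and the linear scan stands in for the sub-linear index, which is not reproduced here.

```python
def satisfies_constraints(molecule, inclusive, exclusive):
    """A hit contains every inclusive voxel and touches no exclusive voxel."""
    return inclusive <= molecule and molecule.isdisjoint(exclusive)


def constraint_search(database, inclusive, exclusive):
    """Linear scan over pre-aligned shapes; returns the names of matching molecules."""
    return [
        name
        for name, voxels in database.items()
        if satisfies_constraints(voxels, inclusive, exclusive)
    ]


# Toy pre-aligned shapes as sets of (x, y, z) grid cells (illustrative only).
DATABASE = {
    "mol_a": {(0, 0, 0), (1, 0, 0), (2, 0, 0)},             # satisfies both constraints
    "mol_b": {(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 2, 0)},  # overlaps the excluded cell
    "mol_c": {(0, 0, 0), (0, 1, 0)},                        # lacks part of the required subshape
}
INCLUSIVE = {(0, 0, 0), (1, 0, 0)}  # subshape of the query ligand that must be filled
EXCLUSIVE = {(0, 2, 0)}             # region occupied by the receptor

hits = constraint_search(DATABASE, INCLUSIVE, EXCLUSIVE)
```

Because the test is set containment rather than a similarity score, a molecule may extend well beyond the inclusive region and still match, which is what makes the search a partial shape match.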

Shape constraints provide a novel method for specifying molecular shape queries. Although we have investigated automatically generated shape constraints, our assumption is that intelligently designed constraints created by human experts will substantially outperform these interaction-point-based constraints. The speed of shape constraint searches enables interactive analysis and refinement of shape queries, in which the expert user sculpts the desired molecule guided by the results of searches of full-sized databases. We anticipate that fragment-oriented shape constraints will prove a useful tool in both fragment-based drug discovery workflows, which are naturally centered around a fragment, and lead optimization workflows, where the core scaffold of the lead compound can serve as a fragment.