David Koes edited section_Results_Consistent_with_previous__.tex over 8 years ago

Commit id: 91292dd918d35e430b9d907fb4f14b8ac2ec390f

\section*{Results} Consistent with previous studies \cite{Tiikkainen_2009}, we find the MUV dataset to be a challenging benchmark for shape-based screening, with few targets demonstrating AUCs far from random performance. Overall, FOMS either matched or exceeded the virtual screening performance of VAMS and RDKit while retaining most of the benefits of pre-aligned molecules. Specifically, it is orders of magnitude faster than the optimizing alignment of RDKit, suggesting, at a minimum, that FOMS is a viable method for rapidly pre-screening large libraries. In general, the shape constraint queries on the Pareto frontier deliver virtual screening performance equivalent to or better than that of a full shape similarity search, while running orders of magnitude faster.

For each target, we plot the ROC curves for FOMS, VAMS, and RDKit similarity search as well as the results for every interaction point shape constraint query. Shape constraint results along the Pareto frontier are highlighted as solid circles, and the most statistically significant result is annotated with its Bonferroni-corrected $p$-value.

We also plot the total time required for the FOMS, VAMS, and RDKit similarity searches and provide a box plot of the distribution of times for the shape constraint searches. All times are plotted on a logarithmic scale due to the orders-of-magnitude differences between the methods. Shape constraint search time varies considerably depending on the number of hits retrieved. Most queries are highly selective, return few or no results, and take less than a hundredth of a second. We do not include the time needed to generate the shape constraints in the query time since the receptor-based constraints can be generated once and cached for future searches. We plot times for only those queries that generate results. For these queries, the median time, as shown in the box plots, remains below a tenth of a second.
A few queries are non-selective and return subsets of compounds that are comparable in size to the full database. These queries approach or exceed the running time of a full similarity search. The most statistically significant shape constraint query, annotated with a $p$-value in the ROC plot, has its running time plotted as a circle. These informative queries typically exceed the median time but still take less than a second.
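
The two criteria used to highlight shape constraint queries can be made concrete as follows. This is only a sketch that assumes the standard definitions of Pareto dominance and Bonferroni correction. The symbols $\mathrm{FPR}_i$, $\mathrm{TPR}_i$, $p_i$, and $m$ are introduced here for illustration and do not appear elsewhere in the text. Treating each query $i$ as a single point $(\mathrm{FPR}_i, \mathrm{TPR}_i)$ in ROC space, query $i$ lies on the Pareto frontier if there is no other query $j$ such that
\[
\mathrm{FPR}_j \le \mathrm{FPR}_i \quad \text{and} \quad \mathrm{TPR}_j \ge \mathrm{TPR}_i ,
\]
with at least one of the two inequalities strict. Likewise, if query $i$ has an uncorrected $p$-value $p_i$ and $m$ is assumed to be the number of shape constraint queries evaluated for the target, the Bonferroni-corrected value is
\[
p_i^{\mathrm{Bonf}} = \min\left(1,\; m\, p_i\right) ,
\]
and the annotated query would be the one with the smallest corrected value. Because the correction multiplies every $p_i$ by the same factor $m$, it does not change which query ranks as most significant. It only makes the reported significance more conservative.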