David Koes edited subsection_Virtual_Screening_Evaluation_In__.tex  over 8 years ago

Commit id: 8b5bb9f3e6dffcec5e5484bc23ca3ec4031df481

\subsection*{Virtual Screening Evaluation}

To investigate the utility of the FOMS approach, we consider both the ability of shape constraint filters to generate enriched subsets and the quality of shape similarity rankings generated using fragment-aligned molecules. We compare to VAMS, which aligns all molecules to a canonical reference system based on their moments of inertia, and rdShape, the shape alignment module of RDKit \cite{rdkit}, which dynamically aligns shapes to maximize their overlap using Open3DALIGN \cite{Tosco_2011}. The input conformers for rdShape alignment were the same aligned poses used with VAMS, and the computed similarity score is the Tanimoto coefficient. Results are reported using receiver operating characteristic (ROC) curves, which plot the true positive rate (TPR) against the false positive rate (FPR) as the classification threshold applied to a ranking is varied. The area under the curve (AUC) is reported.

For each target, conformers of the active and decoy compounds were generated using RDKit \cite{rdkit}. A maximum of 100 conformers with a minimum RMSD difference of 0.7{\AA} and an energy window cutoff of 10 were generated for each compound. For each fragment considered, the corresponding conformers were extracted into fragment-specific subsets. The subsets were then preprocessed to create VAMS and FOMS search databases. The VAMS database stores a single pose for each conformation, aligned along its moments of inertia. For FOMS, if a compound contains multiple instances of the anchor fragment or the fragment contains symmetries, multiple poses per conformation are stored to account for each fragment instance and symmetry.

For each target, a single reference structure was identified by searching BindingDB \cite{Liu_2007}, PDBbind \cite{Wang_2005}, and Binding MOAD \cite{Hu_2005} for the complex with the best binding affinity.
The structures were visually inspected to identify buried anchor fragments that made key contacts with the protein, as shown in Figure~\ref{targets} and Table~\ref{fragtable}. All shape queries were constructed using the reference complexes.

We evaluated both shape constraints and shape similarity using the single reference complex. For the shape constraint search, we evaluated multiple maximum shape constraints by shrinking the receptor shape by 0, 0.5, 1.0, 1.5, and 2.0~{\AA}. For the minimum shape constraints, we used interaction points with minimal radius (one voxel). We evaluated every possible subset of interaction points. That is, if $n$ interaction points were identified, we evaluated $2^n$ minimum shapes. Selecting subsets of interaction points reduces the dependency of the query on the full shape of the reference molecule and more closely mimics the use case where the shape constraints are manually constructed by an expert to select only the most important interactions. As each shape constraint query is a filter, it does not return a total ranking of database compounds, so we report the true positive and false positive rates of the selected subset. We calculate a p-value for the returned subset (relative to the null hypothesis of random selection) using a hypergeometric test. Reported p-values are corrected for multiple comparisons using the Bonferroni correction (the calculated p-value is multiplied by the number of comparisons).

Shape similarity results are reported using the area under the curve (AUC) of the receiver operating characteristic (ROC) curve. This metric is not biased by the number of compounds and has well-developed statistical properties \cite{Jain_2007}.
The AUC is equivalent to the probability that a randomly selected active compound and a randomly selected inactive compound will be correctly ranked by a method. An AUC of 0.5 corresponds to random chance, while a perfect predictor exhibits an AUC of 1.0.
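
As a minimal sketch of the two statistics described above, the following Python code computes a hypergeometric p-value with Bonferroni correction for a filtered subset, and the ROC AUC as the fraction of correctly ordered active/decoy pairs. It is illustrative only and is not the evaluation code used in this work. The function names, the one-sided (enrichment) form of the test, the convention that higher scores indicate greater similarity, and the half credit given to tied scores are all assumptions, since the text does not specify them.

```python
from math import comb


def hypergeom_pvalue(n_total, n_actives, n_selected, n_selected_actives):
    """P(X >= n_selected_actives) when n_selected compounds are drawn at
    random, without replacement, from n_total compounds containing n_actives
    actives. Assumption: a one-sided test for enrichment."""
    upper = min(n_selected, n_actives)
    tail = sum(
        comb(n_actives, i) * comb(n_total - n_actives, n_selected - i)
        for i in range(n_selected_actives, upper + 1)
    )
    return tail / comb(n_total, n_selected)


def bonferroni(p_value, n_comparisons):
    """Multiply the p-value by the number of comparisons, capped at 1."""
    return min(1.0, p_value * n_comparisons)


def roc_auc(active_scores, decoy_scores):
    """Fraction of (active, decoy) pairs in which the active scores higher.
    Assumption: higher score means more similar; ties count as half."""
    correct = 0.0
    for a in active_scores:
        for d in decoy_scores:
            if a > d:
                correct += 1.0
            elif a == d:
                correct += 0.5
    return correct / (len(active_scores) * len(decoy_scores))
```

For example, a filter that returns 3 compounds, all of them active, from a database of 10 compounds containing 3 actives has an uncorrected p-value of $1/120$. The pairwise AUC computation is quadratic in the number of compounds; it is written this way to mirror the probabilistic definition rather than for efficiency.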