Authorea

David Koes edited subsection_Virtual_Screening_Evaluation_In__.tex over 8 years ago

Commit id: 9fdc93e5deb2eb616be852c8610dbff02853e69e

For each target, conformers of the active and decoy compounds were generated using RDKit\cite{rdkit}. For each compound, up to 100 conformers were generated with a minimum pairwise RMSD of 0.7~{\AA} and an energy window cutoff of 10. For each fragment considered, the corresponding conformers were extracted into fragment-specific subsets, which were then preprocessed to create VAMS and FOMS search databases. The VAMS database stores a single pose for each conformation, aligned along its principal axes of inertia. For FOMS, if a compound contains multiple instances of the anchor fragment or the fragment has internal symmetries, multiple poses are stored per conformation to account for each fragment instance and symmetry.

For each target, a single reference structure was identified by searching BindingDB\cite{Liu_2007}, PDBbind\cite{Wang_2005}, and Binding MOAD\cite{Hu_2005} for the complex with the best binding affinity. The structures were visually inspected to identify buried anchor fragments that make key contacts with the protein, as shown in Figure~\ref{targets} and Table~\ref{fragtable}. All shape queries were constructed from these reference complexes.

We evaluated both shape constraints and shape similarity using the single reference complex. For the shape constraint search, we evaluated multiple maximum shape constraints by shrinking the receptor shape by 0, 0.5, 1.0, 1.5, and 2.0~{\AA}. For the minimum shape constraints, we used interaction points with minimal radius (one voxel) and evaluated every possible subset of the interaction points; that is, if $n$ interaction points were identified, we evaluated $2^n$ minimum shapes. Selecting subsets of interaction points reduces the dependence of the query on the full shape of the reference molecule and more closely mimics the use case in which an expert manually constructs shape constraints that select only the most important interactions.
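The conformer selection criteria can be illustrated as a greedy filter. This is a minimal sketch and not the RDKit procedure used here: it assumes per-conformer energies have already been computed (for example by a force-field minimization), applies the energy window of 10 relative to the lowest-energy conformer in the same units as those energies, and takes a hypothetical pairwise RMSD callable. In RDKit itself, RMSD pruning during embedding is exposed through the \texttt{pruneRmsThresh} argument of \texttt{EmbedMultipleConfs}; \texttt{select\_conformers} below is a hypothetical helper.

```python
from typing import Callable, List, Sequence


def select_conformers(
    energies: Sequence[float],
    rmsd: Callable[[int, int], float],
    max_confs: int = 100,
    min_rmsd: float = 0.7,
    energy_window: float = 10.0,
) -> List[int]:
    """Greedily keep low-energy, mutually dissimilar conformers.

    A conformer is kept if its energy lies within `energy_window` of the
    lowest energy and its RMSD to every previously kept conformer is at
    least `min_rmsd`. At most `max_confs` indices are returned.
    """
    if not energies:
        return []
    e_min = min(energies)
    # Visit conformers from lowest to highest energy.
    order = sorted(range(len(energies)), key=lambda i: energies[i])
    kept: List[int] = []
    for i in order:
        if energies[i] - e_min > energy_window:
            break  # every remaining conformer is at least this high in energy
        if all(rmsd(i, j) >= min_rmsd for j in kept):
            kept.append(i)
            if len(kept) == max_confs:
                break
    return kept


# Usage with a hypothetical precomputed RMSD matrix (Angstrom).
energies = [0.0, 1.0, 2.0, 15.0]
rmsd_matrix = [
    [0.0, 0.3, 1.0, 1.0],
    [0.3, 0.0, 1.0, 1.0],
    [1.0, 1.0, 0.0, 1.0],
    [1.0, 1.0, 1.0, 0.0],
]
print(select_conformers(energies, lambda i, j: rmsd_matrix[i][j]))  # [0, 2]
```

Conformer 1 is dropped because it lies within 0.7~{\AA} of conformer 0, and conformer 3 falls outside the energy window.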
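Aligning a conformation along its principal axes of inertia, as described for the VAMS database, amounts to centering the coordinates and rotating them into the eigenbasis of the inertia tensor. The sketch below assumes unit atomic masses and orders the axes by increasing principal moment; both choices, and the function names, are illustrative assumptions rather than details of VAMS. A small Jacobi eigensolver keeps it dependent only on the Python standard library.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]
Mat = List[List[float]]


def _matmul(x: Mat, y: Mat) -> Mat:
    return [[sum(x[i][k] * y[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def _transpose(x: Mat) -> Mat:
    return [[x[j][i] for j in range(3)] for i in range(3)]


def _jacobi_eigen(a: Mat, max_sweeps: int = 50) -> Tuple[List[float], Mat]:
    """Eigen-decompose a symmetric 3x3 matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, v) where column k of v is the k-th eigenvector.
    """
    a = [row[:] for row in a]
    v: Mat = [[float(i == j) for j in range(3)] for i in range(3)]
    for _ in range(max_sweeps):
        off = sum(a[p][q] ** 2 for p in range(3) for q in range(3) if p != q)
        if off < 1e-20:
            break
        for p, q in ((0, 1), (0, 2), (1, 2)):
            if a[p][q] == 0.0:
                continue
            # Standard angle choice (|theta| <= pi/4) that zeroes entry (p, q) of J^T A J.
            diff = a[q][q] - a[p][p]
            if diff != 0.0:
                theta = 0.5 * math.atan(2.0 * a[p][q] / diff)
            else:
                theta = math.copysign(math.pi / 4, a[p][q])
            c, s = math.cos(theta), math.sin(theta)
            j: Mat = [[float(i == k) for k in range(3)] for i in range(3)]
            j[p][p], j[q][q], j[p][q], j[q][p] = c, c, s, -s
            a = _matmul(_transpose(j), _matmul(a, j))
            v = _matmul(v, j)
    return [a[k][k] for k in range(3)], v


def align_to_principal_axes(coords: Sequence[Vec]) -> List[Vec]:
    """Center coordinates and rotate them into the principal-axis frame.

    Assumes unit masses; axes are ordered by increasing principal moment.
    """
    if not coords:
        raise ValueError("need at least one atom")
    n = len(coords)
    centroid = [sum(p[k] for p in coords) / n for k in range(3)]
    centered = [[p[k] - centroid[k] for k in range(3)] for p in coords]
    # Inertia tensor I = sum(|r|^2 * Id - r r^T) with unit masses.
    inertia: Mat = [[0.0] * 3 for _ in range(3)]
    for r in centered:
        r2 = sum(c * c for c in r)
        for i in range(3):
            for k in range(3):
                inertia[i][k] += (r2 if i == k else 0.0) - r[i] * r[k]
    moments, vecs = _jacobi_eigen(inertia)
    order = sorted(range(3), key=lambda k: moments[k])
    axes = [[vecs[i][k] for i in range(3)] for k in order]  # eigenvector columns
    return [tuple(sum(ax[i] * r[i] for i in range(3)) for ax in axes) for r in centered]
```

Eigenvectors are defined only up to sign, so a practical implementation must also fix a sign convention or otherwise account for that ambiguity; this sketch leaves the signs as the solver returns them.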
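The maximum and minimum shape constraints can be pictured on a voxel grid. The sketch below is a hypothetical illustration, not the VAMS or FOMS implementation: it assumes a uniform grid resolution, per-atom radii supplied by the caller, compounds represented as voxel sets, and it models ``shrinking the receptor shape'' as reducing each receptor atom's radius by the shrink distance. Each selected interaction point contributes one required voxel, matching the one-voxel minimal radius described above.

```python
import itertools
import math
from typing import Iterable, Set, Tuple

Voxel = Tuple[int, int, int]
Atom = Tuple[float, float, float, float]  # x, y, z, radius (all in Angstrom)


def receptor_voxels(atoms: Iterable[Atom], shrink: float, res: float = 0.5) -> Set[Voxel]:
    """Voxels whose centers fall inside any receptor atom after its radius is reduced by `shrink`."""
    occupied: Set[Voxel] = set()
    for x, y, z, radius in atoms:
        r = radius - shrink
        if r <= 0.0:
            continue  # this atom no longer excludes any space
        atom_center = (x, y, z)
        # Index ranges of voxels whose centers could lie within r of the atom.
        ranges = [range(math.floor((c - r) / res), math.floor((c + r) / res) + 1) for c in atom_center]
        for i, j, k in itertools.product(*ranges):
            voxel_center = ((i + 0.5) * res, (j + 0.5) * res, (k + 0.5) * res)
            if math.dist(voxel_center, atom_center) <= r:
                occupied.add((i, j, k))
    return occupied


def satisfies_shape_query(compound: Set[Voxel], excluded: Set[Voxel], required: Set[Voxel]) -> bool:
    """Maximum constraint: avoid every excluded (receptor) voxel.
    Minimum constraint: cover every required (interaction-point) voxel."""
    return compound.isdisjoint(excluded) and required <= compound
```

Under this model, a larger shrink distance excludes fewer voxels, so the maximum shape constraint becomes more permissive.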
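The number of shape constraint queries grows quickly with the number of interaction points. Assuming each maximum shape (one per shrink distance) is paired with each of the $2^n$ minimum shapes, a pairing the text does not state explicitly, the full query set can be enumerated as follows; \texttt{enumerate\_queries} and the point labels are hypothetical.

```python
from itertools import chain, combinations, product
from typing import Hashable, List, Sequence, Tuple

# Receptor shrink distances (Angstrom) taken from the text.
SHRINKS = (0.0, 0.5, 1.0, 1.5, 2.0)


def enumerate_queries(
    points: Sequence[Hashable], shrinks: Sequence[float] = SHRINKS
) -> List[Tuple[float, Tuple[Hashable, ...]]]:
    """Pair every receptor shrink distance with every subset of interaction points.

    The 2**n subsets include the empty one (no minimum shape constraint).
    """
    subsets = chain.from_iterable(combinations(points, k) for k in range(len(points) + 1))
    return list(product(shrinks, subsets))


queries = enumerate_queries(["donor_1", "acceptor_2", "aromatic_3"])  # hypothetical labels
print(len(queries))  # 40 = 5 shrink values x 2**3 subsets
```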
As each shape constraint query is a filter, it does not return a complete ranking of the database compounds, so we report the true positive and false positive rates of the selected subset. We calculate a p-value for the returned subset (relative to the null hypothesis of random selection) using a hypergeometric test. Reported p-values are corrected for multiple comparisons using the Bonferroni correction (the calculated p-value is multiplied by the number of comparisons). Shape similarity results are reported using the area under the receiver operating characteristic (ROC) curve (AUC). This metric is relatively insensitive to the number of compounds and has well-developed statistical properties\cite{Jain_2007}.
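The filter statistics can be computed directly from the counts of selected actives and decoys. For a database of $N$ compounds containing $K$ actives, a returned subset of size $n$ containing $k$ actives has one-sided p-value
\[
P(X \ge k) = \sum_{i=k}^{\min(n,K)} \frac{\binom{K}{i}\binom{N-K}{n-i}}{\binom{N}{n}},
\]
where $X$ is hypergeometric. This sketch assumes a one-sided test for enrichment of actives and caps the Bonferroni-corrected value at 1, a common convention the text does not specify; \texttt{filter\_statistics} is a hypothetical helper.

```python
from math import comb
from typing import Tuple


def filter_statistics(
    n_actives: int,
    n_decoys: int,
    sel_actives: int,
    sel_decoys: int,
    n_comparisons: int = 1,
) -> Tuple[float, float, float]:
    """Return (true positive rate, false positive rate, corrected p-value).

    The p-value is the one-sided hypergeometric probability that a random
    subset of the same size contains at least `sel_actives` actives,
    multiplied by `n_comparisons` (Bonferroni) and capped at 1.
    """
    total = n_actives + n_decoys
    selected = sel_actives + sel_decoys
    tpr = sel_actives / n_actives
    fpr = sel_decoys / n_decoys
    upper = min(selected, n_actives)
    # math.comb returns 0 when more decoys are requested than exist.
    tail = sum(
        comb(n_actives, i) * comb(n_decoys, selected - i)
        for i in range(sel_actives, upper + 1)
    )
    p = tail / comb(total, selected)
    return tpr, fpr, min(1.0, p * n_comparisons)


print(filter_statistics(2, 2, sel_actives=2, sel_decoys=0))  # (1.0, 0.0, 0.16666666666666666)
```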
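The ROC AUC equals the probability that a randomly chosen active scores higher than a randomly chosen decoy (the Mann-Whitney formulation), which gives a compact way to compute it from shape similarity scores. This sketch assumes that higher scores indicate greater similarity and counts ties as one half; \texttt{roc\_auc} is a hypothetical helper.

```python
from typing import Sequence


def roc_auc(active_scores: Sequence[float], decoy_scores: Sequence[float]) -> float:
    """ROC AUC as the probability that an active outscores a decoy.

    Assumes higher scores mean greater shape similarity; ties count one half.
    """
    if not active_scores or not decoy_scores:
        raise ValueError("need at least one active and one decoy")
    wins = 0.0
    for a in active_scores:
        for d in decoy_scores:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(active_scores) * len(decoy_scores))


print(roc_auc([0.9, 0.3], [0.5, 0.1]))  # 0.75
```

The double loop costs $O(nm)$ for $n$ actives and $m$ decoys; a rank-based computation after sorting reduces this to $O((n+m)\log(n+m))$ for large decoy sets.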