David Koes edited subsection_Dataset_The_specific_modality__.tex  over 8 years ago

Commit id: ee07ec64b45139fa37133c288366d467040c0c76


\subsection*{Dataset}  The specific modality of fragment-oriented molecular shapes requires the creation of a custom benchmark for assessing the virtual screening performance of the method. In order to construct the shape constraints, a receptor-ligand structure is required, and, in order to screen a library, all compounds in the library must contain the desired anchor fragment. We use the Maximum Unbiased Validation (MUV) dataset\cite{Rohrer2009} as a starting point. MUV includes sets of 30 active and 15,000 decoy compounds for each of 17 targets. Compounds are selected from PubChem bioactivity data using a methodology that both reduces the similarity among actives (to avoid analogue bias\cite{Good2008}) and increases the similarity between actives and decoys (which helps prevent artificial enrichment\cite{Verdonk2004}). MUV is also noteworthy in that the decoys were all assessed to be inactive in the initial high-throughput screen against the target, providing some measure of experimental evidence against the inclusion of false negatives. Of the 17 targets in the MUV dataset, we identified 10 that had a receptor-ligand structure in the Protein Data Bank (PDB) where the ligand had sub-$\mu$M affinity. The interaction diagrams of these structures are shown in Figure~\ref{targets}. For each of these structures, we identified interacting fragments that could potentially serve as anchor fragments. For each target, we consider a variety of fragments in order to evaluate the sensitivity of the approach to the choice of fragment. We selected relatively generic functional groups (at most \textbf{X} atoms) that were sufficiently common among both the actives and decoys to yield meaningful results and that clearly formed interactions with the receptor.