David Koes edited subsection_Dataset_The_specific_modality__.tex  over 8 years ago

Commit id: 2ad775aaf5f58b478846a78f2cea07f2716ee336


The specific modality of fragment-oriented molecular shapes requires a custom benchmark for assessing the virtual screening performance of the method. Constructing the shape constraints requires a receptor-ligand structure, and screening a library requires that every compound in the library contain the desired anchor fragment.

We use the Maximum Unbiased Validation (MUV) dataset\cite{Rohrer2009} as a starting point. MUV includes sets of 30 active and 15{,}000 property-matched decoy compounds for each of 17 targets. Compounds are selected from PubChem bioactivity data using a methodology that both reduces the similarity among actives (to avoid analogue bias\cite{Good2008}) and increases the similarity between actives and decoys (which helps prevent artificial enrichment\cite{Verdonk2004}). MUV is also noteworthy in that all decoys were assessed to be inactive in the initial high-throughput screen against the target, which provides some experimental evidence that the negatives are true negatives (as opposed to schemes that generate decoys through random sampling\cite{Mysinger_2012}).

Of the 17 targets in the MUV dataset, we identified 10 that had a receptor-ligand structure in the Protein Data Bank (PDB) in which the ligand had sub-$\mu$M affinity. The interaction diagrams of these structures and their PDB codes are shown in Figure~\ref{targets}. For each of these structures we identified interacting fragments that could serve as anchor fragments. For each target we consider several fragments in order to evaluate the sensitivity of the approach to the choice of fragment. We selected relatively generic functional groups (at most 7 atoms) that clearly formed interactions with the receptor and that were sufficiently common among both the actives and the decoys to yield meaningful results.
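
As a rough illustration of the class imbalance involved (simple arithmetic on the set sizes quoted above, not a figure taken from the MUV paper), the fraction of actives in an unfiltered MUV target set is
\[
  \frac{30}{30 + 15{,}000} = \frac{30}{15{,}030} \approx 0.2\%.
\]
Restricting a set to compounds that contain a given anchor fragment keeps only a subset of both the actives and the decoys. As a purely hypothetical example, a fragment present in one third of the actives would leave about 10 actives to evaluate against, which is why a fragment must be reasonably common in both classes for the resulting benchmark to be informative.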