David Koes edited subsection_Dataset_The_specific_modality__.tex  over 8 years ago


We use the Maximum Unbiased Validation (MUV) dataset\cite{Rohrer2009} as a starting point. MUV includes sets of 30 active and 15000 decoy compounds for each of 17 targets. Compounds are selected from PubChem bioactivity data both to reduce the similarity among actives (high similarity introduces analogue bias\cite{Good2008}) and to increase the similarity between actives and decoys (which helps prevent artificial enrichment\cite{Verdonk2004}). MUV is also noteworthy in that the decoys were all assessed to be inactive in the initial high-throughput screen against the target, providing some measure of experimental evidence against the inclusion of false negatives. Of the 17 targets in the MUV dataset, we identified 10 that had a receptor-ligand structure in the Protein Data Bank (PDB) where the ligand had sub-$\mu$M affinity. The interaction diagrams of these structures are shown in Figure~\ref{targets}. For each of these structures we identified interacting fragments that could potentially serve as anchor fragments. For each target we consider a variety of fragments in order to evaluate the sensitivity of the approach to the choice of fragment. The selection of anchor fragments is necessarily dependent upon the input dataset: we endeavored to select relatively generic functional groups that were sufficiently common among both the actives and decoys to yield meaningful results. We chose not to define a custom anchor fragment for each complex that matched the greatest number of actives while rejecting the most decoys. Instead, we identified a smaller set of shared anchor fragments representing common chemical groups. This approach is more in line with our expectation that prospective applications of fragment-oriented shape matching will, at least at first, use pre-built libraries anchored on common fragments.
Each selected fragment defines a different subset for a target that consists of only those compounds that contain the fragment.
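
One way to state this subset definition formally is sketched here; the notation is introduced only for illustration and is not part of the MUV definition. Let $A_t$ and $D_t$ denote the actives and decoys of target $t$, and write $f \preceq c$ when compound $c$ contains fragment $f$ as a substructure. The subset induced by $f$ is then
\[
S_{t,f} = \{\, c \in A_t \cup D_t \;:\; f \preceq c \,\},
\]
with actives $A_t \cap S_{t,f}$ and decoys $D_t \cap S_{t,f}$. Since $|A_t| = 30$ and $|D_t| = 15000$ in MUV, each subset contains at most 30 actives and at most 15000 decoys, so a fragment is only useful for evaluation if it is common enough that both $A_t \cap S_{t,f}$ and $D_t \cap S_{t,f}$ are non-empty.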