
David Koes edited subsection_Dataset_The_specific_modality__.tex over 8 years ago

Commit id: 5674da4cf1c975e40f749c0cdfd834d11e11e591


We use the Maximum Unbiased Validation (MUV) dataset\cite{Rohrer2009} as a starting point. MUV includes sets of 30 active and 15000 decoy compounds for each of 17 targets. Compounds are selected from PubChem bioactivity data both to reduce the similarity among actives (high similarity among actives introduces analogue bias\cite{Good2008}) and to increase the similarity between actives and decoys (which helps prevent artificial enrichment\cite{Verdonk2004}). MUV is also noteworthy in that all decoys were assessed to be inactive in the initial high-throughput screen against the target, which provides some experimental evidence against the inclusion of false negatives.

Of the 17 targets in the MUV dataset, we identified 10 that have a receptor-ligand structure in the Protein Data Bank (PDB) in which the ligand has sub-$\mu$M affinity. The interaction diagrams of these structures are shown in Figure~\ref{targets}. For each structure, we identified interacting fragments that could potentially serve as anchor fragments. For each target, we consider a variety of fragments in order to evaluate the sensitivity of the approach to the choice of fragment. We selected relatively generic functional groups (at most \textbf{X} atoms) that clearly form interactions with the receptor and that are sufficiently common among both the actives and decoys to yield meaningful results. \textbf{Need table of fragments here}
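
As a rough point of reference, and assuming that all 30 actives and 15000 decoys of a MUV target are retained, the fraction of actives in a target set is
\[
\frac{30}{30 + 15000} = \frac{30}{15030} \approx 0.002 ,
\]
or about 0.2\%. This is the hit rate that a random selection from the full set would be expected to achieve; the fraction may differ for any subset restricted to compounds containing a particular fragment.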