Authorea

David Koes edited subsection_Dataset_The_specific_modality__.tex over 8 years ago

Commit id: a3554f560926c98aad76f2af1d9c7ce6766f38b6


We use the Maximum Unbiased Validation (MUV) dataset\cite{Rohrer2009} as a starting point. For each of 17 targets, MUV provides 30 active and 15,000 decoy compounds. Compounds are selected from PubChem bioactivity data using a methodology that both reduces the similarity among actives (to avoid analogue bias\cite{Good2008}) and increases the similarity between actives and decoys (which helps prevent artificial enrichment\cite{Verdonk2004}). MUV is also noteworthy in that all decoys were found to be inactive in the initial high-throughput screen against the target, which provides some experimental evidence that the negatives are true negatives (as opposed to schemes that generate decoys through random sampling\cite{Mysinger_2012}). Of the 17 targets in the MUV dataset, we identified 10 with a receptor--ligand structure in the Protein Data Bank (PDB) in which the ligand has sub-$\mu$M affinity. The interaction diagrams of these structures are shown in Figure~\ref{targets}. For each of these structures we identified interacting fragments that could serve as anchor fragments. For each target we consider several fragments in order to evaluate the sensitivity of the approach to the choice of fragment. We selected relatively generic functional groups (at most 7 atoms) that clearly form interactions with the receptor and that are sufficiently common among both the actives and the decoys to yield meaningful results.
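The requirement that a fragment be common among both actives and decoys amounts to counting how many compounds in each set contain the fragment and keeping it only if both counts are large enough. The sketch below illustrates that bookkeeping under stated assumptions: the thresholds \texttt{min\_actives} and \texttt{min\_decoys} are hypothetical values (the text does not give numbers), the function names are our own, and \texttt{naive\_matcher} is a toy stand-in that treats the SMARTS as a literal substring of the SMILES rather than performing a real substructure search.

```python
from typing import Callable, Sequence

# A substructure test: True if the molecule (SMILES) contains the fragment (SMARTS).
Matcher = Callable[[str, str], bool]


def coverage(smiles: Sequence[str], smarts: str, matches: Matcher) -> int:
    """Number of molecules in `smiles` that contain the fragment `smarts`."""
    return sum(1 for smi in smiles if matches(smi, smarts))


def is_usable_anchor(
    actives: Sequence[str],
    decoys: Sequence[str],
    smarts: str,
    matches: Matcher,
    min_actives: int = 5,   # hypothetical threshold, not taken from the study
    min_decoys: int = 100,  # hypothetical threshold, not taken from the study
) -> bool:
    """Keep a fragment only if it occurs often enough in BOTH the actives and the decoys."""
    return (coverage(actives, smarts, matches) >= min_actives
            and coverage(decoys, smarts, matches) >= min_decoys)


def naive_matcher(smi: str, smarts: str) -> bool:
    # Toy stand-in for illustration only: literal substring test, NOT a substructure search.
    return smarts in smi


if __name__ == "__main__":
    actives = ["c1ccccc1O", "Oc1ccccc1O", "CCN"]
    print(coverage(actives, "c1ccccc1O", naive_matcher))
```

For real molecules, \texttt{matches} could instead wrap RDKit (\texttt{from rdkit import Chem}), for example by returning \texttt{Chem.MolFromSmiles(smi).HasSubstructMatch(Chem.MolFromSmarts(smarts))}; passing the matcher in keeps the counting logic independent of the cheminformatics toolkit.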
\begin{table}
\centering
\caption{SMARTS expressions for the candidate anchor fragments considered for each target.}
\begin{tabular}{ l l }
\hline
Fragment & SMARTS Expression \\
\hline
Cathg1 & \texttt{c1ccccc1[R]} \\
Cathg2 & \texttt{c1ccccc1[C]} \\
Cathg3 & \texttt{c1ccccc1[!H]} \\
Cathg4 & \texttt{a1aaaaa1[!H]} \\
Cathg5 & \texttt{c1ccccc1[C]} \\
Eralphapot1 & \texttt{c1ccccc1O} \\
Eralphapot2 & \texttt{c1ccccc1[!H]} \\
Eralphapot3 & \texttt{a1aaaaa1[!H]} \\
Eralpha1 & \texttt{c1ccccc1O} \\
Eralpha2 & \texttt{c1ccccc1N} \\
Eralpha3 & \texttt{c1ccccc1[!H]} \\
Eralpha4 & \texttt{a1aaaaa1[!H]} \\
Erbeta1 & \texttt{c1ccccc1O} \\
Erbeta2 & \texttt{c1ccccc1N} \\
Erbeta3 & \texttt{c1ccccc1[!H]} \\
Erbeta4 & \texttt{a1aaaaa1[!H]} \\
Fak1 & \texttt{c1[c,n]cccn1} \\
Fak2 & \texttt{c1cccc([!H])c1} \\
Fak3 & \texttt{a1a([!H])aaaa1} \\
Fak4 & \texttt{n1[c,n][c,n]cc1} \\
Fxia1 & \texttt{a1aan1} \\
Fxia2 & \texttt{c1[c,n]cc[c,s]1} \\
Fxia3 & \texttt{c1[c,n]cc([!H])[c,s,o]1} \\
Fxia4 & \texttt{a1aaaa1[!H]} \\
Hivrt1 & \texttt{a1aaan1} \\
Hivrt2 & \texttt{c1aacn1} \\
Hivrt3 & \texttt{c1aac([!H])n1} \\
Hivrt4 & \texttt{a1aaaa1[!H]} \\
Hivrt5 & \texttt{c1ccccc1[Cl,O]} \\
Hsp901 & \texttt{c1ccccc1O} \\
Hsp902 & \texttt{c1ccccc1[!H]} \\
Hsp903 & \texttt{a1aaaaa1[!H]} \\
Pka1 & \texttt{c1[c,n]cccn1} \\
Pka2 & \texttt{a1ncccn1} \\
Pka3 & \texttt{a1([!H])ncccn1} \\
Pka4 & \texttt{c1[c,n]c([!H])ccn1} \\
Rho1 & \texttt{c1[c,n]cccn1} \\
Rho2 & \texttt{c1[c,n]c([!H])ccn1} \\
\hline
\end{tabular}
\end{table}
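The ``at most 7 atoms'' criterion can be checked directly against the SMARTS expressions in the fragment table. The sketch below is illustrative only: it counts atoms with a simplified regular-expression tokenizer of our own (not a SMARTS parser from any library) that covers just the constructs appearing in the table, and \texttt{FRAGMENTS} is a hypothetical dictionary holding a few of the table's rows.

```python
import re

# One atom in the simple SMARTS of the fragment table: a bracket expression
# such as [!H] or [c,n], a two-letter organic-subset element (Cl, Br), a
# one-letter organic-subset element, an aromatic atom, or a wildcard (a, A, *).
# Simplified: nested brackets are not handled, so this is not a general parser.
ATOM = re.compile(r"\[[^\]]*\]|Cl|Br|[BCNOPSFI]|[bcnops]|[aA*]")


def count_atoms(smarts: str) -> int:
    """Count atom primitives; ring-closure digits, branches and bonds are skipped."""
    return len(ATOM.findall(smarts))


# A few rows copied from the fragment table.
FRAGMENTS = {
    "Cathg1": "c1ccccc1[R]",
    "Fak3": "a1a([!H])aaaa1",
    "Fxia1": "a1aan1",
    "Fxia3": "c1[c,n]cc([!H])[c,s,o]1",
    "Hivrt5": "c1ccccc1[Cl,O]",
    "Pka3": "a1([!H])ncccn1",
    "Rho2": "c1[c,n]c([!H])ccn1",
}

if __name__ == "__main__":
    for name, smarts in FRAGMENTS.items():
        n = count_atoms(smarts)
        assert n <= 7, f"{name} has {n} atoms"
        print(f"{name:8s} {n}")
```

With RDKit available, \texttt{Chem.MolFromSmarts(smarts).GetNumAtoms()} gives the same count from a real SMARTS parser; the regular expression merely avoids that dependency for these simple patterns.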