David Koes edited subsection_Dataset_The_specific_modality__.tex  over 8 years ago


We use the Maximum Unbiased Validation (MUV) dataset\cite{Rohrer2009} as a starting point. For each of 17 targets, MUV provides a set of 30 active and 15,000 decoy compounds. These compounds are selected from PubChem bioactivity data using a methodology that both reduces the similarity among actives (to avoid analogue bias\cite{Good2008}) and increases the similarity between actives and decoys (which helps prevent artificial enrichment\cite{Verdonk2004}). MUV is also noteworthy in that all decoys were found to be inactive in the initial high-throughput screen against the target. This provides some experimental evidence that the negatives are true negatives, as opposed to schemes that generate decoys through random sampling\cite{Mysinger_2012}. Of the 17 MUV targets, we identified 10 that have a receptor-ligand structure in the Protein Data Bank (PDB) in which the ligand has sub-$\mu$M affinity. The interaction diagrams of these structures are shown in Figure~\ref{targets}. For each of these structures, we identified interacting fragments that could serve as anchor fragments. For each target, we considered several fragments in order to evaluate how sensitive the approach is to the choice of fragment. We selected relatively generic functional groups (at most 7 atoms) that clearly form interactions with the receptor and that are common enough among both the actives and the decoys to yield meaningful results.
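The fragment-selection criterion above can be made concrete with a short sketch. The Python below assumes that, for each candidate fragment, the numbers of a target's actives and decoys containing it have already been counted (for example by a SMARTS substructure search). The class and function names and the minimum-count thresholds are hypothetical and are not taken from this study. The sketch reports the fraction of actives and of decoys covered by a fragment and the share of actives among the fragment-containing compounds, assuming that a fragment-anchored screen considers only compounds that contain the anchor. It then applies an illustrative rule that keeps a fragment only when both actives and decoys contain it often enough.

```python
from __future__ import annotations

from dataclasses import dataclass

# Per-target MUV set sizes given in the text.
N_ACTIVES = 30
N_DECOYS = 15000


@dataclass(frozen=True)
class FragmentCounts:
    """Numbers of one target's actives and decoys that contain a fragment.

    The counts are assumed to come from an upstream substructure search;
    they are inputs to this sketch, not computed here.
    """
    name: str
    actives: int
    decoys: int


def coverage(c: FragmentCounts) -> tuple[float, float]:
    """Fraction of all actives and of all decoys that contain the fragment."""
    return c.actives / N_ACTIVES, c.decoys / N_DECOYS


def subset_active_rate(c: FragmentCounts) -> float:
    """Share of actives among the compounds that contain the fragment.

    Assumption: a screen anchored on this fragment only considers compounds
    containing it, so this is the baseline hit rate that screen starts from.
    """
    total = c.actives + c.decoys
    return c.actives / total if total else 0.0


def is_usable_anchor(c: FragmentCounts,
                     min_actives: int = 5,
                     min_decoys: int = 300) -> bool:
    """Hypothetical selection rule: keep a fragment only if enough actives
    AND enough decoys contain it for an enrichment estimate to be meaningful.
    The default thresholds are illustrative, not values used in the study."""
    return c.actives >= min_actives and c.decoys >= min_decoys
```

Requiring both counts, rather than only the number of matched actives, mirrors the requirement in the text that a fragment be common among the actives and the decoys alike.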
\begin{table}
\centering
\caption{SMARTS expressions of the candidate anchor fragments. Fragment names consist of an abbreviated target name followed by an index.}
\begin{tabular}{ l l }
\hline
Fragment & SMARTS expression \\
\hline
Cathg1 & \texttt{c1ccccc1[R]} \\
Cathg2 & \texttt{c1ccccc1[C]} \\
Cathg3 & \texttt{c1ccccc1[!H]} \\
Cathg4 & \texttt{a1aaaaa1[!H]} \\
Cathg5 & \texttt{c1ccccc1[C]} \\
Eralphapot1 & \texttt{c1ccccc1O} \\
Eralphapot2 & \texttt{c1ccccc1[!H]} \\
Eralphapot3 & \texttt{a1aaaaa1[!H]} \\
Eralpha1 & \texttt{c1ccccc1O} \\
Eralpha2 & \texttt{c1ccccc1N} \\
Eralpha3 & \texttt{c1ccccc1[!H]} \\
Eralpha4 & \texttt{a1aaaaa1[!H]} \\
Erbeta1 & \texttt{c1ccccc1O} \\
Erbeta2 & \texttt{c1ccccc1N} \\
Erbeta3 & \texttt{c1ccccc1[!H]} \\
Erbeta4 & \texttt{a1aaaaa1[!H]} \\
Fak1 & \texttt{c1[c,n]cccn1} \\
Fak2 & \texttt{c1cccc([!H])c1} \\
Fak3 & \texttt{a1a([!H])aaaa1} \\
Fak4 & \texttt{n1[c,n][c,n]cc1} \\
Fxia1 & \texttt{a1aan1} \\
Fxia2 & \texttt{c1[c,n]cc[c,s]1} \\
Fxia3 & \texttt{c1[c,n]cc([!H])[c,s,o]1} \\
Fxia4 & \texttt{a1aaaa1[!H]} \\
Hivrt1 & \texttt{a1aaan1} \\
Hivrt2 & \texttt{c1aacn1} \\
Hivrt3 & \texttt{c1aac([!H])n1} \\
Hivrt4 & \texttt{a1aaaa1[!H]} \\
Hivrt5 & \texttt{c1ccccc1[Cl,O]} \\
Hsp901 & \texttt{c1ccccc1O} \\
Hsp902 & \texttt{c1ccccc1[!H]} \\
Hsp903 & \texttt{a1aaaaa1[!H]} \\
Pka1 & \texttt{c1[c,n]cccn1} \\
Pka2 & \texttt{a1ncccn1} \\
Pka3 & \texttt{a1([!H])ncccn1} \\
Pka4 & \texttt{c1[c,n]c([!H])ccn1} \\
Rho1 & \texttt{c1[c,n]cccn1} \\
Rho2 & \texttt{c1[c,n]c([!H])ccn1} \\
\hline
\end{tabular}
\end{table}
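As a quick check of the stated size limit of at most 7 atoms, the following Python sketch counts the atoms in each distinct SMARTS expression of the fragment table. It relies on a deliberately simplified regular-expression tokenizer, which is our assumption and not a general SMARTS parser such as the one in RDKit. The tokenizer treats a bracketed expression like \texttt{[c,n]} or \texttt{[!H]} as a single atom and skips ring-closure digits, parentheses and bond symbols, so it only handles simple, non-recursive patterns like those in the table.

```python
import re

# One atom in the simple SMARTS subset used in the fragment table:
#   - a bracketed atom expression such as [!H], [R] or [c,n],
#   - a two-letter halogen (Cl, Br), tried before single letters,
#   - a single-letter aliphatic or aromatic element symbol,
#   - a wildcard atom (a = any aromatic, A = any aliphatic, * = any).
# Ring-closure digits, parentheses and bond symbols match none of these
# alternatives, so findall() skips over them.
_ATOM = re.compile(r"\[[^\]]*\]|Cl|Br|[BCNOPSFI]|[bcnops]|[aA*]")

MAX_FRAGMENT_ATOMS = 7  # size limit stated in the text


def count_smarts_atoms(smarts: str) -> int:
    """Count atoms in a simple, non-recursive SMARTS pattern."""
    return len(_ATOM.findall(smarts))


# Distinct SMARTS expressions appearing in the fragment table.
TABLE_SMARTS = [
    "c1ccccc1[R]", "c1ccccc1[C]", "c1ccccc1[!H]", "a1aaaaa1[!H]",
    "c1ccccc1O", "c1ccccc1N", "c1[c,n]cccn1", "c1cccc([!H])c1",
    "a1a([!H])aaaa1", "n1[c,n][c,n]cc1", "a1aan1", "c1[c,n]cc[c,s]1",
    "c1[c,n]cc([!H])[c,s,o]1", "a1aaaa1[!H]", "a1aaan1", "c1aacn1",
    "c1aac([!H])n1", "c1ccccc1[Cl,O]", "a1ncccn1", "a1([!H])ncccn1",
    "c1[c,n]c([!H])ccn1",
]

if __name__ == "__main__":
    for pattern in TABLE_SMARTS:
        n_atoms = count_smarts_atoms(pattern)
        print(f"{pattern:26s} {n_atoms} atoms")
        # Every tabulated fragment should respect the size limit.
        assert n_atoms <= MAX_FRAGMENT_ATOMS, pattern
```

A full SMARTS parser would be needed for recursive expressions such as \texttt{\$(...)}, which none of the tabulated fragments use.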