twchrist edited Feb 21 pre meeting update.tex  about 10 years ago

Commit id: c00787f1b30bf3ba36dcc7d754b3becd244efb54


\section{Feb 20 pre-meeting update}

\subsection*{phenoscape}
\begin{verbatim}
As of the time of writing, Jim is still compiling a starter package for me
so I can learn how phenoscape is set up. Prishanti will also be giving me scripts after
the lab meeting. I have also been reading up on OWL and its main tutorial, though it
is a bit vague. I think I will get a lot more help from Jim's scripts, since they
will have examples specific to my research.
\end{verbatim}

\subsection*{ensembl access}
\begin{verbatim}
Every now and again I experiment with Ensembl's API to learn about gene trees and similar
things, but my scripts are always very slow, so I got in contact with Steven Fishback, one of
the people who oversee killdevil. We did a bit of brainstorming and now I have several ways
to speed up data retrieval. I'm trying one of them now and it seems to be helping.
I'm pretty excited about future Ensembl data retrieval being less of a hassle.
\end{verbatim}

\subsection*{teleost duplication}
\begin{verbatim}
I think I have found some good papers/datasets. This paper
http://genome.cshlp.org/content/13/3/382.full is specifically about identifying ~50 paralog pairs
in zebrafish from the teleost duplication. Table 1 lists all of these genes. This seems like a
great test set that I could use for my first forays into phenoscape.

I also found http://genomebiology.com/2006/7/5/R43 which is a very good overview of the three
whole-genome duplications (WGDs) in vertebrates, with special focus on the teleost WGD. Judging
by the experiments they perform and their methods, they almost certainly have a very simple way
of determining which paralogs are a result of the teleost WGD. However, that data is not posted
online. The paper states that if I want the data I would have to email the authors. So now the
question is: do I take my chances with them, or do I keep looking to see if there is a more
easily accessible dataset?

Side Note: I have also found two databases that specialize in duplicated genes, but they do not
assign duplications to any specific time, so they don't appear any more useful than Ensembl.
\end{verbatim}

\subsection*{clark scripts}
\begin{verbatim}
I went over the scripts that Clark sent us and noticed something interesting: he only works with
homologs that have over 50 percent identity. I'm not sure whether that changes anything; it just
seemed noteworthy.
\end{verbatim}

\end{document}