
twchrist edited Feb 21 pre meeting update.tex about 10 years ago

Commit id: c00787f1b30bf3ba36dcc7d754b3becd244efb54


\section{Feb 20 pre-meeting update}

\subsection*{phenoscape}
\begin{verbatim}
As of the time of writing, Jim is still compiling a nice little starter package so I can learn how phenoscape is set up. Prishanti will also be giving me scripts after the lab meeting. I have been reading up on OWL and working through its main tutorial, though it is a bit vague. I think I will get a lot more help from Jim's scripts, since they will have examples specific to my research.
\end{verbatim}

\subsection*{ensembl access}
\begin{verbatim}
Every now and again I work with Ensembl's API to learn about gene trees and similar things, but my scripts are always very slow, so I got in contact with Steven Fishback, one of the people who oversee killdevil. We did a bit of brainstorming, and I now have several ways I could speed up data retrieval. I'm trying one of them now and it seems to be helping. I'm pretty excited about future Ensembl data retrieval being less of a hassle.
\end{verbatim}

\subsection*{teleost duplication}
\begin{verbatim}
I think I have found some good papers/datasets. This paper http://genome.cshlp.org/content/13/3/382.full is specifically about identifying ~50 paralog pairs in zebrafish from the teleost duplication. Table 1 lists all of these genes. This seems like a great test set that I could use for my first forays into phenoscape. I also found http://genomebiology.com/2006/7/5/R43 which is a very good overview of the three WGDs in vertebrates, with special focus on the teleost WGD. Judging by the experiments they perform and their methods, they almost certainly have a very simple way of determining which paralogs are a result of the teleost WGD. However, that data is not posted online. The paper states that if I want the data I would have to email them. So now the question is: do I take my chances with them, or do I keep looking to see if there is a more easily accessible dataset?
Side note: I have also found two databases that specialize in duplicated genes, but they do not assign duplications to any specific time, so they don't appear any more useful than Ensembl.
\end{verbatim}

\subsection*{clark scripts}
\begin{verbatim}
I went over the scripts that Clark sent us and noted something interesting: he only works with homologs that have over 50 percent identity. I'm not sure whether that changes anything; it just seemed noteworthy.
\end{verbatim}

\end{document}