twchrist edited Feb 21 pre meeting update.tex  about 10 years ago

Commit id: 88f215a55383472702687edf8a0099bc7c96f0bb

\begin{document}
\section{Feb 20 pre-meeting update}

\subsection*{phenoscape}
\begin{verbatim}
As of the time of writing, Jim is still compiling a starter package for me so
that I can learn how phenoscape is set up and how to use OWL to access it in
the way that best suits my project. Prishanti will also be giving me scripts
after the lab meeting. I have been reading up on OWL and its main tutorial,
though the tutorial is a bit vague. I expect to get a lot more help from Jim's
scripts, since they will have examples specific to my research.
\end{verbatim}

\subsection*{ensembl access}
\begin{verbatim}
Every now and again I work with Ensembl's API to learn about gene trees and
similar things, but my scripts are always very slow. I got in contact with
Steven Fishback, one of the people who oversees killdevil. We did a bit of
brainstorming, and now I have several ways that I could speed up data
retrieval. The current ideas are:
  - download the Ensembl database to killdevil
  - run scripts from the login node
  - use useastdb.ensembl.org as the host rather than ensembldb.ensembl.org
I did not attempt the first idea. Running scripts on the login node does seem
to speed things up, but it prevents running multiple scripts in parallel on
the queues. Changing the host also seems to be very effective. However, I have
had sporadic issues with useastdb not connecting properly and reporting that
it can't find very simple databases. When this occurs I switch back to the
traditional ensembldb host. I'm pretty excited about the idea of future
Ensembl data retrieval being less of a hassle.
\end{verbatim}

\subsection*{teleost duplication}
\begin{verbatim}
I think I have found some good papers/datasets. This paper
http://genome.cshlp.org/content/13/3/382.full
is specifically about identifying ~50 paralog pairs in zebrafish from the
teleost duplication. Table 1 lists all of these genes.
This seems like a great test set that I could use for my first forays into
phenoscape. I also found
http://genomebiology.com/2006/7/5/R43
which is a very good overview of the three WGDs in vertebrates, with a special
focus on the teleost WGD. Judging by their experiments and methods, they
almost certainly have a very simple way of determining which paralogs are a
result of the teleost WGD, or already have that dataset laid out. However,
that data is not posted online. The paper states that I would have to email
the authors to get it. So now the question is: do I take my chances and hope
they respond promptly, or do I keep looking to see if there is a more easily
accessible dataset?

Side note: I have also found two databases that specialize in duplicated
genes, but they do not assign duplications to any specific time, so they
don't appear any more useful than Ensembl.
\end{verbatim}

\subsection*{clark scripts}
\begin{verbatim}
I went over the scripts that Clark sent us and noted something interesting.
He only works with homologs that have over 50 percent identity. This is
outlined in MAIN_homology_maker.m in the fish folder. I'm not sure whether
that changes anything; it just seemed noteworthy.
\end{verbatim}
\end{document}