This collaborative document was created for the panel discussion on “Rotation in massive stars” (FOE 2015), held on Thursday 6/4/2015 in Raleigh. All conference participants have been added to the document and can edit, comment, and add figures (just drag and drop), references, and even LaTeX equations if needed (check the help page for more information on how to edit the document). Hopefully this will capture the essential ideas and interactions that emerge during and after the discussion. The document can be forked at any time, so that particular discussions can be taken further and potentially lead to active collaborations.
RNA viruses are challenging for protein- and nucleotide-sequence-based methods of molecular evolutionary analysis because of their high mutation rates and complex secondary structures. With new DNA and RNA sequencing technologies, viral sequence data from both individuals and populations are becoming easier and cheaper to obtain. Thus, there is a critical need for methods that can identify alleles whose frequencies change over time or in response to a treatment. We have developed a novel statistical approach for identifying evolved nucleotides and/or amino acids in a viral genome without relying on sequence annotation or the nature of the change. Instead, it identifies nucleotides that show similar patterns of change. Our approach models allelic variances under a Bayesian Dirichlet mixture distribution. Using a multi-stage procedure, we have developed an efficient clustering scheme that distinguishes changes caused by treatment from variation within viral populations. We applied our method to an influenza A H1N1 virus strain sampled longitudinally in replicated experiments, in either the absence or presence of oseltamivir. We find three genomic locations with strong evidence of a treatment effect and a list of sites with high genetic variation in the untreated environment. We believe our approach can be broadly applied and is particularly useful for cases that are recalcitrant to traditional sequence analysis.
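As a rough illustration of the kind of quantity involved, the sketch below smooths per-site allele counts with a symmetric Dirichlet prior and measures how much each allele's frequency varies across time points. It is not the mixture model or clustering scheme described above; the function names, the prior weight `alpha`, and the toy counts are all assumptions made for this example.

```python
from typing import Dict, List

ALLELES = "ACGT"


def posterior_mean(counts: Dict[str, int], alpha: float = 1.0) -> Dict[str, float]:
    """Posterior mean allele frequencies under a symmetric Dirichlet(alpha) prior.

    `alpha` is an assumed pseudocount, not a value taken from the abstract.
    """
    total = sum(counts.get(a, 0) for a in ALLELES) + alpha * len(ALLELES)
    return {a: (counts.get(a, 0) + alpha) / total for a in ALLELES}


def frequency_variance(series: List[Dict[str, int]], alpha: float = 1.0) -> Dict[str, float]:
    """Population variance of each allele's smoothed frequency across time points."""
    means = [posterior_mean(counts, alpha) for counts in series]
    result = {}
    for a in ALLELES:
        values = [m[a] for m in means]
        mu = sum(values) / len(values)
        result[a] = sum((v - mu) ** 2 for v in values) / len(values)
    return result


# Hypothetical read counts at two sites over three time points.
stable_site = [{"A": 96, "G": 4}, {"A": 96, "G": 4}, {"A": 96, "G": 4}]
shifting_site = [{"A": 96, "G": 4}, {"A": 50, "G": 50}, {"A": 4, "G": 96}]

stable_var = frequency_variance(stable_site)
shifting_var = frequency_variance(shifting_site)
```

Sites whose alleles show large variance over time, like the second toy site, would be candidates for closer inspection; separating treatment effects from background variation requires the replicated design and clustering described in the abstract.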
The following is just a rough list of my immediate and stretch goals for the upcoming project:

PRIMARY GOALS
- Use Ensembl Compara to determine orthologs and paralogs for zebrafish and mouse
- Stick to the pipeline outlined in the Vilella et al. paper
- Use Gene Ontology to obtain Biological Process and Molecular Function info for mouse and zebrafish?
- Use the same cutoffs to include only experimentally inferred annotations
- Rework the Clark code and then use it on my data set
- Create similar graphs and compare results to the Clark paper
- My theory: a purely mouse-to-zebrafish comparison should eliminate the experimental bias found in human vs. mouse, since mouse and zebrafish can be used for more similar experiments

STRETCH GOALS
- Find RNA-seq data to work with (if it's already out there) as a further check
- Fully eliminate authorship bias
- Normalize measures of functional similarity with respect to background similarity
- Estimate frequencies of GO terms separately for each species?
- Find a way to incorporate Phenoscape data into the comparison
- Find a good source of similar data for mice
- Figure out how to accurately and consistently compare features in an automated fashion

GOAL CHANGES
The above goals were created in early January 2014, and they changed during the course of the project. The final goals, set around early February, were:
- Obtain a sample set of genes that relate a mouse ortholog to a set of zebrafish paralogs that resulted from the teleost duplication
- Obtain a full set of that data (possibly from Yves Van De Peer)
- Use Phenoscape to obtain ontological annotations for each gene
- Use scripts from Prishanti to calculate the functional similarity between orthologs and each paralog set, as well as between the paralogs
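
The last goal above, functional similarity between an ortholog and its paralogs and between the paralogs themselves, can be sketched with a simple set-based measure. This is only a stand-in: the scripts mentioned above are not shown here, so Jaccard similarity is swapped in for illustration, and the gene names and annotation term labels below are hypothetical placeholders.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two annotation sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def compare_family(
    ortholog_terms: Set[str],
    paralog_terms: Dict[str, Set[str]],
) -> Tuple[Dict[str, float], Dict[Tuple[str, str], float]]:
    """Score the ortholog against each paralog, and every pair of paralogs."""
    ortholog_vs_paralog = {
        gene: jaccard(ortholog_terms, terms)
        for gene, terms in paralog_terms.items()
    }
    between_paralogs = {
        (g1, g2): jaccard(paralog_terms[g1], paralog_terms[g2])
        for g1, g2 in combinations(sorted(paralog_terms), 2)
    }
    return ortholog_vs_paralog, between_paralogs


# Hypothetical annotations: one mouse ortholog and two zebrafish paralogs.
mouse_terms = {"term1", "term2", "term3"}
zebrafish_terms = {
    "paralog_a": {"term1", "term2"},
    "paralog_b": {"term3", "term4"},
}

vs_ortholog, between = compare_family(mouse_terms, zebrafish_terms)
```

A plain Jaccard score ignores the ontology hierarchy and how common each term is, which is why the stretch goals mention normalizing against background similarity; an ontology-aware measure would replace `jaccard` without changing the rest of the comparison.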