BSHMM: A model for Markov-based DNA methylation profiling and a case study in diatoms.
INTRODUCTION

Host laboratory

The internship took place in the Laboratory of Quantitative and Computational Biology in Paris. The lab is led by A. Carbone and is affiliated with both UPMC and CNRS. Its research focuses on interdisciplinary computational biology and promotes close collaboration between theoretical and experimental approaches, both conducted in the same lab by seven teams of biologists, computer scientists, statisticians and biophysicists. Under the supervision of Hugues Richard, I was part of the analytical genomics team, whose research spans two main subjects: protein evolution and modelling, and sequence evolution.

Prior work and rationale

The idea of studying methylation patterns with a statistical method originated in a compulsory project during the first year of the master's degree. The goal was to construct and implement a model inspired by the Ph.D. thesis of Bogdan Mirauta, with his active help and supervision. Guillaume Viejo, a fellow student at the time, and I had to repurpose Parseq, a model aimed at RNA-Seq data analysis, and modify it into a reliable DNA methylation profiling tool that takes as input a type of sequencing data called BS-Seq. A six-month voluntary internship further extended this work. Although the main motivation of the project remained the same, the statistical methods were heavily simplified: from a sophisticated Monte Carlo approach combined with Gibbs particle sampling to a more practical and simpler 3-layer hidden Markov process of order 1, better suited to the scope of an internship research project. The tool was almost entirely implemented during this period and dubbed BSHMM, for BS-Seq Hidden Markov Model. It proved effective on simulated data, but it had not yet been validated on real data.
In addition, during this year, I presented a poster on the tool at the CJC (Jeunes Chercheur des Cordeliers) meeting, which is mainly aimed at Ph.D. students.

Objectives of the Internship

This second internship was an immediate follow-up to the development of BSHMM. We first sought to validate our results by comparing them with those of a different, microarray-based methylation experiment that drew the 5-methylcytosine (5mC) profile of _Phaeodactylum tricornutum_. The second part consisted of integrating the tool we implemented into a larger analysis pipeline. Recent publications have shown that methylation profiles exhibit spatial periodicity and play an important role in the arrangement of chromosomes inside the nucleus of some diatom species via nucleosome linkage. The same type of periodicity has also been observed in the expression level of small RNAs, although it is still unclear whether these two phenomena are related to the same biological process. The goal is to determine whether this periodicity is also present in _P. tricornutum_, and whether it is linked in any way to the placement patterns of fragments derived from small non-coding RNA (snRNA).