auastro deleted Problems2.tex  almost 10 years ago

Commit id: 3ff02a891c30986dbafb61249b9b8170715d3944


\subsection{Here is a bunch of problems that geneticists would love help with!}

Inside each cell in our body are more than 30,000 genes (the `genome') that are expressed at different levels to control a wide range of processes. One example is the differentiation of blood cells: a single blood stem cell in the bone marrow can give rise to many different mature blood cells, progressively developing from progenitors with the potential to become many different cells, to mature cells with very specialised functions. What key genes are turned on and off at various stages of differentiation? Can we associate functions of these cells with their genomic profiles?

Using whole-genome techniques to measure the expression levels of all the genes in a cell, we can gather the expression values of all the genes across a variety of cell types as a matrix of values. We can then analyse this data in multiple ways to obtain biological meaning and generate hypotheses. As a simple example, suppose that a gene is expressed highly in only a single cell type out of many. Perhaps this implies that the gene is crucial in creating these cells, and blocking its action may lead to therapeutic outcomes if too many of these cells cause disease. Hence we need tools that let us look at this data in different ways. We also want to empower biologists without a programming background to carry out some analysis in intuitive ways. So here are some questions which may have tractable answers over a weekend of coding, based on a matrix of gene expression values:

\begin{enumerate}
\item Use d3 (JavaScript) to plot the expression profile of a gene in various ways -- e.g. quickly switch between bar plots and box plots, or change the presentation based on the ``aggregation'' metric (sum over strains or mutation status, for example). Input could simply be a dictionary of values corresponding to one row of the matrix mentioned above.
\item Many genes work together to create a phenotype, and there are many ways of grouping genes into ``networks''. Some common ways of creating gene networks also create visualisation challenges, due to the sheer number of possible points and connections. In this context, evaluate some commonly used network visualisations, such as a minimum spanning tree diagram, on iOS and other tablet and handheld platforms. Input could be a graph consisting of a set of nodes and edges. What can and can't tablet devices do compared to mouse/keyboard combinations?
\item Correlation calculations are often performed on subsets of the matrix mentioned above. The high number of possible subset combinations often makes such calculations quite slow. What strategies could be employed to speed up these calculations?
\item Clustering is often performed on genes, but not very often on cells. For large datasets, however, clustering on cells is very useful for the biologist. What ideas could we explore to visualise clusters of related cells based on quantitative measures such as correlations?
\item Heatmaps are often used to show clusters in a matrix of data, but they often have limited usefulness. Can we come up with some new ways of creating heatmaps?
\item Many bioinformatics methods are developed in R and deployed as R packages. What are some practically useful implementation strategies for deploying applications which can provide the user with interface tools such as d3, while leveraging R as the analysis engine?
\end{enumerate}
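
To make the inputs in the first and third questions concrete, here is a minimal Python sketch using only the standard library. Everything in it is an assumption made for illustration: the gene names, sample identifiers, group labels and expression values are invented, and the helper names \texttt{aggregate\_profile}, \texttt{standardise} and \texttt{subset\_correlations} are hypothetical. It shows one row of the matrix as a dictionary collapsed by a grouping such as mutation status, and one possible speed-up for correlations on a subset of samples: standardise each gene once for the subset, so that every pairwise Pearson correlation reduces to a dot product.

```python
from math import sqrt
from statistics import mean


def aggregate_profile(profile, groups, metric=mean):
    """Collapse one gene's expression profile by a grouping of samples.

    profile: {sample_id: expression value}, i.e. one row of the matrix.
    groups:  {sample_id: group label}, e.g. strain or mutation status.
    metric:  any function from a list of values to one number (mean, sum, ...).
    """
    buckets = {}
    for sample, value in profile.items():
        buckets.setdefault(groups[sample], []).append(value)
    return {label: metric(values) for label, values in buckets.items()}


def standardise(row):
    """Centre a row and scale it to unit length; constant rows become zeros."""
    centre = mean(row)
    centred = [v - centre for v in row]
    norm = sqrt(sum(v * v for v in centred))
    return [v / norm for v in centred] if norm else [0.0] * len(row)


def subset_correlations(matrix, columns):
    """Pearson correlation between every pair of genes over chosen samples.

    matrix:  {gene: [expression values, one per sample]}
    columns: indices of the samples that make up the subset
    """
    # Standardise each gene once for this subset of samples ...
    unit = {g: standardise([row[c] for c in columns]) for g, row in matrix.items()}
    genes = list(unit)
    # ... so that each pairwise correlation is just a dot product.
    return {
        (a, b): sum(x * y for x, y in zip(unit[a], unit[b]))
        for i, a in enumerate(genes)
        for b in genes[i + 1:]
    }


# Invented example data: three genes measured in four samples.
matrix = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}
groups = {"s1": "wild type", "s2": "wild type", "s3": "mutant", "s4": "mutant"}
profile = dict(zip(groups, matrix["geneA"]))  # one row as a dictionary

by_status = aggregate_profile(profile, groups)  # mean per mutation status
correlations = subset_correlations(matrix, columns=[0, 1, 2])
```

The dictionary returned by \texttt{aggregate\_profile} is the kind of object that could be serialised and handed to a d3 plot. With a numerical library, the dot products in \texttt{subset\_correlations} would typically become a single matrix product, which is usually where most of the speed-up comes from.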
