Authorea

Michael Walker edited Problems.tex almost 9 years ago

Commit id: f1b3bfc50fbd3219bb201b57b8977f2394edff64


%\section{These problems were tackled at HealthHack 2013}
\subsection{A good visualisation tool for literature research}
Browsing the existing literature is a fundamental task in biomedical research and clinical practice. One group developed a simple online tool which visually displays the results of a literature search in a manner that takes into account both the ranking of the references and their similarity to one another. Such a tool would enable researchers and clinicians to browse the literature more efficiently, not only to find relevant articles but also to gain an overview of a given research field. While several literature search engines already exist, they rarely rank papers in order of relevance based on the full text. The

Large graphs arise commonly in bioinformatic analysis. Examples include inferred gene regulatory networks and graphs of DNA sequences built for genome assembly. These graphs are generally too large to lay out and visualise interactively, having tens of thousands to potentially billions of nodes. Therefore, one team developed a tool to interactively visualise a restricted neighbourhood of such a graph.

\subsubsection{Genome assembly and $k$-mer graph:}
One instance in which this would be useful is the analysis of sequence variants represented in a localised area of a genome assembly. Genome assembly often makes use of $k$-mer graphs, which are used to identify overlapping substrings of length $k$ among the sequence reads. For some value of $k$ (smaller than the read length) a graph is created, where each node represents a string of length $k$ (a $k$-mer) present in one or more reads. A directed edge exists where two $k$-mers share an overlap of $k-1$ characters (for example \texttt{CATT} $\rightarrow$ \texttt{ATTG}). In this way, a read may be represented as a path through the graph, and the reconstructed genome sequence is (ideally) encoded as a longer path through the same graph. In a cancer sample several different DNA variations may be present, particularly at clinically relevant positions of the genome, arising from the clonal structure of the population of cells. The presence of a sequence variation (such as the replacement, insertion, or deletion of nucleotides) will create alternative paths in a localised region of this graph, and visualisation of these alternatives helps researchers understand the spectrum of variants at a given locus.
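The construction described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the team's implementation: the function names \texttt{build\_kmer\_graph} and \texttt{branch\_points} and the toy reads are assumptions made for the example. As a simplification, an edge is recorded only where two overlapping $k$-mers are actually observed next to each other in a read, and each edge carries a count of how often it was observed, which could drive a visual indication of frequency.

```python
from collections import defaultdict


def build_kmer_graph(reads, k):
    """Map each directed edge (u, v) between k-mers to its number of occurrences.

    Simplifying assumption: an edge is added only when u and v are adjacent
    k-mers within some read, so they overlap by k-1 characters by construction.
    """
    edge_support = defaultdict(int)
    for read in reads:
        # A read of length n contains n - k + 1 k-mers and n - k adjacent pairs.
        for i in range(len(read) - k):
            u = read[i:i + k]
            v = read[i + 1:i + k + 1]
            edge_support[(u, v)] += 1
    return dict(edge_support)


def branch_points(edge_support):
    """Return the k-mers with more than one successor (candidate variant sites)."""
    successors = defaultdict(set)
    for u, v in edge_support:
        successors[u].add(v)
    return {u: sorted(vs) for u, vs in successors.items() if len(vs) > 1}


# Hypothetical toy reads: two copies of one sequence and one single-base variant.
reads = ["ACATTGC", "ACATTGC", "ACATAGC"]
graph = build_kmer_graph(reads, k=4)
branches = branch_points(graph)
```

In this toy input the node \texttt{ACAT} has two outgoing edges, one per variant, and the edge counts show that the first variant is supported by two reads and the second by one.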
The aim of this project is to interactively construct and display, given a seed genomic region of interest, the structure of the $k$-mer graph implied by a set of sequence reads, in such a way as to show the variants present clearly, with a visual indication of their frequency of occurrence (determined by the number of reads that indicate their presence).

\subsubsection{Application to Gene Networks:}
Gene regulatory networks, particularly those containing indirect interactions, are complex to visualise both because of their size and because they generally do not have a planar embedding.
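Restricting the display to a neighbourhood of a seed node, as described for large graphs in general, might look like the following breadth-first sketch. The function name \texttt{restricted\_neighbourhood}, the \texttt{radius} parameter, and the gene names are hypothetical and chosen only for illustration. Edge direction is ignored when measuring distance from the seed, which is one possible design choice among several.

```python
from collections import defaultdict, deque


def restricted_neighbourhood(edges, seed, radius):
    """Return the edges whose endpoints both lie within `radius` steps of `seed`.

    Assumption: distance is measured on the undirected version of the graph.
    """
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)

    # Breadth-first search from the seed, stopping at the given radius.
    distance = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if distance[node] == radius:
            continue
        for nxt in neighbours[node]:
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)

    kept = set(distance)
    return [(u, v) for u, v in edges if u in kept and v in kept]


# Hypothetical regulatory edges (regulator, target).
edges = [("geneA", "geneB"), ("geneB", "geneC"),
         ("geneC", "geneD"), ("geneE", "geneA")]
one_step = restricted_neighbourhood(edges, "geneA", 1)
two_steps = restricted_neighbourhood(edges, "geneA", 2)
```

Only the returned subgraph needs to be laid out, so the cost of drawing depends on the size of the neighbourhood rather than on the size of the whole network.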