Jianguo Xia


The initial motivation for developing MetaboAnalyst was to save myself time. I started my PhD with Dr. David Wishart at the University of Alberta. During that period, the main focus of the lab was, of course, the Human Metabolome Database (HMDB). The development of a metabolomics core facility was also at full speed. As part of my PhD training, I was involved in a metabolomics study on urine samples from cancer cachexia patients. At that time, the only bioinformatics tool for metabolomics data analysis was a commercial software package, SIMCA-P (Umetrics). We purchased a copy of the tool, which came with a comprehensive manual. Although I could perform some “standard” data analysis to produce the numbers and graphics seen in many metabolomics publications, I soon realized its limitations: many approaches I wanted to try were not supported. I then experimented with Weka (https://www.cs.waikato.ac.nz/ml/weka/), a widely used Java-based machine learning tool, for classification and regression analysis. However, it lacked many features specifically needed for metabolomics data analysis. In the end, I taught myself R to perform data analysis. This worked well for a short time: I analyzed the data the way I wanted, generated impressive graphics, and produced analysis reports using Sweave and LaTeX. However, the process soon became less enjoyable as more collaborators asked for their data to be analyzed in a similar fashion. A better way would be to let someone else in the lab do it. The best way would be to let researchers analyze their own data, since most of them are highly educated and understand the basic principles behind most analysis methods. At that time, I was the only one in the lab who knew R and statistics. How could I enable other people with some basic knowledge to perform the same analysis I would do? In 2008, I started thinking seriously about developing a biologist-friendly tool for metabolomics data analysis.
One of the advantages of being last in the “omics” race is the benefit of hindsight. Many of the approaches developed in other omics fields are not domain-specific and can be adapted for metabolomics. For instance, the GenePattern tool suite \citep{Reich_2006} developed by the Broad Institute gave me a lot of inspiration. Other important considerations were that the tool be web-based, respond in real time, and be implemented in the languages I knew (Perl, Java and R). During a lab meeting in the summer of 2008, I proposed this idea to David. He was a bit uncertain, as he knew that I had no formal training in developing web-based applications (note: I obtained my MSc in Immunology after graduating from a 5-year Medicine program). I was very enthusiastic and said I could get this done by the end of the year. He smiled and encouraged me to pursue this direction. As most analysis methods and graphics were already implemented in R, the key challenge was to put these functions on the web through a user-friendly interface. I wanted to use a technology that would not expire soon. The Perl CGI-based web framework was losing ground at that time. Java had a lot to offer in terms of web frameworks. However, many of them were too “heavy” for me to learn in a short time. Eventually, I chose the then relatively new JavaServer Faces (JSF) technology. The next technical challenge was how to communicate efficiently between R and Java while dealing with concurrency (i.e. supporting multiple users performing data analysis at the same time). Rserve (https://www.rforge.net/Rserve), developed by Simon Urbanek, came to my rescue. I spent around three months completing the first prototype, which captured all the steps I would perform for metabolomics data analysis. The web interface was designed to be quite “conversational” and acted as a playground in which users could freely explore many useful statistical analysis methods once their data had passed certain sanity checks, processing and normalization.
MetaboAnalyst (version 1.0) was published in 2009 in Nucleic Acids Research \citep{Xia_2009}. It enables a researcher with a basic understanding of metabolomics and statistics to perform data analysis and generate a comprehensive analysis report. It was also heavily used by other members of our metabolomics group and saved me a lot of time. My next focus was on functional analysis of metabolomics data. Using the same infrastructure, I developed tools for metabolite set enrichment analysis \citep{Xia_2010}, metabolomic pathway analysis \citep{12235}, and time-series data analysis \citep{Xia2011}. They were eventually merged under the umbrella of MetaboAnalyst (version 2.0) for ease of use and convenience of maintenance \citep{Xia_2012}. While I was pursuing my PhD on bioinformatics for metabolomics, the next-generation sequencing revolution was in full swing. In 2012, I received two postdoctoral fellowships, from the Canadian Institutes of Health Research (CIHR) and the Killam Trust, to work on next-generation sequencing in Bob Hancock’s laboratory at the University of British Columbia (UBC). While I was at UBC, MetaboAnalyst saw a steady increase in user traffic, and I felt obligated to maintain it and to keep addressing user requests. For instance, I added a biomarker analysis module to support a variety of common approaches clinicians would like to perform. With growing popularity came signs of performance issues: many colleagues experienced significantly slow responses when they used MetaboAnalyst for teaching a large class. I eventually decided to re-implement the software completely, with a particular focus on addressing the performance bottlenecks in both the Java and R functions. I also switched to the Google Compute Engine (GCE) for hosting the web application. The result is MetaboAnalyst 3.0 \citep{Xia_2015}. The impact of this update turned out to be very significant.
Google Analytics showed that the number of submitted analysis jobs jumped from 500--800 per day to 5000--8000 per day, and server downtime was also reduced significantly. We are actively developing MetaboAnalyst 4.0 at the time of writing. The key features will be more transparent and reproducible analysis, better support for untargeted metabolomics, and integration with other omics through advanced statistics and network analysis.

Richard Frankham


The critical event that eventually led to the first of my meta-analysis papers on genetic rescue occurred in February 2007 at a book writing session on the second edition of “Introduction to Conservation Genetics” \citep{Frankham} at Jonathan Ballou’s house in the Washington, D.C. area. Upon reaching the topic of outbreeding depression (where crossing populations results in harmful fitness effects in the progeny), we both expressed serious disquiet that the risks of outbreeding depression were being overplayed, while the potential fitness benefits of crossing (genetic rescue) were largely being ignored. One of us said “we must be able to predict the risk of outbreeding depression”. A few days later inspiration struck and we had the key to doing this: harmful effects on fitness of crossing populations typically arise when the crossed populations have fixed chromosomal differences and/or are adapted to different environments. We subsequently recruited Katherine Ralls, Mark Eldridge, Michele Dudash, Charles Fenster and Robert Lacy and jointly transformed this insight into a paper that was published in Conservation Biology \cite{FRANKHAM_2011}. That work was critical to the ability to use genetic rescue (variously called outcrossing or augmentation of gene flow) as a tool to save small inbred population fragments from extinction, and thereby reduce population and species extinction risks.
As genetic rescue had been attempted in very few cases, we decided to write a book on “Genetic Management of Fragmented Animal and Plant Populations” in an attempt to create a paradigm shift in which the discovery of genetically differentiated populations was followed, not by the conclusion that separate management of fragments was required, but by asking whether any of the populations were suffering genetic erosion (inbreeding, loss of genetic variation, reduced fitness, reduced ability to evolve and elevated extinction risk) and, if so, whether a genetic rescue attempt was justified. I drafted Chapter 6 on genetic rescue for the book, and then decided that it needed some examples, which I put into a table. At this point, I finally recognized that a fully fledged meta-analysis was required, as there was no overview of the effects of outcrossing in a conservation context, i.e. when an inbred population fragment with low genetic diversity was crossed to another population and the risk of outbreeding depression in the resulting progeny was low. The meta-analysis was done without external research funding, as I have been officially retired since 2002 (but am still scientifically active) and do not have grant money for any of the work described here. I am a great fan of meta-analyses: not only can they be done without research funds, but they are typically as highly cited as reviews and are scientifically superior to them. By mining the literature, I found 156 relevant comparisons of inbred parents and their outcrossed progeny, of which 145 showed beneficial effects on fitness. Only one of the cases where crossing was harmful was a convincing case of outbreeding depression (in a selfing nematode), the others likely being chance observations due to low statistical power. The median fitness benefit from augmenting gene flow was 148\% in wild/stressful conditions and 45\% in benign/captive ones.
Consequently, there are huge potential benefits from augmenting gene flow into population fragments suffering from genetic erosion, provided the risk of outbreeding depression in proposed crosses is low. Thus, the two main impediments to genetic rescue attempts have been removed. This paper was published in Molecular Ecology \cite{Frankham_2015} (currently 123 citations in Google Scholar), and was accompanied by a commentary from Donald Waller \cite{Waller_2015}. He praised the paper, but was not convinced about the persistence of the benefits over generations. Consequently, I did further analyses on my database to compare the effects of crossing on fitness in the F1, F2 and F3 generations, and these confirmed that the benefits persisted to an extent that was, if anything, better than expected. This led to the publication of a second genetic rescue meta-analysis paper in Biological Conservation \cite{Frankham_2016}. Writing of our book continued (with Paul Sunnucks being added as another author) and it was submitted to Oxford University Press in December 2016. However, during the subsequent copy editing I realised that the second genetic rescue meta-analysis paper was incomplete, as the persistence of fitness benefits following crossing is expected to depend on the breeding system. Persistence of fitness benefits across generations is expected for outbreeders, but habitual selfing after crossing will lead to loss of benefits, while mixed-mating species should experience only partial persistence of fitness benefits. I subsequently extended the analyses of my database from F3 to F13 and found no significant decline in fitness benefits for outbreeding species. Further, \citet{Bijlsma_2010} found no significant change in fitness between the F10 and F15 generations in outbreeding Drosophila flies. The updated findings were included in the published version of our “Genetic Management of Fragmented Animal and Plant Populations” book \citep{Frankham_2017}.
This was followed by a related paper calling for a paradigm shift in the genetic management of fragmented populations \citep{Ralls_2017}.