Authorea

Introduction

Advances in sequencing technology have produced an avalanche of biological data over the past 12 years. The bottleneck in discovery has consequently shifted from data generation to data analysis, suggesting that much data is not used to its full potential \cite{Lockhart_Winzeler_2000}.

Crowdsourcing is one technique to gain more insight from existing biological data. Enlisting the general public in bioinformatics is not new \cite{Good_Su_2013} \cite{ld_Allison_Bonneau_et_al__2012}; examples include protein folding \cite{lane2012milliseconds}, RNA folding [http://eterna.cmu.edu/web/], and both paid (Ingenuity® Systems, www.ingenuity.com) and unpaid \cite{hingamp2008metagenome} curation of literature.

Rather than approach a problem strictly as professionals, we developed an open-source DIY workshop in which scientists and members of the public worked together on a synthetic biology project intended to yield a publishable outcome. The problem had to rely only on data from completely open sources and not require difficult analysis. We set a modest goal: a survey of plant translation initiation motifs, aimed at creating an open-source parts list for controlling translation in metabolic engineering and synthetic biology. Working meetings were posted on meetup.com through Counter Culture Labs and Berkeley Bio Labs (groups with >100 members each) and were held every week or two over three months.

Plants offer many advantages as systems for fine-tuned biological engineering [e.g., modification to enhance production of economically valuable terpenoids \cite{moses2013bioengineering}, modification of lignin biosynthesis to expedite biofuel synthesis \cite{li2008improvement}]. There is, however, a paucity of published information on how to control sets of genes working in concert. Use of small sequence motifs as ribosome binding site (RBS) parts for synthetic biology has been proposed in bacteria [\cite{Salis_Mirsky_Voigt_2009}; see also http://parts.igem.org/Ribosome_Binding_Sites/Prokaryotic/Constitutive/Anderson], and similar parts have been produced for yeast [parts.igem.org]. Estimates for RBS parts in prokaryotic systems show that the translation level of a gene can be shifted by more than an order of magnitude, indicating their potential utility in synthetic biology projects. Generating an estimate of the regulatory power of plant translation initiation motifs was thus seen as a useful goal for our project.

In most nuclear-encoded plant genes, the 5' cap of the mRNA transcript acts as the ribosome binding site and the Kozak sequence acts as the signal for translation initiation. Because of the bacterial origins of the chloroplast, transcripts of genes encoded in the chloroplast genome carry consensus sequences distinct from those of nuclear transcripts. Instead of the 5' cap, there is a short motif called the Shine-Dalgarno sequence where the ribosome binds; translation then initiates generally 8 nucleotides downstream, though this distance varies. Although there has been some experimental work on ribosome binding sites and Kozak sequences in plants [refs, perhaps Lutcke et al EMBO J 1987], genomic-scale surveys have not been performed.
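As a rough illustration of the spacing described above, the following Python sketch measures the distance between a Shine-Dalgarno-like motif and the start codon in a chloroplast 5' UTR. The core motif GGAGG, the 20-nucleotide search window, and the function name find_sd_spacing are assumptions made for this example only; they are not taken from the survey itself.

```python
# Assumed Shine-Dalgarno-like core; the text does not specify a consensus.
SD_CORE = "GGAGG"


def find_sd_spacing(utr5, motif=SD_CORE, max_spacing=20):
    """Return the number of nucleotides between the motif and the start codon.

    utr5 is the 5' UTR written 5'->3', ending immediately before the
    start codon. Returns None when no motif occurrence lies within
    max_spacing nucleotides of the start codon.
    """
    # Normalise case and accept RNA (U) as well as DNA (T) input.
    seq = utr5.upper().replace("U", "T")
    # Use the occurrence closest to the start codon.
    pos = seq.rfind(motif)
    if pos == -1:
        return None
    spacing = len(seq) - (pos + len(motif))
    return spacing if spacing <= max_spacing else None


if __name__ == "__main__":
    # Hypothetical UTR with the motif 8 nt upstream of the start codon.
    print(find_sd_spacing("AAAGGAGGTTTTTTTT"))
```

Taking the occurrence nearest the start codon is a simplification; a real survey would likely score degenerate matches as well, since the spacing and the motif itself both vary.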

Here we use publicly available, combined RNA- and protein expression data for both nuclear and chloroplast genes to estimate the power of the ribosome binding and translation initiation sequence motifs to initiate translation. These are initial results; experimental confirmation of the motifs will follow.
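One simple way to relate a motif to translation output, sketched here in Python, is to take the log ratio of protein abundance to mRNA abundance for each gene and summarise it per motif. This is an assumed proxy chosen for illustration, not necessarily the estimator used in this work; the function name motif_efficiency, the record layout, and the example values are hypothetical.

```python
import math
from collections import defaultdict
from statistics import median


def motif_efficiency(records):
    """Summarise a translation-efficiency proxy per motif.

    records is an iterable of (motif, mrna_abundance, protein_abundance)
    tuples. The proxy is log2(protein / mRNA); the median per motif is
    returned as a dict.
    """
    by_motif = defaultdict(list)
    for motif, mrna, protein in records:
        # The log ratio is undefined for zero or negative abundances.
        if mrna <= 0 or protein <= 0:
            continue
        by_motif[motif].append(math.log2(protein / mrna))
    return {motif: median(values) for motif, values in by_motif.items()}


if __name__ == "__main__":
    # Hypothetical records: (motif, mRNA abundance, protein abundance).
    example = [
        ("AACAATGG", 10, 40),
        ("AACAATGG", 5, 5),
        ("TTTTATGT", 8, 2),
        ("TTTTATGT", 0, 3),  # skipped: no detectable mRNA
    ]
    print(motif_efficiency(example))
```

The median is used rather than the mean so that a few genes with extreme ratios do not dominate a motif's summary; this too is a design choice for the sketch.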