Molecular tools for synthetic biology in plants: a first generation open bioinformatics workshop.

Authors: Ron Shigeta, Niranjana Nagarajan, Shriram Bharath, Wilifred Tang, Tony Hecht, Alex Alekseyenko, Bryce Wolfe, Corey Hudson, Jamey Kain, Urvish Parikh, Scott Fay, Kyle Taylor

Loosely sponsored by Counter Culture Labs and Berkeley Bio Labs

Please address correspondence to


Synthetic biology has had profound effects on human life. It has provided more effective anti-malarial medicine, cheaper insulin, new useful bio-materials, and greener biofuels. However, much remains to be learned in order to synthesize proteins more efficiently. To explore the potential of the DIY biology movement to engage in meaningful synthetic biology bioinformatics research, we developed a bioinformatics workshop to study determinants of protein expression levels in plants. We extracted possible ribosome binding and translation initiation sequences and looked for correlations with experimentally determined protein levels, using publicly available data sets for the widely studied plants Oryza sativa and Arabidopsis thaliana. The working group was open to the public and met every other week for 3 hours, typically starting with a short, relevant presentation followed by hands-on data work. We aim to develop, experimentally validate, and publish our consensus sequences, anticipating that our work will be useful for plant synthetic biology research. We hope our experience will serve as a model for future community projects that serve the dual purpose of educating curious members of the public while also generating useful scientific results.


Advances in sequencing technology have produced an avalanche of biological data over the past 12 years. The bottleneck in discovery has consequently shifted from data generation to data analysis, suggesting that much data is not used to its full potential (Lockhart 2000).

Crowdsourcing is one technique for gaining more insight from existing biological data. Putting the diverse eyes and hands of the general public to work on bioinformatics is not new (Good 2013; Marbach 2012); examples include protein (Lane 2012) and RNA folding [], and both paid (Ingenuity® Systems) and unpaid (Hingamp 2008) curation of literature.

Rather than approach a problem strictly as professionals, we developed an Open Source DIY workshop in which scientists and the public worked together on a synthetic biology project intended to produce a publishable outcome. The problem had to rely on data from completely open sources and not require difficult analysis. We set a modest goal: to survey plant translation initiation motifs, with the aim of creating an open source parts list for controlling translation in metabolic engineering and synthetic biology. Working meetings were posted through Counter Culture Labs and Berkeley Bio Labs (groups with >100 members each) on and met every week or two over three months.

Plants offer many advantages as systems for fine-tuned biological engineering [e.g., modification to enhance production of economically valuable terpenoids (Moses 2013), modification of lignin biosynthesis to expedite biofuel synthesis (Li 2008)]. There is a paucity of published information, however, on how to control sets of genes working in concert. The use of small sequence motifs as ribosome binding site (RBS) parts for synthetic biology has been proposed in bacteria [ (Salis 2009) see also:], and similar parts have been produced for yeast []. Estimates for RBS parts in prokaryotic systems show that the translation level of a gene can be shifted by more than an order of magnitude, indicating their potential utility in synthetic biology projects. Generating an estimate of the regulatory power of plant translation initiation motifs was thus seen as a useful goal for our project.

In most nuclear plant genes, the 5' cap of the mRNA transcript acts as the ribosome binding site and the Kozak sequence acts as the signal for translation initiation. Because chloroplasts are of bacterial origin, transcripts of genes encoded in the chloroplast genome carry consensus sequences distinct from those of nuclear transcripts. Instead of the 5' cap, there is a short motif called the Shine-Dalgarno sequence where the ribosome binds; translation then initiates at a start codon generally 8 nucleotides downstream, though this distance varies. Although there has been some experimental work on ribosome binding sites and Kozak sequences in plants [refs, perhaps Lutcke et al EMBO J 1987], genomic-scale surveys have not been performed.
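
As an illustration of how these two signals might be located in sequence data, the following minimal Python sketch extracts the bases flanking a start codon (the Kozak context) and measures the spacing between a Shine-Dalgarno-like motif and the start codon. It is not the workshop's actual pipeline. The motif core GGAGG, the window sizes, and the function names are assumptions chosen for illustration only.

```python
# Illustrative sketch only; motif core and window sizes are assumptions.
SD_CORE = "GGAGG"  # assumed Shine-Dalgarno-like core, not taken from the text


def kozak_context(transcript, start, upstream=6, downstream=1):
    """Return the bases flanking the start codon.

    `start` is the 0-based index of the first base of the start codon.
    With the defaults the window covers 6 bases upstream, the codon
    itself, and 1 base downstream (10 nt in total).
    """
    lo = max(0, start - upstream)
    return transcript[lo:start + 3 + downstream]


def sd_spacing(transcript, start, max_upstream=20):
    """Return the gap (in nt) between the nearest upstream SD core and
    the start codon, or None if no core is found in the search window."""
    lo = max(0, start - max_upstream)
    region = transcript[lo:start]
    pos = region.rfind(SD_CORE)  # nearest occurrence to the start codon
    if pos == -1:
        return None
    return len(region) - (pos + len(SD_CORE))


if __name__ == "__main__":
    # Toy chloroplast-like transcript: SD core, 8 nt spacer, start codon.
    toy = "AAAA" + "GGAGG" + "TTTTTTTT" + "ATGGCT"
    print(sd_spacing(toy, start=17))
    # Toy nuclear-like transcript with a start codon at index 8.
    print(kozak_context("TTGCCACCATGGCG", start=8))
```

Collecting these windows across all annotated genes would give the raw material for a consensus sequence; a real survey would also need to allow mismatches to the motif rather than requiring an exact core.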

Here we use publicly available, combined RNA and protein expression data for both nuclear and chloroplast genes to estimate the power of ribosome binding and translation initiation sequence motifs to initiate translation. These are initial results; experimental confirmation of the motifs will follow.
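
One simple way to frame such an estimate is sketched below, under stated assumptions: the protein-to-mRNA ratio is used as a proxy for translation efficiency, and genes whose upstream window contains a motif are compared with genes that lack it. The choice of statistic, the record layout (`upstream`, `protein`, `mrna`), and the function names are hypothetical and are not taken from the analysis described here.

```python
import math
from statistics import mean


def translation_efficiency(protein, mrna):
    """log2 of the protein-to-mRNA ratio, used here as an assumed proxy
    for how efficiently a transcript is translated."""
    return math.log2(protein / mrna)


def motif_effect(genes, motif):
    """Difference in mean log2 efficiency between genes whose upstream
    window contains `motif` and genes whose window does not.

    `genes` is a list of dicts with hypothetical keys 'upstream',
    'protein', and 'mrna'. Returns None if either group is empty.
    """
    with_motif = [translation_efficiency(g["protein"], g["mrna"])
                  for g in genes if motif in g["upstream"]]
    without = [translation_efficiency(g["protein"], g["mrna"])
               for g in genes if motif not in g["upstream"]]
    if not with_motif or not without:
        return None
    return mean(with_motif) - mean(without)


if __name__ == "__main__":
    # Made-up toy records, for demonstration only.
    toy_genes = [
        {"upstream": "AAGGAGGTT", "protein": 8.0, "mrna": 1.0},
        {"upstream": "CCGGAGGAA", "protein": 4.0, "mrna": 1.0},
        {"upstream": "TTTTTTTTT", "protein": 1.0, "mrna": 1.0},
        {"upstream": "ACACACACA", "protein": 2.0, "mrna": 2.0},
    ]
    print(motif_effect(toy_genes, "GGAGG"))
```

A positive value would suggest that the motif is associated with higher protein output per transcript; a difference of about 3.3 on this log2 scale corresponds to a tenfold shift.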