ROUGH DRAFT authorea.com/10301

# David LeBauer, Moein Azimi, David Bettinardi, Rachel Bonet, Emily Cheng, Michael Dietze, Patrick Mulrooney, Scott Rohde, Andy Tu

Abstract

This is the user guide for entering data into the BETYdb database. The goal of this guide is to provide a consistent method of data entry that is transparent, reproducible, and well documented. The steps here generally accomplish one of two goals. The first goal is to provide data that is associated with the experimental methods, species, site, and other factors associated with the original study. The second goal is to provide a record of all the transformations, assumptions, and data extraction steps used to migrate data from the primary literature to the standardized framework of the database.

• Getting Started \ref{sec:getting_started}
• Preparing Publications \ref{sec:preparing_publications}
• Citations \ref{sec:citation}
• Site \ref{sec:site}
• Treatments \ref{sec:treatments}
• Managements \ref{sec:managements}
• Traits \ref{sec:traits}
• Yields \ref{sec:yields}
• QA/QC \ref{sec:qaqc}

# Getting Started \label{sec:getting_started}

You will need to create the following accounts:

• BETYdb is used to store the data; request "creator" access during signup to enter data, or "manager" access to perform QA/QC.
• Mendeley is used to track and annotate citations.
• Google Docs is used to prepare and transform data prior to entry.
• Redmine is used to track data that need to be checked and/or corrected.

# Preparing Publications for Data Entry \label{sec:preparing_publications}

## Mendeley

Mendeley provides a central location for the collection, annotation, and tracking of the journal articles that we use. Features of Mendeley that are useful to us include:

• Collaborative annotation & notes sharing
• Text highlighter
• Sticky notes for comments in the text
• Notes field for text notes in the reference documentation
• Groups
• Tagging

Each project has two groups: "projectname" and "projectname_out" for the papers with data to be entered and for the papers with data that has been entered, respectively. Papers in the _out group may contain data for future entry (for example, traits that are not listed in Table \ref{tab:traits}).

Each project manager may have one or more projects and each project should have one group. Group names should refer to plant species, plant functional types, or another project specific name. Please make sure that David LeBauer is invited to join each project folder.

1. Open Mendeley desktop
2. Click Edit → New Group or press Ctrl+Shift+M
3. Create group name following instructions above
4. Enter group name
5. Set Privacy Settings to Private
6. Click Create Group
7. Click Edit Settings
8. Under File Synchronization, check Download attached files to group

Name groups and tag papers so that instructions for a technician need only state the folder and the tag to look for, e.g. "please enter data from project x" or "please enter data from papers tagged y from project x". To access the full text and PDF of papers from off campus, use the UIUC VPN service. If you are managing a Mendeley folder that undergraduates are actively entering data from, please plan to spend between 15 min and 1 hour per week maintaining it, enough to keep up with the work that the undergraduates are doing.

To add a paper to the project folder:

• If the DOI number is available (most articles since 2000):
  1. Select the project folder
  2. Right click and select Add entry manually...
  3. Paste the DOI number in the DOI field
  4. Select the search spyglass icon
  5. Drag and drop the PDF onto the record.
• If the DOI is not available:
  1. Download the paper and save it as citation_key.pdf
  2. Add it using the Files field
  3. The citation key should be in the format authorYYYYabc, where YYYY is the four-digit year and abc is the acronym for the first three words of the title, excluding articles (the, a, an), prepositions (on, in, from, for, to, etc...), and the conjunctions (for, and, nor, but, or, yet, so) with fewer than three letters.
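The citation key rule above can be sketched in code. This is a minimal illustration, not part of BETYdb or Mendeley: the function name `citation_key` is hypothetical, "author" is assumed to mean the first author's surname, and the acronym is assumed to come from the paper title. Because the guide's preposition list ends with "etc...", the stop-word set below adds a few common prepositions (of, at, by, with) as an assumption, and it excludes every listed word regardless of length.

```python
import re

# Words skipped when building the acronym. The first three rows come from the
# guide; the last row is an assumed extension of its "etc..." preposition list.
STOP_WORDS = {
    "the", "a", "an",
    "on", "in", "from", "for", "to",
    "and", "nor", "but", "or", "yet", "so",
    "of", "at", "by", "with",
}


def citation_key(first_author_surname: str, year: int, title: str) -> str:
    """Build an authorYYYYabc key (hypothetical helper for illustration)."""
    # Lowercase the surname and drop anything that is not a plain letter.
    author = re.sub(r"[^a-z]", "", first_author_surname.lower())
    # Split the title into lowercase words and drop the stop words.
    words = re.findall(r"[a-z]+", title.lower())
    significant = [w for w in words if w not in STOP_WORDS]
    # The acronym is the first letter of each of the first three remaining words.
    acronym = "".join(w[0] for w in significant[:3])
    return f"{author}{year:04d}{acronym}"


# Example with a made-up reference:
print(citation_key("Smith", 2008, "The effects of nitrogen on switchgrass yield"))
# prints smith2008ens
```

The file for this made-up reference would then be saved as smith2008ens.pdf.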

### Annotating a Reference

Each week, please identify and prepare papers that you would like to be entered next by completing the following steps:

1. Use the star label to identify the papers that you want the student to focus on next.

• Start by keeping a minimum of 2 and a maximum of 5 papers starred at once so that students can focus on the ones that you want. Students have been entering 1-3 papers per week; once we get closer to 3-5, the min/max should change.
• Choose papers that are the most data rich.
2. For each paper, use comment bubbles, notes field, and highlighter to indicate:

• Name(s) of traits to be collected
• Methods:
• Site name
• Location
• Number of replicates
• Statistics to collect
• Identify treatment(s) and control
• Indicate if study was conducted in greenhouse, pot, or growth chamber
• Data to collect
• Figure numbers and the symbols to extract data from
• Table number and columns with data to collect
• Covariates
• Management data (for yields)
• Units in 'to' and 'from' fields used to convert data
• Esoteric information that other scientists or technicians might not catch and that is not otherwise recorded in the database
• Any data that may be useful at a later date but that can be skipped for now.

Comment on or highlight the following information:

• Sample size
• Covariates (see Table \ref{tab:covariates})
• Treatments
• Managements
• Other information entered into the database, e.g. experimental details

### Finding a citation in Mendeley

To find a citation in Mendeley, go to the project folder. By default, data entry technicians should enter data from papers that are marked with a yellow star, in the order that they were added to the list. Information and data to be collected from a paper can be found under the 'Notes' tab and in highlighted sections of the paper.

## Recording extracted data and transformations

Google Spreadsheets are used to keep a record of any data that is not entered directly from the original publication. Please share all spreadsheets with the user betydb@gmail.com in addition to any collaborators. Each spreadsheet should record:

• Any raw data that is not directly entered into the database but that is used to derive data or stats using equations in Tables \ref{tab:conversions} or \ref{tab:stats}.
• Any data extracted from figures, along with the figure number
• Any calculations that were made. These calculations should be included in the cells.

Each project has a Google Docs spreadsheet with the title "project_data". In this spreadsheet, each reference should have a separate worksheet labeled with the citation key (authorYYYYabc format). Do not enter data into Excel first, as this is prone to errors, and information such as equations may be lost when uploading or copy-pasting.
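As a rough aid for checking worksheet labels, the sketch below flags names that do not look like citation keys. It is an illustration only: the function name `invalid_worksheet_names` and the exact pattern (lowercase author name, four-digit year, one to three lowercase letters) are assumptions, and the list of names would have to be copied from the spreadsheet by hand or by whatever export you use.

```python
import re

# Assumed shape of a citation key: lowercase author name, four-digit year,
# then an acronym of one to three lowercase letters.
CITATION_KEY = re.compile(r"[a-z]+\d{4}[a-z]{1,3}")


def invalid_worksheet_names(names):
    """Return the worksheet names that do not match the assumed key pattern."""
    return [name for name in names if not CITATION_KEY.fullmatch(name)]


# Example with made-up worksheet names:
print(invalid_worksheet_names(["smith2008ens", "Sheet1", "jones199abc"]))
# prints ['Sheet1', 'jones199abc']
```

A name such as "jones199abc" is flagged because its year has only three digits.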