Authorea

Awaiting Activation edited bulk upload.md almost 10 years ago

Commit id: 2014c4f046a6d7560f673eb610cd28ce8d9f4338

1. n and SE: as noted above, if one of these is present, the other must be present as well. If SE is given, its value goes into the stat column of the yields table, and the statname column is set to "SE".
2. cultivar: use the name; defaults to NULL (for the cultivar_id column) if not provided.
3. notes: defaults to the empty string if not provided.

If a uniform value for the species is provided interactively when uploading the data set, the cultivar may be specified this way as well, provided that it also has a uniform value for the whole data set.

If n and SE are not fields of the uploaded CSV file, the n column of the yields table defaults to 1, and the stat and statname columns default to NULL.

Automatically set attributes:

* id: automatically generated by the DBMS
* dateloc: default based on the format of the date given in the date field (alternative: select a value via dropdown for the entire data set)
* created_at: always set to NOW
* updated_at: always set to NOW
* user_id: always set to the currently logged-in user's id
* checked: always set to 0
* method_id: always set to NULL

Other interactive options:

The user may set the number of significant digits to round to when inserting data into the "mean" and "stat" columns (which come from the "yield" and "SE" fields of the CSV file, respectively). The default is 4 significant digits, but the user may choose any value from 1 to 4. Note that the database does not give any indication of how many digits are significant.

Outline of template-download wizard

Possible templates

Here are two possible templates (given here as a list of field names) that a user wishing to upload yield data may use. If a value is set for the entire data set, then the corresponding fields are not included in the template.
* citation_doi,site,species,treatment,cultivar,date,yield,n,SE,notes,access_level
* citation_author,citation_year,citation_title,site,species,treatment,cultivar,date,yield,n,SE,notes,access_level

That is, the citation is identified either by citation_doi or by citation_author,citation_year,citation_title. All templates provided will use some subset (in rare cases, the full set) of these field names.

Wizard steps

1. Do all of the data in your data set pertain to a single citation? If yes, skip to 3.
2. Do you have DOI values for all of your citations?
3. Do all of the data in your data set pertain to a single site?
4. Do all of the data in your data set pertain to a single species?
5. Did you use the same treatment for each datum in your data set?
6. Should all of the data in your data set have the same level of access?
7. Does your data include cultivar information? If no, or if the answer to question 4 was no, skip to 9.
8. Do all of the data in your data set pertain to a single cultivar?
9. Is there a single date for all of the data in your data set?
10. Was the sample size greater than 1 for your data points?
11. Do you have notes to include with your data points?
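The skip rules in the wizard steps (question 1 may skip to 3; question 7 may skip to 9) can be sketched as a small function that picks the next question. This is only an illustration: `next_question` and the `answers` hash (question number mapped to `true` for yes, `false` for no) are hypothetical names, not part of any existing code.

```ruby
# Hypothetical helper: returns the number of the next wizard question to ask,
# or nil once the last question has been answered.
# `answers` maps question numbers to true (yes) or false (no).
def next_question(current, answers)
  case current
  when 1
    # A single citation for the whole data set makes the DOI question moot.
    answers[1] ? 3 : 2
  when 7
    # Skip question 8 if there is no cultivar information,
    # or if question 4 (single species) was answered no.
    (answers[7] == false || answers[4] == false) ? 9 : 8
  when 11
    nil
  else
    current + 1
  end
end

# Walk through the wizard for one assumed set of answers.
answers = { 1 => true, 3 => true, 4 => false, 5 => true, 6 => true,
            7 => true, 9 => true, 10 => false, 11 => false }
asked = []
q = 1
while q
  asked << q
  q = next_question(q, answers)
end
puts asked.inspect  # [1, 3, 4, 5, 6, 7, 9, 10, 11]
```

In this walk-through, questions 2 and 8 are never asked: the first because a single citation was reported, the second because the data cover more than one species.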
Based on the answers to these questions, we build the field list as follows:

    field_list = ['yield']
    If 1 = no
        if 2 = yes
            field_list << 'citation_doi'
        else
            field_list << ['citation_author,citation_year,citation_title']
            [add instructions about only requiring the beginning of the title]
    If 3 = no
        field_list << ['site']
    If 4 = no
        field_list << ['species']
    If 5 = no
        field_list << ['treatment']
    If 6 = no
        field_list << ['access_level']
    If 7 = yes and 8 = no
        field_list << ['cultivar']
    If 9 = no
        field_list << ['date']
    If 10 = yes
        field_list << ['n', 'SE']
    If 11 = yes
        field_list << ['notes']

To Dos

* outline what instructions should be provided with each template (if any)
* outline the upload phase. In particular:
    * the steps of the upload wizard
    * add citation, site, treatment, management, covariate, method as needed
    * what instructions should be provided and where
    * what validation will be done
    * how interactively-provided data should be entered
* outline modifications needed for trait uploads
    * fields in traits table that are not in yields table:
        * variable_id
        * date_year
        * date_month
        * date_day
        * time
        * timeloc
        * time_hour
        * time_minute
        * entity_id

For trait uploads, the variable (identified by name and units) must be provided in the CSV file or (if a single value applies to the whole data set) interactively during the upload process.

Contact [David LeBauer](mailto:[email protected]) or [Mike Dietze](mailto:[email protected]) for more information about using this method of data upload.
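The field-list rules given earlier in this outline can be sketched in runnable Ruby. This is a minimal illustration, not the actual implementation: `build_field_list` and the `answers` hash (wizard question number mapped to `true` for yes, `false` for no, with skipped questions simply absent) are hypothetical names. The sketch follows the rules literally, so a skipped question 8 does not count as "no".

```ruby
# Hypothetical helper: builds the template field list from wizard answers.
# `answers` maps question numbers (1..11) to true (yes) or false (no);
# questions that were skipped are simply missing from the hash.
def build_field_list(answers)
  yes = ->(q) { answers[q] == true }
  no  = ->(q) { answers[q] == false }

  field_list = ['yield']

  if no.(1)                                   # more than one citation
    if yes.(2)                                # DOIs exist for all citations
      field_list << 'citation_doi'
    else
      field_list.concat(%w[citation_author citation_year citation_title])
    end
  end

  field_list << 'site'         if no.(3)
  field_list << 'species'      if no.(4)
  field_list << 'treatment'    if no.(5)
  field_list << 'access_level' if no.(6)
  field_list << 'cultivar'     if yes.(7) && no.(8)
  field_list << 'date'         if no.(9)
  field_list.concat(%w[n SE])  if yes.(10)
  field_list << 'notes'        if yes.(11)

  field_list
end

# Example: several citations without DOIs, everything else uniform,
# and sample sizes greater than 1.
answers = { 1 => false, 2 => false, 3 => true, 4 => true, 5 => true,
            6 => true, 7 => false, 9 => true, 10 => true, 11 => false }
puts build_field_list(answers).join(',')
# yield,citation_author,citation_year,citation_title,n,SE
```

Under this literal reading, a data set with cultivar information but more than one species (question 7 yes, question 4 no, question 8 skipped) gets no cultivar field. Whether that is intended is not settled by the rules as written.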