Description

This document specifies the information required to make a PFR-generated genome assembly fit for purpose, so that downstream analysts can easily understand the source and meaning of the files available to them. The principal aims are enhanced collaboration, easier publication and fewer errors, achieved through improved documentation and specification.

What does "fit for purpose" mean?

What does "fit for purpose" mean in this document? A downstream analyst, scientist or reviewer must be able to use the presented data within a reasonable timeframe and with an acceptable error model. Errors often creep in through misunderstandings, which are greatly reduced when all parties know the expectations.

What Does Good Look Like

Roles and Responsibilities

Publication Levels

A release is the publication of a genome assembly for analytical use. Any publication that uses a genome assembly must refer to a specific release version. Releases may have different degrees of public access. These are defined as:

Naming Convention for a Release

Directories to store data in
Breeding plants: 
Bacteria: 
Fungi:
Vertebrate:
• Full name: <species>_<population of origin>/<assembly_build>
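
As an illustration of the full-name pattern above, the following is a minimal Python sketch that assembles a name from its three components. The function name `release_full_name`, the replacement of whitespace with underscores, and the literal use of "/" before the assembly build are assumptions made for this example; they are not defined by this standard, and the pre-release example in this document joins all parts with underscores instead.

```python
def release_full_name(species: str, population: str, assembly_build: str) -> str:
    """Build '<species>_<population of origin>/<assembly_build>'.

    Hypothetical helper: the separator handling is an assumption,
    read literally from the pattern in the naming convention.
    """
    parts = [species, population, assembly_build]
    if any(not p.strip() for p in parts):
        raise ValueError("species, population and assembly_build are all required")

    # Assumption: whitespace inside a component becomes an underscore,
    # e.g. "Actinidia chinensis" -> "Actinidia_chinensis".
    species_token = "_".join(species.split())
    population_token = "_".join(population.split())

    return f"{species_token}_{population_token}/{assembly_build.strip()}"


if __name__ == "__main__":
    print(release_full_name("Actinidia chinensis", "Russel", "v2.1"))
```

Rejecting empty components early is a deliberate choice in this sketch: a name with a missing part is easier to catch at creation time than after files have been distributed.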

Pre-Releases

Pre-releases follow this standard. The assembly build tag for a pre-release is 'pre' and can contain a version number (e.g. Actinidia_chinensis_Russel_v2.1_pre). A pre-release may contain only a subset of the data files of a full release.
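
The pre-release tag described above could be detected with a short check such as the following Python sketch. It assumes that 'pre' appears as the final token of the name and that a version number looks like "v" followed by dot-separated digits, as in the example Actinidia_chinensis_Russel_v2.1_pre. The function name `parse_prerelease` and both patterns are hypothetical and would need adjusting to whatever form the standard finally adopts.

```python
import re
from typing import Optional, Tuple

# Assumption: a version token is "v" plus dot-separated integers (e.g. v2.1),
# delimited by "_" or "/" or the ends of the name.
_VERSION = re.compile(r"(?:^|[_/])(v\d+(?:\.\d+)*)(?=$|[_/])")


def parse_prerelease(name: str) -> Tuple[bool, Optional[str]]:
    """Return (is_pre_release, version or None) for a release name."""
    tokens = re.split(r"[_/]", name)
    # Assumption: the 'pre' tag is the last token of the name.
    is_pre = tokens[-1] == "pre"
    match = _VERSION.search(name)
    return is_pre, (match.group(1) if match else None)


if __name__ == "__main__":
    print(parse_prerelease("Actinidia_chinensis_Russel_v2.1_pre"))
```

Matching 'pre' as a whole token, rather than as a substring, avoids false positives on species or population names that happen to contain those letters.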

Release Platforms

Genome Assembly Data

A release will comprise the following data:

Data Files

The data for an assembly will be stored in the following data file formats:
The minimum requirement for data files is: