Data Citation and Attribution

Well-developed, uniform data citation standards are required to ensure that links between publications and datasets endure and that creators of digital datasets receive appropriate credit when others use their data. Standards for data citation practice and implementation provide the mechanism by which digital datasets can be reliably discovered and retrieved. Closely related challenges include the ability to reliably identify, locate, access, and interpret digital datasets and to verify their version, integrity, and provenance.\cite{national2012For} Any data archiving policy must therefore not only specify how publications cite the datasets they use but also require attribution to the authors of datasets that reside outside the document.

Numerous organizations in the EU and US have studied this issue and continue to refine technical solutions and best practices. For example, CODATA and the National Academy of Sciences released an in-depth international study and recommendations on citation of technical data.\cite{CODATA} Recently, these transnational initiatives have coalesced to produce a unified Joint Declaration of Data Citation Principles that is appropriate for any type of technical publication.\cite{FORCE11} The eight principles define the purpose, function, and attributes of data citations and address the need for citations to be both understood by humans and processed by machines. Taking a slightly different perspective, focused more on the mechanics of linking published articles with data repositories, DataCite and the International Association of Scientific, Technical and Medical Publishers have issued a joint statement recommending best practices for citation of technical datasets in journals:\cite{joint}

  1. To improve the availability and findability of research data, encourage authors of research Papers to deposit researcher validated data in trustworthy and reliable Data Archives.

  2. Encourage Data Archives to enable bi-directional linking between Datasets and publications by using established and community endorsed unique persistent identifiers such as database accession codes and Digital Object Identifiers (DOIs). DOI was approved as ISO Standard 26324:2012 in May 2012.

  3. Encourage publishers to make visible or increase visibility of these links from publications to datasets.

  4. Encourage Data Archives to make visible or increase visibility of these links from datasets to publications.

  5. Support the principle of data reuse and for this purpose actively participate in initiatives for best practice recommendations for the citation of datasets.

  6. Invite other organizations involved in research data management to join and support this statement.
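The bi-directional linking described in recommendations 2 through 4 can be illustrated with a minimal sketch. It assumes that both the publication and the dataset carry DOIs and that each DOI is made resolvable by prefixing it with the public resolver address https://doi.org/. The `LinkIndex` class, its method names, and the example DOIs are hypothetical and are not part of the joint statement or of any archive's actual interface.

```python
from collections import defaultdict

DOI_RESOLVER = "https://doi.org/"  # public DOI resolver


def doi_url(doi: str) -> str:
    """Turn a bare DOI into a resolvable link."""
    return DOI_RESOLVER + doi.strip()


class LinkIndex:
    """Hypothetical in-memory index of publication <-> dataset links."""

    def __init__(self) -> None:
        self._datasets_by_paper = defaultdict(set)
        self._papers_by_dataset = defaultdict(set)

    def link(self, paper_doi: str, dataset_doi: str) -> None:
        # Record the link in both directions so that either side
        # can display it (recommendations 3 and 4).
        self._datasets_by_paper[paper_doi].add(dataset_doi)
        self._papers_by_dataset[dataset_doi].add(paper_doi)

    def datasets_cited_by(self, paper_doi: str) -> list:
        return sorted(self._datasets_by_paper.get(paper_doi, ()))

    def papers_citing(self, dataset_doi: str) -> list:
        return sorted(self._papers_by_dataset.get(dataset_doi, ()))


# Made-up DOIs, for illustration only.
index = LinkIndex()
index.link("10.1234/example.paper.1", "10.1234/example.dataset.7")
index.link("10.1234/example.paper.2", "10.1234/example.dataset.7")

for paper in index.papers_citing("10.1234/example.dataset.7"):
    print(doi_url(paper))
```

In practice the two directions are usually maintained by different parties (the publisher and the Data Archive), so a shared persistent identifier is what allows each side to build its half of the index independently.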

An outstanding technical issue concerns the granularity, both spatial and temporal, of the datasets used in a publication. Spatial granularity refers to the subset of a dataset actually used in the research. Temporal granularity refers either to the version of the dataset used or, if the dataset is dynamic, to its state at the time of use.
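One way to make granularity explicit is to carry it as optional fields in the citation record itself. The following is a minimal sketch, not a proposed standard: the `DataCitation` class, its field names, the output format, and the example values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataCitation:
    """Hypothetical citation record with explicit granularity fields."""

    creators: str
    year: int
    title: str
    archive: str
    doi: str
    version: Optional[str] = None   # temporal: fixed release that was used
    accessed: Optional[str] = None  # temporal: access date for a dynamic dataset
    subset: Optional[str] = None    # spatial: portion of the dataset that was used

    def format(self) -> str:
        parts = [f"{self.creators} ({self.year}). {self.title}."]
        if self.version:
            parts.append(f"Version {self.version}.")
        parts.append(f"{self.archive}.")
        if self.subset:
            parts.append(f"Subset used: {self.subset}.")
        if self.accessed:
            parts.append(f"Accessed {self.accessed}.")
        parts.append(f"https://doi.org/{self.doi}")
        return " ".join(parts)


# All values below are made up for illustration.
example = DataCitation(
    creators="Doe, J.",
    year=2013,
    title="Example Measurements",
    archive="Example Data Archive",
    doi="10.1234/example.dataset.7",
    version="2.1",
    subset="records 100-250",
)
print(example.format())
```

A versioned dataset would typically populate `version`, whereas a continuously updated one would rely on `accessed`. Whether a subset should instead receive its own identifier is the open question raised above, and this sketch does not settle it.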