Our Vision

\label{sec:vision}

A dispassionate observer, perhaps visiting from another planet, would surely be dumbfounded by how, in an age of multimedia, smartphones, 3D television and 24/7 social network connectivity, scholars and researchers continue to communicate their thoughts and research results primarily by means of the selective distribution of ink on paper, or at best via electronic facsimiles of the same.

Modern technologies enable vastly improved knowledge transfer and far wider impact. Once scholarly communication is freed from the restrictions of paper, numerous advantages appear. Communication becomes instantaneous across geographic boundaries. Terms in electronic documents may be automatically disambiguated and semantically defined by linking to standard terminology repositories, allowing more accurate retrieval in searches; complex entities mentioned in documents may be automatically expanded to show diagrams or pictures that facilitate understanding; citations to other documents may be enhanced by summaries generated automatically from the cited documents. Documents may be automatically clustered with others that are similar, showing their relationship to others within their scholarly context, and their place in the ongoing evolution of ideas. Ancillary material that augments the text of the scholarly work may be linked to or distributed with the work; this may include numerical data (from experiments), images and videos (showing procedures or scenarios), sound recordings, presentational materials, and other elements in forms of media still on the horizon. Extracts and discussions of scholarly work on social media such as blogs, online discussion groups and Twitter may greatly broaden the visibility of a work and enable it to be better evaluated and cross-linked to other information sources. A broad range of recent technological advances provides increasingly diverse and powerful opportunities for more effective scholarly communication; we need to grasp these opportunities and make the possibilities realities.

We see a future in which scientific information and scholarly communication more generally become part of a global, universal and explicit network of knowledge; where every claim, hypothesis, argument—every significant element of the discourse—can be explicitly represented, along with supporting data, software, workflows, multimedia, external commentary, and information about provenance. In this world of networked knowledge objects, it would be clear how the entities and discourse components are related to each other, including relationships to previous scholarship; learning about a new topic means absorbing networks of information, not individually reading thousands of documents. Adding new elements of scholarly knowledge is achieved by adding nodes and relationships to this network. People could contribute to the network from a variety of perspectives and with different degrees of weightiness; each contribution would be immediately accessible globally by others. Reviewing procedures, as well as reputation management mechanisms, would provide ways to evaluate and filter information. This vision moves away from the Gutenberg paper-centric model of the scholarly literature, towards a more distributed network-centric model; it is a model far better suited for making knowledge-level claims and supporting digital services, including more effective tracking and interrogation of what is known, not known, or contested.

To enable this vision, we need to create and use new forms of scholarly publication that work with reusable scholarly artifacts. Two principal aspects can be distinguished. First, we need to revise the artifacts of communication. As a starting point, our vision entails creating a new, enriched form of scholarly publication that enables the creation and management of relationships between knowledge, claims and data. It also means the creation of a knowledge infrastructure that allows the sharing of computationally executable components, such as workflows, computer code and statistical calculations, as scientifically valid content components; and an infrastructure that allows these components to be made accessible, reviewed, referenced and attributed. To do this, we have to develop best practices for depositing research datasets in repositories that enable linking to relevant documents, and that have high compliance levels driven by appropriate incentives, resources and policies. In addition, for scientific domains, the new forms of publication must facilitate reproducibility of results, which means, at least for in silico research, the ability to preserve and re-perform executable workflows or services. This will require the ability to re-construct the context in which these objects were executed, which may well contain or reference other executable objects as well as data objects that may evolve through time. In this way, the content of communications about research will follow the same evolutionary path that we have seen for general web content: a move from the static to the increasingly dynamic.

With all this, we do recognise the importance of the peer-reviewed journal article as a primary dissemination channel and public record of new research results, since it uniquely provides a dated version of record of the authors’ views at the time of publication, and as such becomes an immutable part of the scientific record. But even here, in this most traditional of scholarly communication media, existing technologies allow immediate improvements: semantic enhancements to the text; greater interactivity with tables and figures; access to the data within articles in actionable form; data fusions (mashups) with data from other sources, for example Google Maps, where appropriate; direct citation of and links to underlying datasets stored in databases and data repositories; and the open publication in machine-readable form of both the full bibliographic record for the article and the citation information contained within the article’s reference list, encoded using appropriate ontologies, so that these basic facts can enter the web of linked open data (http://www.w3.org/wiki/SweoIG/TaskForces/CommunityProjects/LinkingOpenData) \cite{shotton2011}.

The second component of our vision requires changes to the complex socio-technical scholarly and commercial ecosystem. In particular, to obtain the benefits that networked knowledge promises, we have to put in place reward systems that encourage scholars and researchers to participate and contribute. We need to acknowledge the fact that notions such as journal impact factor are poor surrogates for measuring the true impact of scholarship, and are increasingly irrelevant in a world of disaggregated knowledge units of vastly varying granularity; and we need to derive new mechanisms that allow us more accurately to measure true contributions to the ongoing enterprise of augmenting the world’s store of knowledge. The business models that are currently driving scholarly publishing, which rest mainly on libraries buying access rights to digital journals from publishers, are clearly no longer adequate to support the rich, variegated, integrated and disparate knowledge offerings that new technologies enable, and that new scholarship requires. In a collaboration involving scholars, publishers, libraries, funding agencies, and academic institutions, we need to develop models that can enable this exciting future to develop, while offering sustainable forms of existence for the constituent parties, although perhaps not in their present states.

If we get this right, the potential is immense. The changes we envisage pave the way for a revolution in the manner in which research is carried out and communicated, leading to significant improvements in scholarly productivity and quality, and enhanced transparency that can only increase the public’s trust in the value of science. Similar benefits apply to scholarship in the arts and humanities.

These developments bring advantages for many parties:

  • For scholars, the benefit is better communication of knowledge: easier transmission of information from its creators or discoverers (the producers), in more forms using richer media, permitting easier, faster and deeper interpretation of the information by the consumers (other scholars, students and their teachers, government and non-governmental agencies, industry, the media, and society at large). At the same time, these new and enhanced forms of communication will enable more accurate evaluations of the quality and the impact of scholars’ work, facilitating better promotion evaluations and proposal assessments.

  • Similarly, for decision makers and managers, the new communicative forms mean that the impacts and effects of scholarly communications, and hence of their authors, can more easily be tracked and evaluated.

  • For research funders, enhanced communications will enable more accurate overviews of the size, direction and importance of each stream of research, and permit quicker determination of the quality of the work cited in grant proposals. But these advances mean that established practice will need to change.

  • For librarians and archivists, while online accessibility will mean that traditional library holdings become less important, the archiving, updating and maintenance of digital data and software will increase in importance. Adapting to these changes will bring about new modes of service to users.

  • Similarly, for publishers, the traditional functions of manuscript compilation and distribution will change radically, while quality control, access facilitation, new modes of aggregation, and the standardization, maintenance, and support of knowledge access technologies become more important. Providing these services will allow publishers to meet the challenges of free access to published research, which is being ushered in by the open access movement.