\section{The Growing Problems of Outdated Communication}

\label{sec:issues}

We are a long way from achieving this vision today. As noted above, the impediments lie primarily in two dimensions: we must change the formats and technologies of communication that underpin the world of scholarly publishing, and we must change the social ecosystem of communication that has grown up around those technologies. We review the key issues in these two areas in turn.

\subsection{Problems with Current Formats and Technologies}

\subsubsection{Existing Formats Are Not Tailored for Knowledge Transfer}

\label{sec:issues-exp}

Scholarly communications are, at this mid-point in the digital revolution, in an ill-defined transitional state (a `horseless carriage' state) that lies somewhere between the world of print and paper and the world of the web and computers, with the former still exercising significantly more influence than the latter. However, the recent development of new media and communicative possibilities using information technology, and the need to communicate and comprehend increasing amounts of additional information such as numerical and multimedia data, make the traditional forms inadequate. Continued reliance on paper documents and their electronic shadows makes it very difficult or impossible to incorporate massive amounts of data, moving images or software; there is simply no natural way to integrate such ancillary information `into' the traditional publication. Additionally, any software-based text mining or information extraction procedure requires that paper-based information first be converted into machine-tractable form and made freely available for such mining.
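
To make the contrast concrete, the following is a minimal sketch of how directly a structured article format can be mined, in comparison with a print-style document that must first be converted to plain text. The XML fragment and element names are illustrative only, loosely modelled on archival article markup rather than any complete schema.

\begin{verbatim}
# Sketch: structured article XML is directly machine-tractable,
# whereas a print-style PDF must first be converted to text.
# The XML fragment below is illustrative, not a complete schema.
import xml.etree.ElementTree as ET

article_xml = """
<article>
  <front>
    <article-title>Example study</article-title>
    <abstract>We measured X under condition Y.</abstract>
  </front>
  <body>
    <sec><title>Results</title><p>X increased by 12%.</p></sec>
  </body>
</article>
"""

root = ET.fromstring(article_xml)
title = root.findtext("front/article-title")
abstract = root.findtext("front/abstract")
result_paragraphs = [p.text for p in root.findall("body/sec/p")]

print(title)              # Example study
print(abstract)           # We measured X under condition Y.
print(result_paragraphs)  # ['X increased by 12%.']
\end{verbatim}

No comparable one-step extraction exists for a scanned or typeset page image, which must first pass through conversion and clean-up before any such query is possible.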

\subsubsection{The Ever-Increasing Problem of Information Overload}

\label{sec:issues-inf}

Scholars have experienced information overload for more than a century \cite{vickery1999} and the problem is just getting worse. Online access provides much better knowledge discovery and aggregation tools, but these tools struggle with the fragmentation of research communication caused by the rapid proliferation of increasingly specialized and overlapping journals, some with decreasing quality of reviewing \cite{schultza2011}.

\subsubsection{Verifying Claims and Re-using Results}

\label{sec:issues-data}

Most types of scholarship involve claims, and all sciences and many other fields require that these claims be independently testable. Good results are often re-used, sometimes thousands of times. But actually obtaining the necessary materials, data or software for such re-use is far harder than it should be. Even in the rare cases where the data are part of the research communication, they are typically relegated to the status of `supplementary material', whose format \cite{murrayrust2007} and preservation \cite{rosenthal2010} are inadequate. Sometimes the data are archived in separate data repositories that offer a more secure long-term future, but in such circumstances efforts need to be made to ensure that their links to the relevant textual research communications are explicit, robust and persistent. At present it is difficult for a scholar to record, easily and sustainably, the data on which the work is based in a form that others can absorb and use, and to maintain links to the associated textual publication.
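
As a hedged sketch of what an explicit, robust and persistent link can look like in practice, the snippet below resolves a dataset identifier to machine-readable metadata via standard DOI content negotiation, which the major registration agencies support. The DOI shown is a placeholder, and the metadata fields returned depend on the agency and the depositor.

\begin{verbatim}
# Sketch: resolving a dataset DOI to machine-readable metadata via
# DOI content negotiation. The DOI below is a placeholder.
import json
import urllib.request

doi = "10.1234/example-dataset"  # hypothetical identifier
req = urllib.request.Request(
    "https://doi.org/" + doi,
    headers={"Accept": "application/vnd.citationstyles.csl+json"},
)
with urllib.request.urlopen(req) as response:
    metadata = json.load(response)

# Fields such as the title, and any declared relations to the
# associated textual publication, can then be read directly.
print(metadata.get("title"))
\end{verbatim}

The value of such a link lies in its persistence: the identifier, not a brittle URL, carries the connection between dataset and publication.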

\subsection{Problems with Business and Assessment Models}

\subsubsection{Next-Generation Tools Require Unfettered Resource Access}

\label{sec:issues-acc}

Currently, a large and active community of professionals and students, including data curators, provides services intended to improve the effectiveness of scholarly communication, and thereby the productivity of researchers; these services entail digging facts out of textual publications and presenting them in machine-readable, actionable form. Such extraction is increasingly being performed by automated text mining and classification software, and the need for much of this expensive manual effort would be reduced if authors were to provide the relevant metadata at the time of publication. However, because the source material is usually copyrighted, and these rights are distributed across a large number of publishers, the service providers are forced to negotiate individual contracts with each publisher, which is extremely wasteful of time and resources. To reduce this burden, research funders are increasingly mandating that research results of all types be made openly available. However, this results in a confusing world where some publications are immediately and freely available and others on the same topic are not.
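
As a hedged illustration of what author-supplied, machine-readable metadata could look like at the point of publication, the sketch below emits a small JSON-LD record using the schema.org vocabulary; the field values are placeholders, and other vocabularies could serve equally well.

\begin{verbatim}
# Sketch: author-supplied metadata emitted as JSON-LD at publication
# time, so that services need not re-extract these facts from the text.
# Values are placeholders; the vocabulary is schema.org.
import json

metadata = {
    "@context": "https://schema.org",
    "@type": "ScholarlyArticle",
    "name": "Example study",
    "author": [{"@type": "Person", "name": "A. Researcher"}],
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "isBasedOn": "https://doi.org/10.1234/example-dataset",
    "keywords": ["scholarly communication", "open access"],
}

print(json.dumps(metadata, indent=2))
\end{verbatim}

A record of this kind, supplied by the author and published alongside the article, would let downstream services read the key facts directly rather than reconstructing them by mining copyrighted text.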

A related problem is the effect of the web as the medium for scholarly communication, since it is ending the role of local library collections. Libraries and archives have been forced to switch from purchasing copies of the research communications of interest to their readers to leasing web access to the publishers' copies, with no assurance of long-term access to current content if future subscriptions lapse. Bereft of almost all their original value to scholars, libraries are being encouraged both to compete in the electronic publishing market and to take on the task of running `institutional repositories', in effect publishing their scholars' data and research communications. Though both tasks are important, neither has an attractive business model. Re-publishing an open access version of their scholars' output where the research is published in subscription-access journals may seem redundant, but it is essential if the artificial barriers that intellectual property restrictions have erected to data mining and other forms of automated processing are to be overcome \cite{hargreaves2011}.

\subsubsection{Traditional Publishing Models Are Under Attack}

\label{sec:issues-bus}

Academic publishers have been slower to encounter, but are not immune from, the disruption that the internet has wrought on other content industries \cite{economist2009}. The academic publishers' major customers, academic libraries, are facing massive budget cuts \cite{kniffel2009}, and so are unlikely to be a major source of continued revenue. The internet has greatly reduced the costs of publishing, new players (such as Google and other software companies) have appeared in the market, and legislative and funding bodies are actively addressing issues of free access to data and text \cite{hargreaves2011}. The internet has also greatly reduced the monetary value that can be extracted from paper-based academic content, and science publishers, who have traditionally depended on extracting this value, face a crisis as their old business models are disrupted. Conversely, the internet permits the creation of new added-value services relating to search, semantics and integration that present exciting new commercial opportunities. Clearly the scholarly publishing industry needs to engage in discussions with the various partners in the value chain if it is to participate in developing the new standards, services, business models, metrics/analysis, legislation, knowledge ecosystems and evaluation frameworks that the internet now makes possible, rather than being supplanted by agile new startups able to adapt more swiftly.

The software developers who build the current research informatics infrastructure are also very aware of the shortfalls and hindrances generated by today's fragmented development efforts. The problems here can be attributed to a number of factors. First, heterogeneous technologies and designs, and the lack (or sometimes the superfluity!) of standards, cause unnecessary technical difficulties and directly increase integration costs. Second, a complex landscape of intellectual property rights and licensing for software adds legal concerns to developers' requirements. Third, research software developers typically work in a competitive environment, whether academic or commercial, in which innovation is rewarded much more highly than evolutionary and collaborative software reuse. This is especially true in a funding environment driven by the need for intensive innovation, where reusing other people's code is a likely source of criticism. Finally, even under optimal technical conditions, it is still challenging for software developers to understand which components are most appropriate for a given challenge, to make contact with the right people to facilitate the construction of tools, and to work within distributed teams across groups to build high-quality interoperable software. Far too often, the impact of these tools is judged solely by their immediate usefulness to the researchers themselves, with no thought for the wider community.

These changing roles and business models thus pose an immense challenge for libraries, publishers and software developers. The only fruitful way forward, we firmly believe, will be for all parties to collaborate in building new tools that optimally support scholarship in a distributed open environment. Only by creating a demonstrably better research environment will we convince the entire system of scholarly communication and merit assessment to adopt new forms and models.

\subsubsection{Current Assessment Models Don't Measure Merit}

\label{sec:issues-ass}

Not only are the products of research activity still firmly rooted in the past; so too are our means of assessing the impact of those products and of the scholars who produce them. For five decades, the impact of a scholarly work (itself already narrowly defined: a journal article in the sciences, a monograph in the humanities) has been judged by counting the number of citations it receives from other scholarly works, or, worse, by attributing worth to an individual's work based solely on the overall impact factor of the journal in which it happens to be published. We now live in an age in which other methods of evaluation, including article-level usage metrics, blog comments, discussion on mailing lists, press quotes, and other forms of media coverage, are becoming increasingly important reflections of scholarly and public impact. Failure to take these into account means not only that the impact and quality of a publication are not adequately measured, but also that the current system for incentivizing and evaluating scholars does not relate well to the actual impact of their activities.