Authorea

Alberto Pepe renamed results_table.tex to Issues.tex almost 11 years ago

Commit id: 872c9a00f2c0fd1318793bca23132f11b828efcd


\section{Problem: The Growing Problems of Outdated Communication}\label{sec:issues}

We are a long way from achieving this vision today. As noted above, the impediments exist primarily in two dimensions: we have to change the \textit{nature of the formats and technologies of communication} that underpin the world of scholarly publishing, and we have to change the \textit{social ecosystem of communication} that has grown up around the existing technologies. We review the key issues in these two areas in turn.

\subsection*{Problems with Current Formats and Technologies}

\subsection{Existing Formats Are Not Tailored for Knowledge Transfer}\label{sec:issues-exp}

Scholarly communications are, at this mid-point in the digital revolution, in an ill-defined transitional state---a `horseless carriage' state---that lies somewhere between the world of print and paper and the world of the web and computers, with the former still exercising significantly more influence than the latter. However, the recent development of new media and communicative possibilities using information technology, and the need to communicate and comprehend increasing amounts of additional information such as numerical and multimedia data, make the traditional forms inadequate. Continued reliance on paper documents and their electronic shadows makes it very difficult or impossible to incorporate massive amounts of data, moving images or software; there is simply no natural way to integrate such ancillary information `into' the traditional publication. Additionally, any software-based text mining or information extraction procedures require that paper-based information first be converted into machine-tractable form and made freely available for such mining.
\subsection{The Ever-Increasing Problem of Information Overload}\label{sec:issues-inf}

Scholars have experienced information overload for more than a century \cite{vickery1999}
% \footnote{\url{http://eloquentscience.com/2011/06/the-proliferation-of-scientific-literature/}}
and the problem is just getting worse. Online access provides much better knowledge discovery and aggregation tools, but these tools struggle with the fragmentation of research communication caused by the rapid proliferation of increasingly specialized and overlapping journals, some with decreasing quality of reviewing \cite{schultza2011}.
% \footnote{\url{http://eloquentscience.com/2011/04/the-increasing-number-of-open-access-publishers-a-good-thing/}}

\subsection{Verifying Claims and Re-using Results}\label{sec:issues-data}

Most types of scholarship involve claims, and all sciences and many other fields require that these claims be independently testable. Good results are often re-used, sometimes thousands of times. But actually obtaining the necessary materials, data or software for such re-use is far harder than it should be. Even in the rare cases where the data are part of the research communication, these are typically relegated to the status of `supplementary material', whose format \cite{murrayrust2007}
% \footnote{\url{http://www.sis.pitt.edu/\ repwkshop/papers/murray.html}}
and preservation \cite{rosenthal2010}
% \footnote{\url{http://dx.doi.org/10.3789/isqv22n3.2010.04}}
are inadequate. Sometimes the data are archived in separate data repositories that offer a more secure long-term future. But in such circumstances efforts need to be made to ensure that their links to the relevant textual research communications are explicit, robust and persistent.
% Confusing use of nomenclatures and lack of persistent accession
% IDs also confound attempts to capture or integrate data.
% Data submissions are only patchily starting to be citable via
% Much of the discussion of enhanced research communication turns on the
% availability of digital assets, mostly data, but with an increasing
% emphasis on software and workflows as well, and the exploitation of
% these assets to provide a rich media experience, enhanced
% functionality and discoverability or other benefits of online
% interactions. Less explored are the issues of how the data was
% collected, what the relevant physical artifacts are, and how best to
% capture the information on this in a useful way. As is also the case
% for effective data and digital process publication, this requires
% systems that help the user to think about publication earlier than is
% traditionally the case; but there are unique challenges to capturing the record of
% physical processes and, in particular, the physical world provenance
% trail that leads to the first relevant digital artifact.
% Scholars require effective data recording systems that enhance
% communication---as opposed to just record-keeping---need to be built
% and configured in a way that makes those recording processes easy,
% automatically capturing records of physical and digital artifacts via
% data models that can deliver immediate benefits to the user, but also
% rendering the ultimate aggregation and collation of records into a useful
% form for communication easy as well.
At present it is difficult for a scholar easily and sustainably to record the data on which the work is based in a form that others can absorb and use, and to maintain links to the associated textual publication.
% ----This par belongs elsewhere----
% DOIs.\footnote{\url{http://sites.nationalacademies.org/PGA/brdi/PGA\_064019}}.
% A major challenge to the adoption of new systems and tools is that
% they tend to disrupt existing patterns of work and flows through
% them.
% Researchers are very conservative in their use of new tools,
% particularly tools and systems that act at key points in the research
% process. A good example of such a key point is the connection between the
% experimental record and its communication through some form of
% publication.

\subsection*{Problems with Business and Assessment Models}

\subsection{Next-generation Tools Require Unfettered Resource Access}\label{sec:issues-acc}

Currently, a large and active movement of professionals and students, including data curators, is providing services intended to improve the effectiveness of scholarly communication, and thereby the productivity of researchers; these entail digging facts out of textual publications and presenting them in machine-readable, actionable form. The need for much of this expensive manual effort would be reduced if authors were to provide the relevant metadata at the time of publication.
% This would enable publications to be automatically identified for
% inclusion in a specific data repository immediately after being released
% by the author.
These extraction processes are increasingly being performed by automated text mining and classification software. However, because the source material is usually copyrighted, and these rights are distributed across a large number of publishers, the service providers are forced to negotiate individual contracts with each publisher, which is extremely wasteful of time and resources. To reduce this burden, some research funders are increasingly mandating that research results of all types be made openly available. However, this results in a confusing world where some publications are immediately and freely available and others on the same topic are not. A related problem is the effect of the web as the medium for scholarly communication, since it is ending the role of local library collections.
% In many countries, such as the US, libraries (sometimes in consortia)
% retain their role as the paying customers of the publishers. In other
% countries, such as the UK, negotiations as to the terms of access and
% payment for it are now undertaken at a national
% level.\footnote{\url{http://www.jisc-collections.ac.uk/}}
% Neither provides librarians much ability to be discriminating customers
% of individual journals.
Libraries and archives have been forced to switch from purchasing copies of the research communications of interest to their readers, to leasing web access to the publishers' copies, with no assurance of long-term access to current content if future subscriptions lapse. Bereft of almost all their original value to scholars, libraries are being encouraged both to compete in the electronic publishing market
%\cite{hahn2008} FIND OR DELETE THIS REFERENCE
and to take on the task of running `institutional repositories', in effect publishing their scholars' data and research communications.
%\cite{luce2008} FIND OR DELETE THIS REFERENCE
Though both tasks are important, neither has an attractive business model. Re-publishing an open access version of their scholars' output where research is published in subscription-access journals may seem redundant, but it is essential if the artificial barriers that intellectual property restrictions have erected against data mining and other forms of automated processing are to be overcome \cite{hargreaves2011}.
% Equally, because the mechanism that enforces compliance with the
% current system of research communication attaches value only to
% publications in traditional formats, vast human and machine efforts
% are required to extract the factual content of the communication. Were
% researchers to publish their content in formats better adapted to
% information technology, these costs could be avoided.
\subsection{Traditional Publishing Models Are Under Attack}\label{sec:issues-bus}

Academic publishers have been slower to encounter, but are not immune from, the disruption that the internet has wrought on other content industries \cite{economist2009}. %FIND OR DELETE THIS REFERENCE
The academic publishers' major customers, academic libraries, are facing massive budget cuts \cite{kniffel2009}, and so are unlikely to be a major source of continued revenue. The internet has greatly reduced the costs of publishing, new players (such as Google and other software companies) have appeared in the market, and legislative and funding bodies are actively addressing issues of free access to data and text \cite{hargreaves2011}. The advent of the internet has greatly reduced the monetary value that can be extracted from paper-based academic content, and science publishers, who have traditionally depended on extracting this value, face a crisis, since their old business models are suffering disruption. Conversely, the internet permits the creation of new added-value services relating to search, semantics and integration that present exciting new commercial opportunities. Clearly the scholarly publishing industry needs to engage in discussions with different partners within the value chain, if it is to be included in the development of the new standards, services, business models, metrics/analysis, legislation, knowledge ecosystems and evaluation frameworks that the internet now makes possible, rather than being supplanted by new agile startups that have the ability to adapt more swiftly.

The software developers who build the current research informatics infrastructure are also very aware of the shortfalls and hindrances generated by today's fragmented development efforts. The problems here can be attributed to a number of elements. First, heterogeneous technologies and designs, and the lack (or sometimes the superfluity!)
of standards, cause unnecessary technical difficulties and directly affect integration costs. Second, a complex landscape of intellectual property rights and licensing for software adds legal concerns to developers' requirements. Third, research software developers typically work in a competitive environment, either academic or commercial, where innovation is rewarded much more highly than evolutionary and collaborative software reuse. This is especially true in a funding environment driven by the need for intensive innovation, where reusing other people's code is a likely source of criticism. Finally, even under optimal technical conditions, it is still challenging for software programmers to understand what components are the most appropriate for a given challenge, to make contact with the correct people to facilitate the construction of tools, and to work within distributed teams across groups to build high-quality interoperable software. The impact of these tools is, far too often, solely based on how immediately useful they will be to researchers themselves, with no thought for the wider community.

Thus changing roles and business models form an immense challenge for libraries, publishers and software developers. The only fruitful way forward, we firmly believe, will be for all parties to collaborate in building new tools that optimally support scholarship in a distributed open environment. Only by creating a demonstrably better research environment will we convince the entire system of scholarly communication and merit assessment to adopt new forms and models.

\subsection{Current Assessment Models Don't Measure Merit}\label{sec:issues-ass}

Not only are the products of research activity still firmly rooted in the past, so too are our means of assessing the impact of those products and of the scholars who produce them.
For five decades, the impact of a scholarly work---an entity that is already narrowly defined, in the sciences as a journal article, and in the humanities as a monograph---has been judged by counting the number of citations it receives from other scholarly works, or, worse, by attributing worth to an individual's work based solely on the overall impact factor of the journal in which it happens to be published. We now live in an age in which other methods of evaluation, including article-level usage metrics, blog comments, discussion on mailing lists, press quotes, and other forms of media, are becoming increasingly important reflections of scholarly and public impact. Failure to take these aspects into account means not only that the impact and/or quality of a publication is not adequately measured, but also that the current incentivization and evaluation system for scholars does not relate well to the actual impact of their activities.