LaTeX is Dead (long live LaTeX): Typesetting in the Digital Age
Welcome to the Pitchfork Party
This post comes in the context of a series of healthy discussion pieces on authoring scientific content for the web:
A Scholarly MarkDown discussion on Hacker News (see the comments)
In this text, I will try to elaborate on the merits and deficiencies of using a pre-web authoring syntax, LaTeX, for writing modern publications in 2015 as active web documents. My stance is evolutionary – we should adapt our existing tools to the new environment and in the process gain insights for what the next generation of tools ought to be.
If you are a working scientist who authors in LaTeX, I will suggest how to gradually adapt your existing toolchain while taking your first steps towards the future of publishing. If you don’t find the technical details interesting, you can skip to my suggestion in the concluding section.
If you are a developer, I will argue that the next generation of tools has not fully arrived yet.
We’re not going to start a fire
Feelings can burn strong when the words “LaTeX” and “Web” appear together.
Debates over tool superiority, especially online, tend to become heated and destructive quickly. My best guess is that our personal experiences with our tools evolve over time into full-blown relationships, with all the pros and cons that status brings. Maybe you truly love your tool, and that is great; please go ahead and nourish that feeling. Meanwhile, I will step back into more abstract territory, poke some applications with a stick, and see when they bite. You’re welcome to tag along, but there’s no need for extra venom.
What's Open Access Good For? Absolutely everything!
and 5 collaborators
8,249 more reasons to use Authorea
and 3 collaborators
Bringing the power of LaTeX and Git to all researchers
and 2 collaborators
Live Mathematics on Authorea: A Case for Transparency in Science
and 1 collaborator
Authorea is a collaborative platform for writing in research and education, with a focus on web-first, high-quality scientific documents.
We offer a tour through our integration of technologies that evolve math-rich papers into transparent, active objects. To enumerate, we currently employ Pandoc and LaTeXML (for authoring), MathJax (for math rendering and clipboard), D3.js (data visualization), IPython (computation), Flotchart and Bokeh (interactive plots).
This paper presents the challenges and rewards of integrating active web components for mathematics, while preserving backwards-compatibility with classic publishing formats. We conclude with an outlook to the next-to-come mathematics enhancements on Authorea, and a technology wishlist for the coming year.
-1 has Clear Semantics? Hold my Beer.
This is a story of semantics-by-convention gone wrong that hit us at Authorea last week.
Tokenizing an arXiv.org article with LLaMaPUn
Welcome to LLaMaPUn!
The Cornell preprint server arXiv.org contains roughly a million scientific papers, making it a treasure trove for natural language processing (NLP) experiments.
However, a big difference from mainstream NLP corpora is the presence of mathematical formulas, citations and other language modalities specific to scientific discourse. A second, and in practice just as significant challenge is that the majority of documents are authored in LaTeX, making them very irregular for naive automated mining.
At the research group at Jacobs University we have invested a lot of work in trying to regularize the dataset and make it available for NLP research, which is a large topic in its own right. I wrote an entry-level blog post about that effort here.
In this blog post, I want to briefly introduce the newest incarnation of the NLP library for scientific documents, backed up by a running example of word tokenization on an average preprint from the dataset.
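To make the tokenization problem concrete, here is a minimal Python sketch of word tokenization that keeps each formula as a single token. It is not LLaMaPUn's API: the `tokenize` function, the `MathFormula` placeholder, and the assumption that formulas arrive as inline `$...$` TeX are all hypothetical choices made for illustration.

```python
import re
from typing import List

# Hypothetical placeholder standing in for a formula (an assumption of this sketch).
MATH_TOKEN = "MathFormula"

# Assumes formulas appear as inline TeX between single dollar signs.
INLINE_MATH = re.compile(r"\$[^$]+\$")

# A word (optionally with internal hyphens or apostrophes), a number,
# or a single punctuation character.
TOKEN = re.compile(r"[A-Za-z]+(?:[-'][A-Za-z]+)*|\d+(?:\.\d+)?|[^\w\s]")


def tokenize(sentence: str) -> List[str]:
    """Split a sentence into word tokens, treating each formula as one token."""
    # Replace every formula first, so its symbols are not split into noise tokens.
    normalized = INLINE_MATH.sub(MATH_TOKEN, sentence)
    return TOKEN.findall(normalized)


tokens = tokenize("For $x > 0$, the well-known bound holds (see Section 2.1).")
print(tokens)
```

Replacing formulas before splitting is the design choice that matters here: a naive whitespace or punctuation split would scatter `x`, `>` and `0` across the token stream, which is one reason math-heavy text is awkward for mainstream NLP tooling.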
Joys of Pi: A test server and monitor host for the startup developer
and 2 collaborators
Measuring Open Science
and 4 collaborators
“Open science commonly refers to efforts to make the output of publicly funded research more widely accessible in digital format to the scientific community, the business sector, or society more generally”, writes the Organisation for Economic Cooperation and Development (OECD) in its newly released study “Making Open Science a Reality”.
In the digital age, the role of tools like Authorea is to increase the efficiency of research as well as of its diffusion. The OECD identifies multiple benefits of open science:
Reducing duplication costs in collecting, creating, transferring and reusing data and scientific material; allowing more research from the same data; and multiplying opportunities for domestic and global participation in the research process.
The greater scrutiny offered by open science allows a more accurate verification of research results.
Increased access to research results (in the forms of both publications and data) can foster spillovers not only to scientific systems but also innovation systems more broadly. (Firms and individuals may use and reuse scientific outputs to produce new products and services.)
Open science also allows the closer involvement and participation of citizens.