PREPRINT authorea.com/19359
    LaTeX is Dead (long live LaTeX)
    Typesetting in the Digital Age

    Welcome to the Pitchfork Party

    This post comes in the context of a series of healthy discussion pieces on authoring scientific content for the web.¹

    ¹ I have my own torch in hand: I am a core contributor to LaTeXML and an enthusiastic developer at Authorea, so keep that bias in mind while reading on.

    In this text, I will try to elaborate on the merits and deficiencies of using a pre-web authoring syntax, LaTeX, for writing modern publications in 2015 as active web documents. My stance is evolutionary – we should adapt our existing tools to the new environment and in the process gain insights for what the next generation of tools ought to be.

    If you are a working scientist who authors in LaTeX, I will suggest how to gradually adapt your existing toolchain while taking your first steps towards the future of publishing. If you don’t find the technical details interesting, you can skip ahead to my suggestion in the concluding section.

    If you are a developer, I will argue that the next generation of authoring tools has not fully arrived yet.

    We’re not going to start a fire

    Feelings can burn strong when the words “LaTeX” and “Web” appear together.

    Debates over tool superiority, especially online, tend to quickly become heated and destructive. My best guess is that, over time, our personal experiences with our tools evolve into full-blown relationships, with all the pros and cons that status entails. Maybe you truly love your tool; that is great, and please go ahead and nourish that feeling. Meanwhile, I will step back into more abstract territory, poke some applications with a stick and see when they bite. You’re welcome to tag along, but there’s no need for extra venom.

    Bootstrapping the Basics

    LaTeX

    LaTeX is an authoring format for documents, built on top of Donald Knuth’s typesetting system TeX. It was originally designed for creating beautifully typeset manuscripts for print, in the “pixels on paper” paradigm of DVI and PDF. Its killer feature? You can literally “program” your text, both its form and its function.
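    To make “programming your text” concrete, here is a minimal sketch of a LaTeX document that defines its own notation. The macro names \vect and \norm are hypothetical, chosen only for illustration; the mechanism itself (\newcommand, including an optional first argument with a default value) is standard LaTeX.

```latex
\documentclass{article}
% Hypothetical notation macros, defined once in the preamble.
% Every vector in the text is typeset through \vect, so changing
% the notation means editing this single line.
\newcommand{\vect}[1]{\mathbf{#1}}
% \norm takes an optional first argument (the norm index, default 2)
% and a mandatory second argument (the quantity being measured).
\newcommand{\norm}[2][2]{\left\| #2 \right\|_{#1}}
\begin{document}
The residual $\vect{r} = \vect{b} - A\vect{x}$ has norm $\norm{\vect{r}}$,
and $\norm[\infty]{\vect{r}}$ in the maximum norm.
\end{document}
```

    Because the notation lives in one place, switching every vector to arrow notation would only require changing the body of \vect to \vec{#1}; the prose that uses it stays untouched.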

    Being able to “program” a text is not automatically a usability win. In fact, if you write in text-heavy genres, such as fiction, or mainly author short forms, such as emails, blogs and Facebook posts, you quite likely don’t need anything beyond the most basic word processor.¹ In the similarly vehement debates over the merits of programming languages, peace can sometimes be kept via a simple rule of thumb:

    Use the best tool for the job.

    ¹ If you are mostly using Twitter, even a basic on-screen tablet keyboard feels like a waste.

    LaTeX can be a “best tool”, and it has long been a much-loved partner for writing scientific publications in the formal and natural sciences. It has been particularly dominant in math-heavy fields, as beautiful mathematical formulas were a TeX-only prerogative until the early 2000s. Many would argue that the competing Office suites have long been playing catch-up on citations and advanced typographic styling as well. If you were writing a book or technical manual running well into the hundreds of pages, LaTeX could robustly process and typeset your document from its first draft onward.

    However, great power brings great opportunity for blunders. The cost of LaTeX is a tedious syntax (by modern standards) and a large range of possible errors, caused by the hacked-together module system of classes and styles and by the absence of reliable encapsulation or inheritance. As programming languages go, TeX is one of the hardest to read or write. Unsurprisingly, that makes alternatives quite appealing, especially for uses that do not require a typesetting bazooka.
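    As one small illustration of the missing encapsulation: all macro names live in a single global namespace, so a style loaded later can silently replace a definition made earlier. The sketch below inlines two hypothetical style files into one preamble; the macro name \mynote is invented for the example.

```latex
\documentclass{article}
% "Style A" (hypothetical) defines \mynote to typeset margin remarks.
\newcommand{\mynote}[1]{\marginpar{\footnotesize #1}}
% "Style B" (hypothetical), loaded later, reuses the same global name.
% Unlike \newcommand, the TeX primitive \def performs no existence
% check, so Style A's definition is silently overwritten.
\def\mynote#1{\textbf{Note:} #1}
\begin{document}
This remark ends up in the body text, not in the margin.\mynote{Style B wins.}
\end{document}
```

    Had “Style B” used \newcommand instead, LaTeX would have stopped with a “Command \mynote already defined” error. With \def (or \renewcommand) there is no such check, and the outcome depends on the order in which classes and styles happen to be loaded.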

    MarkDown

    MarkDown is among the most used authoring languages for web documents, and lightweight markup in the same spirit (such as the wikitext behind Wikipedia) powers countless wiki-based sites. MarkDown is commonly contrasted with LaTeX to illustrate not only the difference in typesetting paradigms, but also the “Convention over Configuration” approach applied to authoring tools.

    Looking back at the origins of these projects, we see that MarkDown targeted web pages (HTML), while LaTeX and Office targeted printers (DVI and later PDF). But the differences do not stop there: MarkDown is also much more limited in scope and expressivity, and if you ask too much of it, it pecks back.

    It is also common to use one part of the technology stack to refer to the entire toolchain.² For instance, MarkDown usually implies creating an HTML document, while LaTeX implies a PDF document. It is important to keep in mind that, while they are traditionally used together, these are separate and very different formats that can be re-appropriated. It is possible to create PDF documents from MarkDown as well as HTML documents from LaTeX, but those are currently the exception rather than the rule.

    ² An example of a technological use of metonymy.

    I will only go into the typesetting aspects in this post, as I plan to write a separate entry for my thoughts on authorship UX. For now it is enough to note that MarkDown is incredibly simple to learn and use, as it has a minimal set of typesetting commands (e.g. sectioning, math, tables). LaTeX on the other hand is a full-blown programming language with a vast ecosystem of extra features and extensions, which allows you to customize anything.
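    As a small taste of that programming-language side, the sketch below uses TeX’s \loop ... \repeat construct and integer arithmetic to generate text at compile time. The counter name \mycount is hypothetical; \newcount, \loop, \ifnum, \advance and \the are standard TeX/LaTeX commands. Plain MarkDown has no counterpart to this: what you write is, more or less, what you get.

```latex
\documentclass{article}
% Allocate an integer register under a hypothetical name.
\newcount\mycount
\begin{document}
% \loop ... \repeat executes its body, then evaluates the \if...
% test placed just before \repeat; it iterates while the test holds.
\mycount=1
\loop
  Step \the\mycount.\par
  \advance\mycount by 1
\ifnum\mycount<4 \repeat
\end{document}
```

    Compiled, this typesets three paragraphs, “Step 1.” through “Step 3.”, none of which appear literally in the source.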

    Finding the sweet spot that strikes the perfect balance between power and convention is also one of our goals at Authorea, where we strive to offer a fluent editing experience to researchers.

    Web-first Publications

    Roll the tape forward to 2015. Books are still printed and perused, but a great deal of our writing has moved to a web-first form. The language of the web in 2015 is HTML5, and a “web-first scientific document” is an HTML5 document.

    While most established publishers still work on a print-first basis, the majority of publications are now available online, at a minimum as legacy PDF documents. From what I can tell, the movement to web-first publishing is growing ever stronger, with the advent of e-Readers and active documents. The promise of active documents (Kohlhase 2011), which can embed not only hyperlinks but also data, is quite appealing. On-demand machine support can offer a wide range of services, from screen-readers and machine translation to interactively exposing the data behind figures and tables.

    The web-first scientific manuscripts of 2015 are HTML5 documents. LaTeX is one of several viable, yet imperfect, authoring languages for the web.

    And if you are keeping up, you will have noticed that this implies a significant paradigm shift.