Bootstrapping the Basics

LaTeX

LaTeX is an authoring format for documents, built on top of Donald Knuth’s typesetting system TeX. It was originally designed for creating beautifully typeset manuscripts for print, in the “pixels on paper” paradigm of PDF and DVI. Its killer feature? You can literally “program” your text: both its form and its function.
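
To make “programming your text” concrete, here is a minimal sketch. The macro names \term and \repeatphrase and the counter reps are hypothetical, made up for this illustration; only \newcommand, \newcounter and the \loop ... \repeat construct come from LaTeX itself.

```latex
\documentclass{article}

% Form: \term (a hypothetical name) marks a key term by meaning.
% Changing this one definition restyles every term in the document.
\newcommand{\term}[1]{\emph{#1}}

% Function: \repeatphrase (also hypothetical) typesets its second
% argument as many times as its first argument says, using a counter loop.
\newcounter{reps}
\newcommand{\repeatphrase}[2]{%
  \setcounter{reps}{0}%
  \loop\ifnum\value{reps}<#1\relax
    #2 \stepcounter{reps}%
  \repeat
}

\begin{document}
A \term{macro} is a named piece of text that takes arguments.

% Typesets the sentence three times.
\repeatphrase{3}{All work and no play.}
\end{document}
```

The first macro separates what a piece of text means from how it looks; the second shows that the document can compute part of its own content.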

Being able to “program” a text is not automatically a good usability feature. In fact, if you write in text-heavy genres, such as fiction, or mainly author short forms, such as emails, blogs and Facebook posts, you quite likely don’t need anything beyond the most basic word processor. (If you are mostly using Twitter, even a basic tablet on-screen keyboard feels like a waste.) In the similarly vehement debates over the value of programming languages, peace can sometimes be kept via a simple rule of thumb:

Use the best tool for the job.

LaTeX can be a “best tool”, and it has been a much-loved partner in writing scientific publications in the formal and natural sciences. It has been particularly dominant in math-heavy fields, as beautiful mathematical formulas were a TeX-only prerogative until the early 2000s. Many would argue that the competing Office solutions have also long been playing catch-up on citations and advanced typographic styling. If you were writing a book or technical manual that ran well into the hundreds of pages, LaTeX could robustly process and typeset your document from its inception.

However, great power brings great opportunity for blunders. The cost of LaTeX is a tedious syntax (by modern standards) and a large range of possible errors, caused by the hacked-together module system of classes and styles, and the absence of reliable encapsulation or inheritance. As programming languages go, TeX is one of the hardest to read or write. Unsurprisingly, that makes alternatives quite appealing, especially for uses that do not require a typesetting bazooka.
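
One way to see the missing encapsulation: all macros live in a single global namespace, so a later definition can silently replace an earlier one. In the sketch below, the macro \vect and the two “packages” are hypothetical; the two definitions are inlined here to stand in for code that would normally sit in separate style files.

```latex
\documentclass{article}

% Stand-in for a hypothetical package A: vectors are set in bold.
\newcommand{\vect}[1]{\mathbf{#1}}

% Stand-in for a hypothetical package B, loaded later: it redefines
% the same name, and nothing shields A's version from the change.
\renewcommand{\vect}[1]{\vec{#1}}

\begin{document}
% Only B's definition survives, so this is typeset with an arrow.
The velocity is $\vect{v}$.
\end{document}
```

No error or warning is raised here, which is part of why such clashes can be tedious to track down in a large document.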

MarkDown

MarkDown is among the most used authoring languages for web documents, and belongs to the same family of lightweight markup as the wiki syntax powering Wikipedia and countless specialized wiki-based sites. It is common to use MarkDown as a contrast to LaTeX in order to demonstrate both the difference in typesetting paradigms and the “Convention over Configuration” approach applied to authoring tools.

Looking back at the inception of each project, we see that MarkDown targeted web pages (HTML), while LaTeX and Office targeted printers (DVI and later PDF). But the differences do not stop there: MarkDown is also much more limited in scope and expressivity, and if you ask too much of it, it pecks back.

It is also common to use one part of the technology stack to refer to the entire toolchain (an example of a technological use of metonymy). Namely, MarkDown usually implies creating an HTML document, while LaTeX implies a PDF document. It is important to keep in mind that while they are traditionally used together, these are separate and very different formats that can be re-appropriated. It is possible both to create PDF documents from MarkDown and to create HTML documents from LaTeX, but those are currently the exceptions rather than the rule.

I will only go into the typesetting aspects in this post, as I plan to write a separate entry with my thoughts on authorship UX. For now it is enough to note that MarkDown is incredibly simple to learn and use, as it has a minimal set of typesetting commands (e.g. sectioning, math, tables). LaTeX, on the other hand, is a full-blown programming language with a vast ecosystem of extra features and extensions, which allows you to customize anything.
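
As a rough illustration of that difference in size, the sketch below pairs a few LaTeX constructs with a MarkDown counterpart in the comments. The MarkDown lines are assumptions about a typical flavor: headings with # are near-universal, but math and table syntax vary between implementations. The table values are placeholders.

```latex
\documentclass{article}
\begin{document}

% MarkDown: # Results
\section{Results}

% MarkDown (flavors with math support): $E = mc^2$
Energy and mass are related by $E = mc^2$.

% MarkDown (flavors with pipe tables):
%   | Run | Score |
%   |-----|-------|
%   | 1   | 0.9   |
\begin{tabular}{ll}
  Run & Score \\
  \hline
  1   & 0.9   \\
\end{tabular}

\end{document}
```

The MarkDown versions are shorter and need no preamble, while the LaTeX versions expose knobs (column alignment, rules, document class) that MarkDown fixes by convention.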

Finding the sweet spot that strikes the perfect balance between power and convention is also one of our goals at Authorea, where we strive to offer a fluent editing experience to researchers.

Web-first Publications

Roll the tape forward to 2015. Books are still printed and perused, but a great deal of our writing has moved to a web-first form. The language of the web in 2015 is HTML5, and a “web-first scientific document” is an HTML5 document.

While most established publishers still work on a print-first basis, the majority of publications are now available online, at a minimum as legacy PDF documents. From what I can tell, the movement to web-first publishing is growing ever stronger, with the advent of e-readers and active documents. The promise of active documents\cite{KohDavGin:psewads11}, which can embed not only hyperlinks but also data, is quite appealing. On-demand machine support can offer a wide range of services, from screen readers and machine translation to interactively exposing the data behind figures and tables.

The web-first scientific manuscripts of 2015 are HTML5 documents. LaTeX is one of several viable, yet imperfect, authoring languages for the web.

And if you are keeping up, you will have correctly noticed that this implies a significant paradigm shift.