With few exceptions, scholars would be better off writing their papers in a lightweight markup format called Markdown, rather than using a word-processing program like Microsoft Word. This post explains why, and reveals a hidden agenda as well.1
Before turning to lightweight markup, I review the problems with Microsoft Word as the lingua franca for producing scholarly articles. This ground has been heavily covered. (Here’s a recent example.) The problems include:
The solution is to use a markup format that allows specification of the document (providing its logical structure) separate from the typesetting of that document. Your document is specified – that is, generated and stored – as straight text. Any formatting issues are handled not by changing the formatting directly via a graphical user interface but by specifying the formatting textually using a specific textual notation. For instance, in the HTML markup language, a word or phrase that should be emphasized is textually indicated by surrounding it with <em>…</em>. HTML and other powerful markup formats like LaTeX and various XML formats carry relatively large overheads. They are complex to learn and difficult to read. (Typing raw XML is nobody’s idea of fun.) Ideally, we would want a markup format to be lightweight, that is, simple, portable, and human-readable even in its raw state.
Markdown is just such a lightweight markup language. In Markdown, emphasis is textually indicated by surrounding the phrase with asterisks, as is familiar from email conventions, for example, *lightweight*. See, that wasn’t so hard. Here’s another example: A bulleted list is indicated by prepending each item on a separate line with an asterisk, like this:
* First item
* Second item
which specifies the list
Because specification and typesetting are separated, software is needed to convert from one to the other, to typeset the specified document. For reasons that will become clear later, I recommend the open-source software pandoc. Generally, scholars will want to convert their documents to PDF (though pandoc can convert to a huge variety of other formats). To convert file.md (the Markdown-format specification file) to PDF, the command
pandoc file.md -o file.pdf
suffices. Alternatively, there are many editing programs that allow entering, editing, and typesetting Markdown. I sometimes use Byword. In fact, I’m using it now.
Markup languages range from the simple to the complex. I argue for Markdown for four reasons:
Above, I claimed that scholars should use Markdown “with few exceptions”. The exceptions are:
Some have proposed that we need a special lightweight markup language for scholars. But Markdown is sufficiently close, and has such a strong community of support and software infrastructure, that it is more than sufficient for the time being. Further development would of course be helpful, so long as the urge to add “features” doesn’t overwhelm its core simplicity.
I have a hidden agenda. Markdown is sufficient for the bulk of cases of composing scholarly articles, and simple enough to learn that academics might actually use it. Markdown documents are also typesettable according to a separate specification of document style, and retargetable to multiple output formats (PDF, HTML, etc.).2 Thus, Markdown could be used as the production file format for scholarly journals, which would eliminate the need for converting between the authors’ manuscript version and the publishers internal format, with all the concomitant errors that process is prone to produce.
In computer science, we have by now moved almost completely to a system in which authors provide articles in LaTeX so that no retyping or recomposition of the articles needs to be done for the publisher’s typesetting system. Publishers just apply their LaTeX style files to our articles. The result has been a dramatic improvement in correctness and efficiency. (It is in part due to such an efficient production process that the cost of running a high-end computer science journal can be so astoundingly low.)
Even better, there is a new breed of collaborative web-based document editing tools being developed that use Markdown as their core file format, tools like Draft and Authorea. They provide multi-author editing, versioning, version comparison, and merging. These tools could constitute the system by which scholarly articles are written, collaborated on, revised, copyedited, and moved to the journal production process, generating efficiencies for a huge range of journals, efficiencies that we’ve enjoyed in computer science and mathematics for years.
As Rob Walsh of ScholasticaHQ says, “One of the biggest bottlenecks in Open Access publishing is typesetting. It shouldn’t be.” A production ecology built around Markdown could be the solution.
pandoc -S --mathjax --base-header-level=3 markdownpost.md -o markdownpost.html
A nicely typeset version using the American Mathematical Society’s journal article document style can be generated with
pandoc markdownpost.md -V documentclass:amsart -o markdownpost-amsart.pdf
To target the style of ACM transactions instead, the following command suffices:
pandoc markdownpost.md -V documentclass:acmsmall -o markdownpost-acmsmall.pdf
Both PDF versions are also attached to this post.↩
Attachments |
|
Comments