On content and style: the beauty of markdown
I learned HTML in the 1990s, when the web was young. As I pieced together my first simple web pages, I remember reading about an interesting new language that was gaining popularity fast: Cascading Style Sheets (CSS). As a web design beginner (in an era when web design was barely a thing), I remember being slightly confused about whether to use the traditional `<font>` HTML tags or the newfangled CSS `style` attribute with `<div>` and `<span>`, which felt slightly less straightforward. Then there was `<table>` versus the new floating `<div>` for layout, which raised a new set of issues, and so on. My early web pages were often a mix of the two, whichever I could get to work quickest. I thought CSS was nice, but the fundamental concept of what makes CSS useful didn't occur to me at the time. I find it interesting to reflect on the popularity of CSS today: it's still here, working in much the same way as it did back in the 1990s. What made (and makes) CSS so popular and enduring? The answer, I think, is that Cascading Style Sheets are a quintessential implementation of a fundamental principle of design and productivity: dividing content from style.
What do I mean by fundamental? I mean that dividing content from style turns out to be relevant not only to web design, but to many other tasks, including many of my everyday tasks as a scientist, like writing papers and notes, managing citations, and building presentations. Yet many (probably most) scientists have not yet realized how this concept can improve their productivity (and, more importantly, the collective productivity of the community). So this essay is my attempt to explain the principle and convince you to apply it to more of your everyday tasks. I'll use an analogy to web design, where the principle is tried and true, and then give concrete examples specific to academic tasks to encourage others down the path less traveled. Today, using markdown, pandoc, and other brilliant tools, it's possible to truly separate content from style across the board, for all types of media and communication. By doing this, you reap the same benefits that have become an iconic part of good web development.
Why divide content and style
Let’s start with quick definitions of content and style. There’s nothing magical here: all I mean by content is the actual information (usually text) that describes your work. Be it on a web page, on your CV, in a publication, or in a presentation, there is something you’re trying to communicate; that’s content. Style, then, is simply the (visual) way you present that content: the colors, whitespace, fonts, sizes, and layouts that determine how the content is formatted and organized.
Dividing the two means the style is stored separately (usually in a different file) from the content itself. This is one of the primary purposes of CSS: you write the actual text to display in content.html, then you define the colors and fonts in style.css. Instead of saying directly in the HTML that such-and-such text is green, you simply tag that text with a class attribute (“header” or “highlight” or “normal”), and then you specify in the stylesheet that all “header” text should be green. Why is this useful? For three reasons:
Theming. This makes it really easy to change styles within your web site systematically. For example, you can update all headings in one place, without mucking with your content.html file at all. A quick adjustment and you’ve re-themed your entire document (web page, CV, presentation, reference list, or publication).
Modularity of style. You can now very easily apply a single style across multiple content pages. If you have many content pages, this eliminates a lot of work, since you only have to define the style once, and just link it to each content page.
Portability of content. Your content is now much more universal; because it isn’t tied to a rigid format, it’s reusable. You can switch media formats more easily. Copy your text from your web source right into a Word document without trouble (don’t take this for granted: the inverse is not always possible).
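To make the division concrete, here is a minimal Python sketch (an illustration of the idea, not CSS itself) that mimics the mechanism described above: content is a list of text snippets tagged only with a class name, and each theme is a separate mapping from class names to CSS declarations. The class names, the two theme names, and the style.css filename are hypothetical. Swapping themes regenerates the stylesheet while the content markup stays identical, which is the theming and modularity argument in miniature.

```python
import html

# Content: text tagged with a semantic class only; no colors or fonts here.
CONTENT = [
    ("header", "On content and style"),
    ("normal", "Style lives in a separate stylesheet."),
    ("highlight", "Change the theme, not the content."),
]

# Style: two interchangeable (hypothetical) themes mapping class names to CSS declarations.
THEMES = {
    "green": {
        "header": "color: green; font-size: 2em;",
        "highlight": "background: yellow;",
        "normal": "font-family: serif;",
    },
    "plain": {
        "header": "font-weight: bold;",
        "highlight": "font-style: italic;",
        "normal": "font-family: sans-serif;",
    },
}


def render_html(content: list[tuple[str, str]]) -> str:
    """Emit markup that refers only to class names and a stylesheet, never to looks."""
    body = "\n".join(
        f'<div class="{cls}">{html.escape(text)}</div>' for cls, text in content
    )
    return f'<link rel="stylesheet" href="style.css">\n{body}'


def render_css(theme: dict[str, str]) -> str:
    """Emit one CSS rule per class for the chosen theme."""
    return "\n".join(f".{cls} {{ {decl} }}" for cls, decl in theme.items())


print(render_html(CONTENT))
print(render_css(THEMES["green"]))
```

For example, `render_html(CONTENT)` produces the same markup whichever theme is chosen; only `render_css(THEMES["green"])` versus `render_css(THEMES["plain"])` differs, and the same stylesheet could be linked from any number of content pages.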
Content and style in papers and presentations
Outside of the web (at least in biology), style and content are frequently intertwined. I’ll explain using four examples of common communication media in science: reference lists, papers, presentations, and curricula vitae.
Reference lists. Reference lists are the one area where scientists have realized the value of separating content and style. Each journal has its own citation style, and manually formatting references to even one style would be insane, so it’s common to use a database of references that can export a works cited page in various styles, which are stored separately and can thus be applied one-to-many across documents. There are plenty of existing systems that already do this quite well (and are actually a disaster for other reasons, but that’s a topic for another day). So the modularity-of-style advantage is roughly achieved for reference lists, but one challenge remains: in-text citations are often not portable. For example, if you use EndNote to insert citations into your Word document, it will nicely manage both in-text citations and your works cited page, but if you copy a portion of this text into a different medium (web page, text file, presentation), you lose the EndNote connection because it is Word-specific. In other words, the link between content and style is not universal, and so the division between content and style is incomplete. The markdown/pandoc method solves this issue (details below).
Academic papers. Word processors (like Microsoft Word) have come a long way in managing styles, but Word documents still store content and style in the same file. Furthermore, the way most people use word processors intertwines style and content throughout the document. While Word’s styles and themes make it possible to separate them within a single document, probably 99% of users don’t use them and mix content and style directly. Every time you select some text and make it bold or italic or change its size, you are embedding a stylistic command within content. You lose the three advantages outlined above: 1) within-document theming exists but is weak and usually ignored, 2) you cannot link multiple content documents to a single style, and 3) content is not portable (copying and pasting out of your Word document into a web form will lose all your beautiful formatting). Try building a web page with content from a Word doc: this leads to a disaster of stray non-ASCII characters.
Presentations. Slide software (PowerPoint, Keynote) has essentially the same problem. Again, themes attempt to address this, and are actually more widely used, but styles are still ultimately stored together with content, and the user interface encourages editing the style of content directly, reducing the utility of themes. Content is even less portable than from a word processor: there’s no straightforward way to build a web page or a paper from your slide deck content; images are embedded within the presentation, making re-using them problematic; and so on. For example, say you want to take a chunk from one presentation and use it in another, or share it with a collaborator. Your slides are unlikely to match the look and feel of the other presentation.
Curriculum Vitae. Most CVs that I see appear to be managed in Word, inheriting all the problems of a general academic paper. An enlightened few have discovered the power of LaTeX for CVs, which separates style from content, but this is the exception in most fields.
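The one-database, many-styles idea behind reference managers can be sketched in a few lines of Python. This is an illustration under simple assumptions, not how EndNote or JabRef work internally: each reference is a record (the content), each citation style is a function that formats a record (the style), and the same records are rendered through either style. The two references below are placeholders, not real papers, and both style functions are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Reference:
    """One entry in the reference database: pure content, no formatting."""
    key: str
    authors: tuple[str, ...]
    title: str
    journal: str
    year: int


# A style is any function that turns (position, reference) into a line of text.
Style = Callable[[int, Reference], str]


def style_author_year(i: int, ref: Reference) -> str:
    """Hypothetical author-year style, emitted as a markdown bullet."""
    return f"- {', '.join(ref.authors)} ({ref.year}). {ref.title}. *{ref.journal}*."


def style_numbered(i: int, ref: Reference) -> str:
    """Hypothetical numbered style: first author, plus 'et al.' if there are more."""
    first = ref.authors[0] + (" et al." if len(ref.authors) > 1 else ".")
    return f"{i}. {first} {ref.title}. {ref.journal} {ref.year}."


def works_cited(refs: list[Reference], style: Style) -> str:
    """Apply one style to every reference (the one-to-many step)."""
    return "\n".join(style(i, ref) for i, ref in enumerate(refs, start=1))


# Placeholder records, not real papers.
REFS = [
    Reference("doe2015", ("Doe J", "Roe R"), "A hypothetical paper", "Journal of Examples", 2015),
    Reference("roe2018", ("Roe R",), "Another placeholder", "Example Letters", 2018),
]

print(works_cited(REFS, style_numbered))
print(works_cited(REFS, style_author_year))
```

Supporting another journal means writing one more formatting function; the reference records themselves never change.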
How to separate content and style with markdown and pandoc
If you’re not already familiar, it’s well worth your time to go learn more about markdown. In a sentence, markdown is a very simple markup language using plain text formatting syntax. Combined with a few other tools, markdown makes it possible for you to almost completely separate content from style not just for web pages, but for all of these common communication media.
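To see why markdown separates meaning from appearance, consider a deliberately tiny converter. It handles only `#`-style headings, `**bold**`, `*emphasis*`, and paragraphs separated by blank lines, so it is a toy sketch, not a real markdown implementation (tools like pandoc follow a much fuller grammar with lists, links, and nesting). The point is that the source only says "this is a heading" or "this is emphasized"; what a heading looks like is decided later, by whatever stylesheet or template the HTML is paired with.

```python
import html
import re


def inline(text: str) -> str:
    """Escape HTML, then turn **bold** and *emphasis* into semantic tags."""
    text = html.escape(text)
    text = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", text)
    text = re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)
    return text


def to_html(md: str) -> str:
    """Toy converter: '#' headings and blank-line-separated paragraphs only."""
    out: list[str] = []
    para: list[str] = []

    def flush() -> None:
        # Emit any pending paragraph lines as a single <p>.
        if para:
            out.append(f"<p>{inline(' '.join(para))}</p>")
            para.clear()

    for line in md.splitlines():
        line = line.strip()
        m = re.match(r"(#{1,6})\s+(.*)", line)
        if m:
            flush()
            level = len(m.group(1))
            out.append(f"<h{level}>{inline(m.group(2))}</h{level}>")
        elif not line:
            flush()
        else:
            para.append(line)
    flush()
    return "\n".join(out)


print(to_html("# Title\n\nSome **bold** and *italic* text."))
```

Note that nothing in the output says "large" or "green"; the HTML only records structure, leaving appearance to the stylesheet.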
My workflow is like this: I author all my content in markdown. That means presentations, papers, CV, reference lists, web pages, anything. Since my content is all in the same format (markdown), I can copy/paste that content from one intended medium to another, making it really easy to publish content in multiple ways, which I always seem to need to do. I then pair my content with a series of other tools to generate the style I need, be it a web page, a presentation, a PDF, or what have you. So the general philosophy is that as much as possible, all content is markdown. Then I just style things as needed. Here’s how it works for different output media:
Web pages. Jekyll is a simple static site generator that builds web pages from markdown content and templates.
Reference lists. It doesn’t make sense to store references in markdown, so in this case my content is really in BibTeX format, a file format used to describe references (papers). I use JabRef to manage my BibTeX database. JabRef can do scripted exports using custom styles to export lists of references in any format, including markdown. This then plugs right into all the other tools, so I can produce reference lists in any of the other media, straight from my database. I use this to include nice linked reference lists at the end of my presentations or on my web page. I can use the same database to add a reference list to my CV, to a grant, or to a paper (see below). Thus I only have to curate a single database of references (my content), and I export them as needed in whatever style.
Academic papers. I write papers in markdown, and then I convert to PDF, docx, or whatever format using pandoc. Pandoc is a universal document converter, and it is particularly good at taking markdown content and converting it into other formats (including HTML, PDF, and Word .docx). To manage citations, I use the pandoc-citeproc filter, a citation processor that lets pandoc resolve citations in markdown documents against a BibTeX database. This filter uses the Citation Style Language (and its styles repository), an open language for describing citation styles.
Presentations. Reveal.js is a presentation framework that lets you author presentations in HTML or markdown. For text, I can just use markdown and make decent-looking presentations. I admit that markdown isn’t the whole solution for a good presentation, because presentations should be dominated by visual display of information rather than text, so there I mostly use SVG graphics; but for text when necessary, Reveal.js makes it nice to use markdown.
Curriculum Vitae. I haven’t gotten there yet, but I intend to write a markdown CV and convert it to PDF or HTML as needed using pandoc.
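The "one source, many outputs" step can be scripted. Below is a minimal Python sketch, assuming pandoc is installed and on the PATH, that builds pandoc command lines for HTML, Word, PDF, and reveal.js slides from one markdown file. It assumes pandoc 2.11 or later, where citation processing is built in via the `--citeproc` option; older versions used `--filter pandoc-citeproc` instead. The file names `refs.bib` and `journal.csl`, the target names, and the helper functions are hypothetical, and PDF output additionally needs a LaTeX engine.

```python
import subprocess
from pathlib import Path
from typing import Optional

# Output targets for one markdown source. Pandoc infers HTML, Word, and PDF
# from the output file extension; reveal.js slides need an explicit -t.
TARGETS = {
    "html": ["-s", "-o", "{stem}.html"],
    "docx": ["-o", "{stem}.docx"],
    "pdf": ["-o", "{stem}.pdf"],  # PDF output also requires a LaTeX engine
    "slides": ["-t", "revealjs", "-s", "-o", "{stem}-slides.html"],
}


def pandoc_command(source: str, target: str,
                   bibliography: Optional[str] = None,
                   csl: Optional[str] = None) -> list[str]:
    """Build (but do not run) the pandoc argument list for one target."""
    stem = Path(source).stem
    cmd = ["pandoc", source] + [arg.format(stem=stem) for arg in TARGETS[target]]
    if bibliography:
        # Built-in citation processing (pandoc >= 2.11).
        cmd += ["--citeproc", f"--bibliography={bibliography}"]
        if csl:
            # The CSL file is pure style: swap it to change the citation format.
            cmd.append(f"--csl={csl}")
    return cmd


def build_all(source: str, **options: Optional[str]) -> None:
    """Render every target from the same source; needs pandoc on the PATH."""
    for target in TARGETS:
        subprocess.run(pandoc_command(source, target, **options), check=True)
```

Calling `build_all("paper.md", bibliography="refs.bib", csl="journal.csl")` would regenerate every format from the same markdown; changing the look means swapping the `.csl` file or a template, never editing `paper.md`.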
Why markdown is the key
To me, the step forward that makes most of this possible is markdown, because it lets you embed simple text markup (headings, citations, etc.) in a human-readable, standard ASCII format, so the content is completely portable. It even looks nice in a markdown-aware text editor, which understands asterisks for bold, and so on. This has arguably been possible for a while with LaTeX, but LaTeX’s complexity makes it great for layout and less great for just writing. The heaviness of LaTeX makes it tough for simple tasks, plus it’s geared toward page-like outputs (which are becoming less important). LaTeX has its place, but it’s not the universal content format that will generate all styles. Markdown provides a happy compromise: it gives you all the key things you always need, and none of the things you don’t. Markdown is perhaps as close as we’ll get to a compromise between WYSIWYG and separating content from style, which to me makes it a beautiful balance.