Abstract

I use pandoc to convert markdown into PDF. With LaTeX templates, I can produce PDF output that, in my opinion, looks as good as a professional publication. But one challenge I've struggled with is how to handle supplementary citations. Here, I outline the problem and describe a Lua filter I wrote to solve it.

Introduction

I’ve been using markdown + pandoc to author my scientific articles for several years now, and it’s fantastic. I usually have a “manuscript.md” document, and a “supplement.md” document, which I paste together using something like pandoc manuscript.md supplement.md .... Right now, my system works great as long as there are no citations in the supplement. I build the main and supplement together, which allows the main document to reference supplementary figures (and vice versa). Then I split the PDF to divide the main document off from the supplement for submission, and the supplement can stand on its own on the journal’s page.

However, this breaks down if the supplement has citations. By default, pandoc appends the supplemental references to the main bibliography placed at the end of the manuscript (where I place the reference list using <div id="refs">). There is no supplementary references section. This is not what I want. I need the references to be split into primary/supplemental.

Here are some ways I thought about solving the problem, and what I eventually ended up doing:

Solution idea 1: Build them separately

Could I just build them separately? If I built the supplementary file alone, the references would be fine and I could add a supplemental refs section. If I built the manuscript alone, it would contain only the primary citations, which would be correct. But then I lose the ability to cross-reference figures, so I couldn’t reference the supplemental figures from the main manuscript, or vice versa.

Solution idea 2: Build separately and include \labels

To solve the problem above, I could add \label commands for supplemental figures and tables at the end of the primary manuscript. Then I could refer to supplemental figures in the primary text (and potentially vice versa), while the references stay separate. I tried this, building the manuscript and supplement separately, and it works. The annoying part is that I have to add those \label lines manually. To make it easier, you’d want to add the supplemental labels to the end of the main manuscript file automatically, on the fly, drawing from the supplement.md file, so that you don’t have to keep that information up to date in two places.

Weaknesses of the build-them-separately approach:

- You need a way to auto-curate that list of supplemental labels. Doing it manually is labor-intensive and error-prone.
- If you do want them to be built together (e.g. for a bioRxiv submission), the internal links don’t work; you’d need to concatenate the two PDFs.
- There are issues with templates that differ between the two. In my use case, the supplement requires some extra metadata (onecol: true), which made this problematic.
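The auto-curation step could be sketched roughly as follows. This is a standalone Python illustration, not the Lua filter described in this post. It assumes that figures and tables in supplement.md carry pandoc-style identifiers such as {#fig:s1} or {#tbl:s1}; the fig: and tbl: prefixes and the function names collect_labels and render_label_block are hypothetical. It also emits bare \label commands as a placeholder, since the exact lines the manuscript needs depend on the template.

```python
import re

# Assumption: identifiers look like {#fig:s1} or {#fig:s1 width=50%},
# with the identifier as the first item inside the braces.
ID_PATTERN = re.compile(r"\{#([A-Za-z][\w:.-]*)[^}]*\}")


def collect_labels(markdown_text, prefixes=("fig:", "tbl:")):
    """Return identifiers with the given prefixes, in order, without repeats."""
    labels = []
    for match in ID_PATTERN.finditer(markdown_text):
        ident = match.group(1)
        if ident.startswith(prefixes) and ident not in labels:
            labels.append(ident)
    return labels


def render_label_block(labels):
    """Render one LaTeX \\label command per identifier, one per line."""
    return "\n".join(f"\\label{{{label}}}" for label in labels)


if __name__ == "__main__":
    # Hypothetical supplement content, inlined so the sketch runs on its own.
    supplement = (
        "![Extra plot](extra.png){#fig:s1 width=50%}\n\n"
        "Table: Extra numbers {#tbl:s1}\n"
    )
    print(render_label_block(collect_labels(supplement)))
```

Generating the block from supplement.md at build time means the list of labels lives in one place only, which addresses the first weakness above but not the other two.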

I wrote a Lua filter that takes the supplemental images and emits only their LaTeX labels, which works. But the filter has to be aware of both files, so the two builds are not 100% independent. And what about tables? I can’t get it to work for tables if I’m using LaTeX tables.

Ideas:

- Put the metadata in a separate file so it can be inserted into both files, keeping the two builds independent?
- Perhaps there could be a “supplement” tag, read by the Lua filter, that hides everything after it. Then you’d still feed both files in, but the filter would convert the supplement into labels only.

Solution idea 3: Multi-refs

I wrote an alternative filter, called “multi-refs.lua”, that lets you split the references. This makes it possible to build the complete manuscript+supplement with supplemental citations, and it ended up working beautifully. Analogous to the vanilla pandoc <div id="refs"> approach, the filter uses <div class="multi-refs">, and you can include more than one of them. The references included will be anything cited up to that point in the document. Once the combined document is built, the manuscript-only and supplement-only PDFs can be split off from it.
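To make the splitting rule concrete, here is a minimal Python sketch of the idea. It is not the code of multi-refs.lua, and it works on raw markdown text rather than on pandoc's document tree. It rests on several assumptions: each <div class="multi-refs"> lists the keys cited since the previous such div, citation keys follow a simplified version of pandoc's @key syntax, and the allow_duplicates flag is a hypothetical stand-in for a setting that controls whether a key already listed earlier may be listed again.

```python
import re

# Either a multi-refs div marker or a citation such as @smith2020.
# The key pattern is a simplification of pandoc's citation-key rules.
TOKEN = re.compile(
    r'(?P<div><div class="multi-refs">)'
    r'|(?<!\w)@(?P<key>\w+(?:[:.-]\w+)*)'
)


def split_references(markdown_text, allow_duplicates=False):
    """Return one list of citation keys per multi-refs div, in document order."""
    sections = []            # finished reference lists, one per div
    current = []             # keys cited since the previous div
    already_listed = set()   # keys that appeared in an earlier list
    for match in TOKEN.finditer(markdown_text):
        if match.group("div"):
            sections.append(current)
            already_listed.update(current)
            current = []
            continue
        key = match.group("key")
        if key in current:
            continue  # never repeat a key within one list
        if allow_duplicates or key not in already_listed:
            current.append(key)
    return sections


if __name__ == "__main__":
    # Hypothetical combined document: main text, then supplement.
    doc = (
        "Main text [@a; @b].\n"
        '<div class="multi-refs"></div>\n'
        "Supplement [@b; @c].\n"
        '<div class="multi-refs"></div>\n'
    )
    print(split_references(doc))
    print(split_references(doc, allow_duplicates=True))
```

With the sample document, the first call yields one list for the main text and one for the supplement that omits the key already listed in the main references; the second call keeps that key in both lists.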

Together with a few other settings that can be changed in the metadata, such as whether to include duplicate references, this has so far solved the issue for me. You can find my Lua filter in the sciquill repository.