Citation management

On PDFcocking

2014-09-08 — 2020-07-14

academe

collective knowledge

computers are awful

faster pussycat

how do science

workflow


The genealogy of evidence is important, and there are many important ideas about how we could track it, especially with advances in technology; however, this page is not about that propagation of certainty, but rather the shabby proxy, citations in actually-existing academic publishing.

In particular, I answer for myself: How can I get my journal-ready citations in the 19th-century-style format required by my journals with the greatest possible degree of modern convenience? Fast-forwarding the citation conventions themselves to the state of the 1940s art, or beyond, must fall to someone else with the time.

There are many moving pieces in the modern citation workflow: importing references into a database, managing references within that database, rendering a bibliography in a document, and so on.

After trying too many alternatives at great cost of time, I have settled upon Zotero to manage most of those steps. I also use BibLaTeX to render bibliographies in LaTeX articles and pandoc to render bibliographies on this blog and other webby things.

Zotero, BibLaTeX, and pandoc are all open source, powerful, and hackable.

The most complicated bit is Zotero, which is my main interface and working tool. It does my article importing, management, note-taking, syncing, etc. Pretty much everything apart from rendering the final document. More on this below. It could be more user-friendly; but then the competitors set the bar so low that this is hardly a criticism. Slightly more user-friendly but substantially less hackable is Mendeley, a closed-source reference manager that I would not judge you for using. Since I have no patience for things that cannot be automated, Zotero is an easy winner for me; you may wish to try both in case you have different tolerances than I.

Both are excellent at importing citations from your web browser as you work, in maintaining them in a database, and in exporting them in whatever format you choose.

All other options that I have tried are abysmal, and I can say nothing but “I told you so” if you try them and they give you grief.

“What kind of grief?” you might ask. But please don’t. Many days of lost work migrating from discontinued software in ways that are too tedious to recall.

1 Bibliographic database

The main part. I use Zotero, which makes this mostly simple. I have tried other options and they are tiresome. With Zotero, I visit an article in my browser, and a button appears in the browser to enable me to import the article into my literature database. I click it and it magically appears in my database, with all the metadata and citation information and a copy of the PDF. End of story. If you prefer a more complicated relationship to citations than that, that is on you.

NOW! we have a healthy database of references! How do we get them into documents?

2 Pandoc-citeproc

Pandoc also supports citations. This mostly targets rendering Markdown to other formats, but it will also work for LaTeX as a substitute for BibTeX/BibLaTeX.

See the following write-ups.

Chris Krycho’s pandoc-based workflow.
Caleb McDaniel talks through the process of getting citations into nearly anything using pandoc.
John MacFarlane on getting citations into things using pandoc.

The preferred pandoc-citeproc format is something with an @ sign and/or occasional square brackets:

Blah blah [see @heyns_foo_2014, pp. 33-35; also @heyns_bar_2015, ch. 1].
But @heyns_baz_2016 says different things again.

This is how you output it.

# Using the CSL transform; --csl takes the path to a CSL style file.
# (From pandoc 2.11 onward, --citeproc replaces -F pandoc-citeproc.)
pandoc -F pandoc-citeproc --csl=apa.csl --bibliography=bibliography.bib \
  -o document.pdf document.md
# Or using BibLaTeX and the traditionalist workflow.
pandoc --biblatex --bibliography=bibliography.bib \
  -o document.tex document.md

If you are using RMarkdown this will be done automatically for you. Nifty.
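
Before rendering, it can be worth checking that every key cited in the document actually exists in the bibliography file. What follows is a minimal shell sketch of such a check, not part of pandoc itself. It assumes one `@type{key,` line per entry in the .bib file. The sample file contents and the scratch files cited.txt and defined.txt are made up for illustration, and the key pattern is a simplification of pandoc's rules (it would also match the domain part of an email address).

```shell
# Work in a scratch directory so that no real files are overwritten.
cd "$(mktemp -d)"

# Hypothetical stand-ins for document.md and bibliography.bib.
cat > document.md <<'EOF'
Blah blah [see @heyns_foo_2014, pp. 33-35; also @heyns_bar_2015, ch. 1].
EOF
cat > bibliography.bib <<'EOF'
@article{heyns_foo_2014,
  title = {Foo}
}
EOF

# Keys cited in the document: an @ followed by word characters,
# allowing : . - only between word characters.
grep -oE '@[A-Za-z0-9_]+([:.-][A-Za-z0-9_]+)*' document.md \
  | sed 's/^@//' | sort -u > cited.txt

# Keys defined in the bibliography: the text between "{" and the first comma.
sed -n 's/^@[A-Za-z]*{\([^,]*\),.*/\1/p' bibliography.bib | sort -u > defined.txt

# Keys that are cited but never defined.
missing=$(comm -23 cited.txt defined.txt)
echo "missing: $missing"
```

With the sample files above, the only key reported is heyns_bar_2015, which is cited but has no entry in the bibliography.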

To integrate with Zotero, set Preferences > Export > Default Format to “Better BibTeX Citation Key Quick Copy”, an option supplied by the Better BibTeX plugin.

I automate this with a custom BibTeX export script.

See the pandoc manual and the pandoc-citeproc manual. See also the markdown/pandoc page for more on this.

Gotcha: In HTML output formats, it will render nice citations, except with butt-ugly naked URLs instead of hyperlinks, because hyperlinks are not supported, and are not even in scope yet. Please vote for those GitHub issues.

3 BibTeX/BibLaTeX

This is my practically essential tool, and how I actually do this when I am writing science articles (as opposed to blog posts). See BibLaTeX and Zotero.

4 To mention

JabRef, BibDesk.

5 To avoid

There have been various other options, such as Papers (meh), Sente (defunct due to being crappy) and (sigh) EndNote. I won’t link or refer to those further, because I’ve already lost too much data that way and I don’t intend to lose more. Since all citation software is, basically, awful, it is crucial that whichever application you choose is one that you can get your data out of when you find a less awful option or when it implodes from awfulness. The one that is best at letting you keep all your citations even if you ditch it is Zotero. Also, it’s probably the least awful.

Still, if you’re unswervingly dedicated to trying other things for yourself, my advice for any closed-source tool in this domain would be the same: Try it and see how well you can import and export data, en masse, because that’s what you’ll have to do if the company goes bankrupt or gets bought by Google and shut down, or by Yahoo and accidentally set on fire, or by Facebook and you are only allowed to use it if you click on ads promoting sports shoes for 18 minutes out of every hour or whatever unpaid market research work they allocate to you.
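
One crude way to run that test: export the whole library to BibTeX and compare the number of entries in the file with the number of items the application claims to hold. The sketch below assumes a BibTeX export with one `@type{` opener per entry at the start of a line. The file export.bib and its contents are hypothetical stand-ins for a real export.

```shell
# Work in a scratch directory so that no real files are overwritten.
cd "$(mktemp -d)"

# Hypothetical stand-in for a real bulk export.
cat > export.bib <<'EOF'
@article{heyns_foo_2014,
  title = {Foo},
  year = {2014}
}
@comment{not a reference}
@book{heyns_bar_2015,
  title = {Bar},
  year = {2015}
}
EOF

# Count entry openers, skipping BibTeX's non-reference blocks
# (@comment, @string, @preamble).
entries=$(grep -E '^@[A-Za-z]+\{' export.bib \
  | grep -viE '^@(comment|string|preamble)\{' \
  | wc -l | tr -d ' ')
echo "entries: $entries"
```

A matching count proves little about attachments, notes or field-level fidelity, but a mismatch is an early warning that the export is dropping things.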

All the alternatives apart from Mendeley and Zotero have failed the test of preserving my precious data when I migrated to a different software package, so using those other packages means putting my work in the uncaring hands of an unaccountable third party. To actually extract my data from Sente, for example, I had to burn a whole working week turning their malformed markup into valid XML (a specialty that I don’t care about and no one should ever need to care about), and I still couldn’t work out how to parse some of it. Then many other things went wrong. Also, Mendeley has started behaving suspiciously since it was bought by Elsevier.

If you mostly care about LaTeX output, the usual lowest common denominator amongst academic collaborators, you might be able to survive on BibDesk or JabRef, or by just editing a plain Bib(La)TeX file. I, for one, could not bear to give up the browser integration of Zotero, which has saved me innumerable hours of painstaking, pointless typing, and Zotero can output BibTeX just fine for the use of BibDesk lovers.

🏗 Complain about the entire structure of citations in the electronic age (keep it short though, because everyone is tired of complaining about it, and at least it’s better than the general howling void of the unsourced internet media epistemic community).

🏗 Apologise for accidentally complaining at length despite my stated aim of keeping it short. :smirk:

6 docutils citations

a.k.a. Citations in ReST.

I no longer recommend this. For all the laudable design goals and extensibility of ReST, it’s not where the community is. They are all using markdown.

But if you are keen, the docs say:

Standard ReST citations are supported, with the additional feature that they are “global”, i.e. all citations can be referenced from all files.

I can add:

For your comfort and convenience these citations will be rendered as born-obsolete fugly 1995-esque hard-coded HTML tables that no one in the entire internet has managed to whip into anything other than an eyesore in a decade of vain struggle.

More resources:

citations in Sphinx
sphinxcontrib-bibtex provides citation management for sphinx.