Academic reading workflow

The continuing ascendancy of using piles of dead tree products for understanding cutting edge digital informatics

April 11, 2016 — July 8, 2024

computers are awful
faster pussycat
information provenance
Figure 1: Plate 82 of Recueil d’ouvrages curieux de mathematique et de mecanique; ou, Description du cabinet de Monsieur Grollier de Serviere.

Reading articles and textbooks, scientific ones: by this I usually mean reading paper books or reading PDFs. PDF is a terrible format, but it is the standard in academia, despite some perfunctory efforts with e-book formats, web pages, or anything else, really. In actually-existing scholasticism, few communiqués are Kindle-compatible, and generally if I convert PDFs to e-books the equations, graphs etc. turn into 💩. Thus many e-readers are only a solution for people who can survive without equations, tables or graphs.

Now, how do I read all those PDFs and annotate them without losing track? Printing them on paper gives the best reading experience, but my current reading list of articles is too heavy to carry about in printed form. And textbooks are too expensive in any case.

Digital options, then! Bonus points if I can sync my annotations to my citation management software. More bonus points if I can also synchronise to a convenient e-reader so I don’t need to have my distracting laptop to hand in order to read every last thing. Bonus points if the solution involves not putting all my notes in some obscure opaque commercial database with no guarantee of existing next week.

1 Desktop

  • Zotero can sync annotations and store PDFs with citation metadata conveniently. It knows how to capture and store journal article metadata well. (open source)

  • Calibre isn’t a general metadata sync solution, but it does manage e-books well, especially ones that are real books and have ISBNs etc. It also synchronises with various e-book readers and converts to their local dialect of whatever. (open source, although it is a giant bag of chaos and I defy anyone other than the creator to contribute to it.)

  • Ubooquity has also been recommended to me; I have not tried it.

If I only read books or I only read papers and I had time, I could possibly hack one of these into being a general purpose document annotation-and-metadata-and-e-book-reader-and-desktop-synchronisation system. As it is, I swap awkwardly between two systems depending on, basically, whether the PDF I am reading is short (Zotero) or long (Calibre).

UPDATE: Now I am 100% Zotero all the way, because Calibre stopped syncing for me under Linux.

The collaborative reading tool Zocurelia combines Zotero and Hypothes.is. Is that useful?

I’m not quite sure which app to run for reading DjVu files on macOS.

2 Tablets and E-readers

See EReaders.

3 Paper analysis/annotation

Figure 2

Once I have found relevant research, how do I best annotate and/or summarise it for my future use? Further, how might we do so collectively?

3.1 Active reading

Baldur Bjarnason, Neither Paper Nor Digital Does Active Reading Well:

Catching up on usability research throughout the years makes you want to smash your laptop against the wall in anger. And trying to fill out forms online makes you scream ‘it doesn’t have to be this way!’ at the top of your lungs. […] The same applies to reading software. When you read up on research and papers on skills development, memory formation, and active reading, frustration with existing tools inevitably follows.

At least with paper, we can teach people to hack their tools—extend the printed book with post-its, commonplace books, bookmarks, and inline annotation. Doing the same in digital is incredibly hard without programming skills […] or expensive tools, even when the closed silos allow it. […]

The cognitive effort to actively and intelligently read a text in depth is, if not equal to, then on the same order of magnitude as the effort to write about a complex subject.

But we only have full-featured tools to help us with writing. Ulysses, Tinderbox, Scrivener etc all make managing and writing a complex writing task much easier. Even code-oriented text editing workflows […] with their steeper learning curves are a major improvement over paper-based writing workflows.

We can also use paper-based writing tactics in tandem with the digital ones, to the point of going back and forth between the two. You can’t do the same easily with reading.

Which leads us to the current situation: our ability to handle complex writing tasks is increasing while our default reading toolset is stagnating at best.

He ends up giving an extensive advertisement for LiquidText, an iPad app with a lauded UI, which looks nice. If you have an iPad.

4 AI assisted

But these days we use AI assistance.

scite provides citation context about a given article, and in particular, a given claim.

scite is an award-winning platform for discovering and evaluating scientific articles via Smart Citations. Smart Citations allow users to see how a publication has been cited by providing the context of the citation and a classification describing whether it provides supporting or contrasting evidence for the cited claim.

Humata (“ChatGPT for your files”) answers questions about papers.

4.1 Collective annotation

A neat one is PubPeer’s overlay for web browsers, which opens a window into the sometimes-acrimonious world of academic critique. (Their peeriodicals system might take this to a new level within teams.)

Select text to annotate. Add tags and post publicly or save privately.

Reply to or share any annotation. Link to notes or whole pages.

Annotate together in groups. Collaborate privately with others.

Search your notes. Explore all public annotations and profiles.

They have documented a recommended workflow.


Vortimo is software that organizes information on webpages that you’ve visited. It records pages you go to, extracts data from them and enriches the data that was extracted. It augments the pages in your browser by allowing you to tag objects as well as decorating objects it deems important. It then arranges the data in a UI. Vortimo supports switching between cases/projects seamlessly. You can also generate PDF reports based on the aggregated information and meta information.

Vortimo can be used by anyone that uses a browser to research a topic. This includes investigators that are profiling individuals or companies, intelligence analysts using open source intelligence (OSINT), IT security personnel or even academics doing domain specific research. Vortimo collects and organises information that came from your browser, so it does not matter if you’re browsing the web, using social networks or visiting your company’s intranet.

pdfx (source) claims to:

  • Extract references and metadata from a given PDF
  • Detect pdf, url, arxiv and doi references
  • Fast, parallel download of all referenced PDFs
  • Output as text or JSON (using the -j flag)
  • Extract the PDF text (using the --text flag)
  • Use as command-line tool or Python package
  • Works with local and online pdfs
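To make the "detect pdf, url, arxiv and doi references" claim concrete, here is a minimal sketch of that kind of detection over text that has already been extracted from a PDF. It uses only the Python standard library. The regular expressions and the find_references helper are my own illustrative assumptions, not pdfx's actual patterns or API, and they will miss plenty of real-world edge cases.

```python
import json
import re

# Illustrative patterns only: assumptions for this sketch, not pdfx's own regexes.
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+", re.IGNORECASE)
ARXIV_RE = re.compile(r"arxiv:\s*(\d{4}\.\d{4,5}(?:v\d+)?)", re.IGNORECASE)
URL_RE = re.compile(r"https?://[^\s\"<>]+", re.IGNORECASE)


def _clean(ref):
    """Strip trailing punctuation that usually belongs to the sentence."""
    return ref.rstrip(".,;:)]")


def find_references(text):
    """Return sorted, de-duplicated references found in `text`, grouped by kind."""
    urls = {_clean(m) for m in URL_RE.findall(text)}
    dois = {_clean(m) for m in DOI_RE.findall(text)}
    arxiv = set(ARXIV_RE.findall(text))
    # Treat URLs ending in .pdf as direct PDF links.
    pdfs = {u for u in urls if u.lower().endswith(".pdf")}
    return {
        "arxiv": sorted(arxiv),
        "doi": sorted(dois),
        "pdf": sorted(pdfs),
        "url": sorted(urls - pdfs),
    }


sample = (
    "See arXiv:1706.03762 and doi:10.1000/xyz123. "
    "Slides: https://example.org/talk.pdf, code at https://example.org/repo."
)
refs = find_references(sample)
print(json.dumps(refs, indent=2))
```

Note that a DOI embedded in a URL would show up under both "doi" and "url" in this sketch; a real tool would presumably want to de-duplicate across kinds.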

Fermat’s Librarian is

A Chrome extension that enhances arXiv papers. Get direct links to references, BibTeX extraction and comments on all arXiv papers.