Academic reading workflow

The continuing ascendancy of using piles of dead tree products for understanding cutting edge digital informatics

PDF is a terrible format, but it is the standard in academia, despite some perfunctory efforts to get with ebook formats like the rest of the world. That would be nice, but few academic publications are Kindle-compatible, and if I convert PDFs to ebooks the equations, graphs etc. turn into 💩. So conversion is only a solution for people who can survive without equations, tables or graphs, which does not resemble my job description.

Now, how will I read all those PDFs and annotate them without losing track and going crazy? Paper is the best reading experience, but it’s just too damn heavy. Bonus points if I can sync my annotations to my citation management software. More bonus points if I can also synchronise to a convenient e-reader so I don’t have to have my distracting laptop to read every sodding thing. Bonus points if the solution involves not putting all my notes in some obscure opaque commercial database with no guarantee of existing next week.


  • Zotero can sync annotations and store PDFs with citation metadata conveniently. See citation management for the details. It knows how to capture and store journal article metadata really well. (open source)

  • Calibre isn’t a general metadata sync solution, but it does manage ebooks well, especially ones that are real books and have ISBNs etc. And it does synchronise with various ebook readers and convert to their local dialect of whatever. (open source)

If I only read books or I only read papers and I had time, I could possibly hack one of these into being a general purpose document annotation-and-metadata-and-ebook-reader-and-desktop-synchronisation system. As it is, I swap awkwardly between two systems depending on, basically, whether the PDF I am reading is short (Zotero) or long (Calibre).


Reading devices: e.g. iPad, Kindle Fire, Onyx Boox. I like e-readers because they don’t have other functions I can get distracted with.

However, e-reading seems to be a punishing workflow for many tablets, and they end up being pricey.

Good reading apps

  • koreader is open-source and fancy, for Android
  • marginnote seems well-considered and sexy, including flashcard integration, but the bloody thing only runs on Macs and iOS, so us e-Ink people are out in the cold.

How can I get my incoming articles on my tablet?

I use Zotero plus syncthing for my journal papers, and Calibre for my textbooks.

This works, if not seamlessly, then at least smoothly.

How can I get my notes off it and into something useful?

For the Onyx (below) the notes are sort-of easy to extract: they are PDF annotations, which I extract using Zotfile. For Kindle, 🤷. Calibre extraction no longer works. Kindle has a hack called clipbook to make them look nice if I manually export them, but exporting is the most tedious part. Possibly Klib (macOS) makes that smoother, or Kindle Mate (Windows).
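If the manual export route is all there is, some of the tedium can be scripted. Below is a minimal sketch, assuming the highlights come from the plain-text "My Clippings.txt" file that Kindles usually keep in their documents folder, with entries separated by a line of equals signs. The function name parse_clippings and the record layout are my own inventions, not part of clipbook, Klib or Kindle Mate.

```python
def parse_clippings(raw):
    """Split the text of a Kindle clippings file into highlight records.

    Assumed entry layout: a title line, a metadata line starting with "- ",
    a blank line, the highlighted text, then a line of equals signs.
    """
    records = []
    for chunk in raw.split("=========="):
        # Drop blank lines and surrounding whitespace within each entry.
        lines = [ln.strip() for ln in chunk.strip().splitlines()]
        lines = [ln for ln in lines if ln]
        if len(lines) < 3:
            continue  # bookmarks and empty entries carry no highlighted text
        title, meta, *body = lines
        records.append({
            "title": title.lstrip("\ufeff"),  # the file may start with a byte-order mark
            "meta": meta.lstrip("- "),
            "text": " ".join(body),
        })
    return records
```

Feeding it the contents of the clippings file (read as UTF-8) gives a list of dictionaries that can be dumped wherever the rest of my notes live.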

Onyx Boox

My particular e-reader. Max2 isn’t well documented, but there is a bulletin board. Missing the attached pen? Apparently Amazon code B078WLP9L8 is Onyx max2 compatible.

Firmware updates can be downloaded from the manufacturer.

Apart from that it just works. Could be more responsive, but fast enough. Using syncthing, I get my annotations off the tablet OK because they are saved in text files. Using calibre I can manage the audiobooks smoothly. Adequate. Easy-ish.
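Since the annotations arrive as plain text files, gathering them into one notes document is a few lines of scripting. This is only a sketch under assumptions: that syncthing mirrors the tablet's annotation folder to some local directory, and that each file is UTF-8 text with a .txt suffix. The function name collect_annotations and the bracketed heading format are hypothetical choices of mine.

```python
from pathlib import Path


def collect_annotations(sync_dir, suffix=".txt"):
    """Merge every annotation file under sync_dir into one string.

    Files are visited in sorted path order; each non-empty file is
    prefixed with its name (without suffix) in square brackets.
    """
    parts = []
    for path in sorted(Path(sync_dir).rglob(f"*{suffix}")):
        body = path.read_text(encoding="utf-8", errors="replace").strip()
        if body:  # skip files that contain no notes
            parts.append(f"[{path.stem}]\n{body}\n")
    return "\n".join(parts)
```

Pointing it at the synced folder and writing the result to a file gives one searchable dump of everything scribbled on the tablet.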

Paper analysis/annotation

Once I have found relevant research, how to best read through it?

Baldur Bjarnason, Neither Paper Nor Digital Does Active Reading Well:

Catching up on usability research throughout the years makes you want to smash your laptop against the wall in anger. And trying to fill out forms online makes you scream ‘it doesn’t have to be this way!’ at the top of your lungs.

Software developer inattention to research makes sense when you think of it as a pop culture that—in the 33% of the time where its projects don’t flame out—occasionally has a productive side effect.

The same applies to reading software. When you read up on research and papers on skills development, memory formation, and active reading, frustration with existing tools inevitably follows.

At least with paper, we can teach people to hack their tools—extend the printed book with post-its, commonplace books, bookmarks, and inline annotation. Doing the same in digital is incredibly hard without programming skills (see the low success rate above) or expensive tools, even when the closed silos allow it.

The cognitive effort to actively and intelligently read a text in depth is, if not equal to, then on the same order of magnitude as the effort to write about a complex subject.

But we only have full-featured tools to help us with writing. Ulysses, Tinderbox, Scrivener, etc. all make managing and writing a complex writing task much easier. Even code-oriented text editing workflows with their steeper learning curves are a major improvement over paper-based writing workflows.

We can also use paper-based writing tactics in tandem with the digital ones, to the point of going back and forth between the two. You can’t do the same easily with reading.

Which leads us to the current situation: our ability to handle complex writing tasks is increasing while our default reading toolset is stagnating at best.

He ends up giving an extensive advertisement for LiquidText, an iPad app with actual UI development.

Select text to annotate. Add tags and post publicly or save privately.

Reply to or share any annotation. Link to notes or whole pages.

Annotate together in groups. Collaborate privately with others.

Search your notes. Explore all public annotations and profiles.

They have documented a recommended workflow.


Vortimo is software that organizes information on webpages that you’ve visited. It records the pages you go to, extracts data from them and enriches the extracted data. It augments the pages in your browser by allowing you to tag objects as well as decorating objects it deems important. It then arranges the data in a UI. Vortimo supports switching between cases/projects seamlessly. You can also generate PDF reports based on the aggregated information and meta information.

Vortimo can be used by anyone that uses a browser to research a topic. This includes investigators that are profiling individuals or companies, intelligence analysts using open source intelligence (OSINT), IT security personnel or even academics doing domain specific research. Vortimo collects and organises information that came from your browser – so it does not matter if you’re browsing the web, using social networks or visiting your company’s intranet.

pdfx (source) claims to:

  • Extract references and metadata from a given PDF
  • Detect pdf, url, arxiv and doi references
  • Fast, parallel download of all referenced PDFs
  • Output as text or JSON (using the -j flag)
  • Extract the PDF text (using the --text flag)
  • Use as command-line tool or Python package
  • Works with local and online pdfs
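To give a feel for what the reference-detection step involves, here is a rough sketch that scans already-extracted text for DOIs, arXiv identifiers and URLs. It is not pdfx's implementation: the regular expressions are deliberately simplified and the function name find_references is hypothetical.

```python
import re

# Simplified patterns (assumptions); real identifiers have more edge cases.
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")
ARXIV_RE = re.compile(r"arxiv:\s*(\d{4}\.\d{4,5}(?:v\d+)?)", re.IGNORECASE)
URL_RE = re.compile(r"https?://[^\s\"<>]+")


def find_references(text):
    """Return sorted, de-duplicated DOIs, arXiv IDs and URLs found in text."""
    def clean(match):
        # Trailing punctuation usually belongs to the sentence, not the identifier.
        return match.rstrip(".,;)")

    return {
        "doi": sorted({clean(m) for m in DOI_RE.findall(text)}),
        "arxiv": sorted({clean(m) for m in ARXIV_RE.findall(text)}),
        "url": sorted({clean(m) for m in URL_RE.findall(text)}),
    }
```

Run over the text of a bibliography, this returns a dictionary with one list per identifier type, which is roughly the shape of data a downloader would need next.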

Fermat’s Librarian is

A Chrome extension that enhances arXiv papers. Get direct links to references, BibTeX extraction and comments on all arXiv papers.