Academic reading workflow

The continuing ascendancy of using piles of dead tree products for understanding cutting edge digital informatics

On reading articles and textbooks. In practice this means reading paper books or PDFs. PDF is a terrible format, but it is the standard in academia, despite some perfunctory efforts to make like the rest of the world and get with e-book formats. That would be nice, but few academic communiqués are Kindle-compatible, and generally if I convert PDFs to e-books the equations, graphs, etc. turn into 💩. So conversion is only a solution for people who survive without equations, tables or graphs, which does not resemble my job description.

Now, how will I read all those PDFs and annotate them without losing track? Paper is the best reading experience, but it's just too heavy (and textbooks are too expensive). Bonus points if I can sync my annotations to my citation management software. More bonus points if I can also synchronise to a convenient e-reader, so I don't need my distracting laptop to read every last thing. Further bonus points if the solution does not involve putting all my notes in some obscure, opaque commercial database with no guarantee of existing next week.


  • Zotero can sync annotations and store PDFs with citation metadata conveniently. See citation management for the details. It knows how to capture and store journal article metadata really well. (open source)

  • Calibre isn’t a general metadata sync solution, but it does manage e-books well, especially ones that are real books and have ISBNs etc. And it does synchronise with various e-book readers and convert to their local dialect of whatever. (open source)

If I only read books or I only read papers and I had time, I could possibly hack one of these into being a general purpose document annotation-and-metadata-and-e-book-reader-and-desktop-synchronisation system. As it is, I swap awkwardly between two systems depending on, basically, whether the PDF I am reading is short (Zotero) or long (Calibre).

Collaborative reading tool zocurelia combines Zotero and … Is that useful?


Tablets and e-readers

Candidates include the iPad, the Kindle Fire and the Onyx Boox. I like e-readers because they don't have other functions I can get distracted with.

However, e-reading seems to be a punishing workload for many tablets, and the ones that cope end up being pricey.

Good reading apps

  • KOReader is open source and fancy, for Android
  • MarginNote seems well-considered and sexy, including flashcard integration, but the bloody thing only runs on macOS and iOS, so we e-ink people are out in the cold.

How can I get my incoming articles on my tablet?

I use Zotero plus Syncthing for my journal papers, and Calibre for my textbooks.

This works, if not seamlessly, then at least smoothly.

How can I get my notes off it and into something useful?

For the Onyx (see below), the PDF annotations are sort-of easy to extract. For Kindle, 🤷. Calibre's annotation extraction no longer works. There is a hack called clipbook that makes Kindle highlights look nice if I export them manually, but exporting is the most tedious part. Possibly Klib (macOS) or Kindle Mate (Windows) makes that smoother.
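Kindle keeps highlights in a plain-text file called My Clippings.txt on the device, with each clipping terminated by a line of ten = signs. Before reaching for a dedicated tool, a few lines of Python can group those highlights by book. This is a minimal sketch, assuming the English-language layout (a title line, a metadata line containing "Highlight", a blank line, then the clipped text); localised Kindles or other firmware may differ, and notes and bookmarks are simply skipped.

```python
"""Group Kindle highlights from "My Clippings.txt" by book.

Assumes the common English layout: title line, metadata line starting with
"- Your Highlight ...", a blank line, the clipped text, then "==========".
"""
from collections import defaultdict

SEPARATOR = "=========="  # ten "=" characters end each clipping


def parse_clippings(text: str) -> dict[str, list[str]]:
    """Return {book title line: [highlight text, ...]} in file order."""
    books: dict[str, list[str]] = defaultdict(list)
    # Kindle files may contain UTF-8 byte-order marks and use CRLF line endings.
    text = text.replace("\ufeff", "").replace("\r\n", "\n")
    for entry in text.split(SEPARATOR):
        lines = [line.strip() for line in entry.strip().split("\n")]
        if len(lines) < 2:
            continue  # e.g. the empty chunk after the final separator
        title, meta, body = lines[0], lines[1], lines[2:]
        if "Highlight" not in meta:
            continue  # skip bookmarks and notes in this simple sketch
        highlight = " ".join(line for line in body if line)
        if highlight:
            books[title].append(highlight)
    return dict(books)


def to_markdown(books: dict[str, list[str]]) -> str:
    """Render the grouped highlights as a simple Markdown outline."""
    out = []
    for title, highlights in books.items():
        out.append(f"## {title}")
        out.extend(f"- {h}" for h in highlights)
        out.append("")
    return "\n".join(out)
```

After copying the file off the Kindle over USB, something like `print(to_markdown(parse_clippings(Path("My Clippings.txt").read_text(encoding="utf-8"))))` (with `from pathlib import Path`) gives a per-book outline that can be pasted into whatever notes system is at hand.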

Onyx Boox

My particular e-reader is the Max2. It isn't well documented, but there is a bulletin board.

There is a newer edition now, the Onyx Boox Max3, which I have not tried.

It just works. It could be more responsive, but it is fast enough. Using Syncthing, I get my annotations off the tablet OK because they are saved in text files. Using Calibre, I can manage the e-books smoothly.

Firmware updates can be downloaded from the manufacturer. However, installing apps the manufacturer has not approved is tedious, and the manufacturer's app store is not large. I managed to install Google Play on it briefly, but that broke in a recent update. Since I only use the thing as an e-reader this does not matter greatly, except that I want to use Syncthing to sync stuff. I work around that by installing the app as an .apk file.

Lost the bundled pen? Apparently Amazon product B078WLP9L8 is compatible with the Onyx Max2.

Paper analysis/annotation

Once I have found relevant research, how to best annotate it for future use? How do I do so collectively?

The most timely tool here is PubPeer's overlay for web browsers, which gives you a window into the sometimes-acrimonious world of academic critique. Their Peeriodicals system might take this to a new level within teams.

Baldur Bjarnason, Neither Paper Nor Digital Does Active Reading Well:

Catching up on usability research throughout the years makes you want to smash your laptop against the wall in anger. And trying to fill out forms online makes you scream ‘it doesn’t have to be this way!’ at the top of your lungs. […] The same applies to reading software. When you read up on research and papers on skills development, memory formation, and active reading, frustration with existing tools inevitably follows.

At least with paper, we can teach people to hack their tools—extend the printed book with post-its, commonplace books, bookmarks, and inline annotation. Doing the same in digital is incredibly hard without programming skills […] or expensive tools, even when the closed silos allow it. […]

The cognitive effort to actively and intelligently read a text in depth is, if not equal to, then on the same order of magnitude as the effort to write about a complex subject.

But we only have full-featured tools to help us with writing. Ulysses, Tinderbox, Scrivener etc all make managing and writing a complex writing task much easier. Even code-oriented text editing workflows […] with their steeper learning curves are a major improvement over paper-based writing workflows.

We can also use paper-based writing tactics in tandem with the digital ones, to the point of going back and forth between the two. You can’t do the same easily with reading.

Which leads us to the current situation: our ability to handle complex writing tasks is increasing while our default reading toolset is stagnating at best.

He ends up giving an extensive advertisement for LiquidText, an iPad app with a lauded UI, which looks nice. If you have an iPad.

Select text to annotate. Add tags and post publicly or save privately.

Reply to or share any annotation. Link to notes or whole pages.

Annotate together in groups. Collaborate privately with others.

Search your notes. Explore all public annotations and profiles.

They have documented a recommended workflow.


Vortimo is software that organizes information on web pages you've visited. It records the pages you go to, extracts data from them and enriches the extracted data. It augments the pages in your browser by letting you tag objects, and it decorates objects it deems important. It then arranges the data in a UI. Vortimo supports switching between cases/projects seamlessly. You can also generate PDF reports from the aggregated information and meta-information.

Vortimo can be used by anyone that uses a browser to research a topic. This includes investigators that are profiling individuals or companies, intelligence analysts using open source intelligence (OSINT), IT security personnel or even academics doing domain specific research. Vortimo collects and organises information that came from your browser — so it does not matter if you’re browsing the web, using social networks or visiting your company’s intranet.

pdfx (source) claims to:

  • Extract references and metadata from a given PDF
  • Detect pdf, url, arxiv and doi references
  • Fast, parallel download of all referenced PDFs
  • Output as text or JSON (using the -j flag)
  • Extract the PDF text (using the --text flag)
  • Use as command-line tool or Python package
  • Works with local and online pdfs
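To get a feel for what detecting "pdf, url, arxiv and doi references" involves, here is a rough, standard-library-only sketch that scans already-extracted text with regular expressions. It is not pdfx's implementation, and the patterns are simplified assumptions: the DOI pattern is adapted from one Crossref has published for modern DOIs, and only new-style arXiv IDs written as `arXiv:NNNN.NNNNN` are caught, so expect some misses and false positives.

```python
"""Sketch: find DOI, arXiv, URL and PDF-link references in plain text.

Not pdfx's implementation; the regular expressions are simplified assumptions.
"""
import re

# Modern DOIs: "10." + 4-9 digit registrant code + "/" + suffix (adapted from Crossref).
DOI_RE = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")
# New-style arXiv IDs such as arXiv:1706.03762 or arXiv:2101.12345v2.
ARXIV_RE = re.compile(r"\barXiv:\s?(\d{4}\.\d{4,5}(?:v\d+)?)", re.IGNORECASE)
# Anything that looks like an http(s) URL up to the next whitespace or quote.
URL_RE = re.compile(r'https?://[^\s<>"]+')


def find_references(text: str) -> dict[str, list[str]]:
    """Return de-duplicated matches for each reference type, in order of appearance."""
    def unique(items):
        return list(dict.fromkeys(items))  # keeps first occurrence, preserves order

    # Trailing sentence punctuation is usually not part of the identifier.
    dois = unique(m.rstrip(".,;)") for m in DOI_RE.findall(text))
    arxiv = unique(ARXIV_RE.findall(text))
    urls = unique(u.rstrip(".,;)") for u in URL_RE.findall(text))
    pdfs = [u for u in urls if u.lower().endswith(".pdf")]
    return {"doi": dois, "arxiv": arxiv, "url": urls, "pdf": pdfs}
```

The design choice here is to work on text that some other tool has already pulled out of the PDF; the hard part pdfx also handles (getting clean text and embedded link annotations out of the PDF in the first place) is deliberately left out.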

Fermat’s Librarian is

A Chrome extension that enhances arXiv papers. Get direct links to references, BibTeX extraction and comments on all arXiv papers.
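How Fermat's Librarian does its BibTeX extraction inside the extension is not described here. As a hedged sketch of the same idea, the public arXiv API (`http://export.arxiv.org/api/query?id_list=...`) returns paper metadata as an Atom feed, which the Python standard library can turn into `@misc` BibTeX entries. The citation-key scheme (first author's surname plus year) is a hypothetical choice for illustration, and only title, authors, year and arXiv ID are carried over.

```python
"""Sketch: convert an arXiv API (Atom) response into BibTeX @misc entries."""
import re
import xml.etree.ElementTree as ET

ATOM = {"a": "http://www.w3.org/2005/Atom"}  # namespace used by the arXiv API feed


def _clean(s: str) -> str:
    # Titles in the feed are often wrapped over several lines; collapse whitespace.
    return " ".join(s.split())


def atom_to_bibtex(atom_xml) -> list[str]:
    """Accepts the feed as str or bytes; returns one BibTeX string per <entry>."""
    entries = []
    root = ET.fromstring(atom_xml)
    for entry in root.findall("a:entry", ATOM):
        abs_url = entry.findtext("a:id", default="", namespaces=ATOM)
        # e.g. http://arxiv.org/abs/1706.03762v5 -> 1706.03762 (version suffix dropped)
        arxiv_id = re.sub(r"v\d+$", "", abs_url.rsplit("/abs/", 1)[-1])
        title = _clean(entry.findtext("a:title", default="", namespaces=ATOM))
        authors = [_clean(a.findtext("a:name", default="", namespaces=ATOM))
                   for a in entry.findall("a:author", ATOM)]
        year = entry.findtext("a:published", default="", namespaces=ATOM)[:4]
        # Hypothetical citation-key scheme: first author's surname + year.
        surname = authors[0].split()[-1].lower() if authors else "anon"
        entries.append(
            f"@misc{{{surname}{year},\n"
            f"  title = {{{title}}},\n"
            f"  author = {{{' and '.join(authors)}}},\n"
            f"  year = {{{year}}},\n"
            f"  eprint = {{{arxiv_id}}},\n"
            f"  archivePrefix = {{arXiv}}\n"
            f"}}"
        )
    return entries
```

In use, the bytes from `urllib.request.urlopen("http://export.arxiv.org/api/query?id_list=1706.03762").read()` can be passed straight to `atom_to_bibtex`, since `ET.fromstring` accepts bytes as well as str.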