Disseminating science

Journals and preprint servers etc

2015-07-07 — 2026-02-11

Wherein the economics of journals are surveyed, impact indices are enumerated, and the modern clock of ML conferences—once kept by aideadlin.es—is recorded as having fallen silent.

academe
collective knowledge
diy
doing internet
economics
faster pussycat
how do science
incentive mechanisms
institutions
mind
networks
provenance
sociology

Some notes on publishing science. Publication connects: reproducibility, scholarly discovery, intellectual property, peer review, academic business models, and so on.

1 Economics of publishing

To some extent, the production of academic knowledge is a public-goods problem. Because academic publishing is part of that production, it inherits some of those oddities.

Cameron Neylon has built a cottage industry of producing pragmatic critiques of publishing from an institutional economics perspective:

e.g., The Marginal Costs of Article Publishing or A Journal is a Club:

We’d been talking about communities, cultures, economics, “public-making,” but it was the word ‘club’ and its associated concepts, both pejorative and positive, that crystallized everything. We were talking about the clubbishness of making knowledge — the term “Knowledge Clubs” emerged quickly — but also the benefits that such a club might gain in choosing to invest in wider sharing.

Working paper: Potts et al. (2016). Or see Afonso (2014): “How academia resembles a drug gang”.

How to Get Something Out of Neoliberal Critique Without (Immediately) Overthrowing the Capitalist System:

In the business setting, this often leads incumbent publishers to a kind of spluttering defense of the value they create while simultaneously complaining that the customer doesn’t appreciate their work. Flip the target slightly, and we’d call this “missing the new market opportunity” or “failing to express the value offering clearly.” […]

Lingua, […] has gone from one of the most important journals in analytical linguistics to no longer being in the field and seems well on its way to becoming irrelevant. How does a company as competent in its business strategy as Elsevier let this happen? I would argue, as I did at the time that the former editorial board of Lingua resigned to form Glossa, that it was a failure to understand the assets.

The neoliberal analysis of Lingua showed an asset generating good revenues, with good analytics and a positive ROI. The capitalist analysis focused on the fixed assets and trademarks. But it turns out these weren’t what was creating value. What was creating value was the community, built around an editorial board and the goodwill associated with that.

Also, see Pushing costs downstream.

Here’s something I want to phrase a bit better, but I think it’s important. An Adversarial Review of “Adversarial Generation of Natural Language” argues that, even though arXiv nicely avoids some problems of traditional publishing, it still inherits others that traditional publishing tries to avoid. No free lunches.

2 Publishing as performance indicator

The classic academic problem. Journal rank and journal impact factors, etc. We all know this system is broken, and we complain about it a lot: it wastes money, heavily subsidizes private publishers, and does little to ensure quality compared with the price we pay.

But until — or unless — we get so famous we have nothing to prove, those of us aspiring to academic careers need to play the game. Funders care, against our advice, but whatever — they have the money, so we need to care if we want them to keep funding us.

Latrobe explains it. SCImago Journal Rank is the Google PageRank-inspired, slightly hipper journal ranking; their search tool is probably what we want. Impact factors date from the ’60s and are still around, and the h-index is also a thing. journalrank might be a factor too?

According to Latrobe, here are some indices and a partial list of their weaknesses.

2.1 h-index

Hirsch index: The number of articles in a journal [h] that have received at least [h] citations over a citation period.
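To make that definition concrete, here is a minimal sketch of the computation (my own illustration, not taken from the Latrobe guide): h is the largest number such that h items each have at least h citations.

```python
def h_index(citations):
    """Largest h such that h items have at least h citations each."""
    ranked = sorted(citations, reverse=True)  # most-cited first
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the top `rank` items all have at least `rank` citations
        else:
            break
    return h

# Five articles cited [10, 8, 5, 4, 1] times give h = 4.
print(h_index([10, 8, 5, 4, 1]))
```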

Weaknesses:

  • Editors can manipulate it by requiring contributors to cite articles from their own journal
  • Increases with age, so it is biased towards researchers with long publication records

2.2 JIF

Journal Impact Factor: Citations to a journal in the JCR year to items published in the previous two years, divided by the total number of citable items (articles and reviews) published in the journal in the previous two years.
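Restated as a formula, in my own notation rather than any official one:

$$
\mathrm{JIF}_{Y} = \frac{C_{Y}(Y-1) + C_{Y}(Y-2)}{N_{Y-1} + N_{Y-2}}
$$

where $C_{Y}(y)$ is the number of citations received in the JCR year $Y$ by items the journal published in year $y$, and $N_{y}$ is the number of citable items (articles and reviews) the journal published in year $y$.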

Weaknesses:

  • Limited to journals within Web of Science
  • Cannot be used to compare journals across different subject categories

2.3 SJR

SCImago Journal Rank: Average number of weighted citations received in a year, by articles published in a journal in the previous 3 years.

The weaknesses are that it’s “complicated” and the numbers are small.

So I guess if we must do a journal ranking, this is the least bad method?
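For intuition about the “weighted citations” in that definition (and the PageRank inspiration mentioned above), here is a toy sketch; it is my own illustration, and the real SJR computation adds a three-year window, normalisation by article counts, and other details I am glossing over. The idea is that a citation from a prestigious journal counts for more than one from an obscure journal, with prestige defined recursively.

```python
import numpy as np

# Toy cross-citation counts: cites[i, j] = citations from journal i to journal j.
# (Made-up numbers, purely illustrative.)
cites = np.array([
    [0, 30, 10],
    [20, 0, 5],
    [5, 15, 0],
], dtype=float)

n = cites.shape[0]
damping = 0.85

# Row-normalise: the share of each journal's outgoing citations going to each target.
out = cites / cites.sum(axis=1, keepdims=True)

# PageRank-style power iteration: a journal's prestige is the damped sum of
# prestige flowing in from the journals that cite it.
rank = np.full(n, 1.0 / n)
for _ in range(100):
    rank = (1 - damping) / n + damping * out.T @ rank

print(rank / rank.sum())  # prestige scores; citations from high-rank journals count more
```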

2.4 CORE/ICORE conference ratings

The ICORE rankings grade conferences by some notion of importance within their fields. For example, in my workplace we’re expected to travel only to “A*” conferences, which optimizes the organization’s prestige and helps defend our funding. Are they the best conferences to attend in terms of outcomes? I don’t know.

3 Other methods

Conference Ranks aggregates many measures at once.

4 Shadow libraries

Shadow libraries are “online databases of readily available content that is normally obscured or otherwise not readily accessible. Such content may be inaccessible for a number of reasons, including the use of paywalls, copyright controls, or other barriers to accessibility placed upon the content by its original owners.”

The biggest phenomenon in open access, as far as I can tell, is the massive pirate infrastructure that makes journals freely available. See also Copyright activism.

Anecdotally, for example, no work in Indonesian academia would be possible without access to shadow libraries. Shadow libraries seem to be legal in some jurisdictions, but not in others. We should check our local laws before accessing them. For some speculation and developments in the legality of shadow libraries, see:

  • How illegal is Sci-Hub? : r/scihub

  • Sci-Hub legal status

  • Jonathan Basile’s essay on AAARG, Who’s Afraid of AAARG?.

  • Karaganis (2018): [TODO clarify]

    From the top down, Shadow Libraries explores the institutions that shape the provision of educational materials, from the formal sector of universities and publishers to the broadly informal ones organized by faculty, copy shops, student unions, and students themselves. It looks at the history of policy battles over access to education in the post—World War II era and at the narrower versions that have played out in relation to research and textbooks, from library policies to book subsidies to, more recently, the several “open” publication models that have emerged in the higher education sector.

    From the bottom up, Shadow Libraries explores how, simply, students get the materials they need. It maps the ubiquitous practice of photocopying and what are—in many cases—the more marginal ones of buying books, visiting libraries, and downloading from unauthorized sources. It looks at the informal networks that emerge in many contexts to share materials, from face-to-face student networks to Facebook groups, and at the processes that lead to the consolidation of some of those efforts into more organized archives that circulate offline and sometimes online—the shadow libraries of the title. If Alexandra Elbakyan’s Sci-Hub is the largest of these efforts to date, the more characteristic part of her story is the prologue: the personal struggle to participate in global scientific and educational communities, and the recourse to a wide array of ad hoc strategies and networks when formal, authorized means are lacking. If Elbakyan’s story has struck a chord, it is in part because it brings this contradiction in the academic project into sharp relief—universalist in principle and unequal in practice. Shadow Libraries is a study of that tension in the digital era.

  • The Dark Rule Utilitarian Argument for Science Piracy

Here are some popular shadow libraries:

  • Anna’s Archive/SciDB is a meta index of sites that unpaywall journals and other content behind paywalls.

    📚 The largest truly open library in human history. ⭐️ We mirror Sci-Hub and LibGen. We scrape and open-source Z-Lib, DuXiu, and more. 📈 30,453,135 books, 100,357,111 papers — preserved forever. All our code and data are completely open source.

    See Wikipedia and TorrentFreak coverage of these sites.

  • Sci-Hub is the pirate site that’s been the most successful in providing free access to academic papers. It may be shut down due to an Indian court case. (The geopolitics of that are fascinating!) It’s also been quasi-continued as 🧬 SciDB.

  • More free online libraries.

5 The ML conference ecosystem

A lot of the ML community is organized around conferences, with a design and culture that’s radically different from traditional journals. Much has been written on it. I should probably write some more and share my own takes on the pros and cons of these conferences.

But not right now, because I’m struggling to meet some publication deadlines. That alone tells us a lot about the cycle of these conferences.

5.1 Deadlines

The main thing about ML conferences is: when’s the next one? For many years, a site called aideadlin.es was the community’s frantic, shared clock, the digital heartbeat timing the research cycle. It appears to be down now, but we keep the link here as a small monument to this vital piece of infrastructure. Its spirit lives on in various successor sites.

5.2 Experiments

Hugo Larochelle, in Announcing the Transactions on Machine Learning Research, describes the new journal in terms of the niche it fills, rather than assuming it’s complete in itself.

[…] we’re happy to announce that we are founding a new journal, the Transactions on Machine Learning Research (TMLR). This journal is a sister journal of the existing, well-known Journal of Machine Learning Research (JMLR), along with the Proceedings of Machine Learning Research (PMLR) and JMLR Machine Learning Open Source Software (MLOSS). However, it departs from JMLR in a few key ways, which we hope will complement our community’s publication needs. Notably, TMLR’s review process will be hosted by OpenReview, and therefore will be open and transparent to the community. Another differentiation from JMLR will be the use of double-blind reviewing, the consequence being that the submission of previously published research, even with extension, will not be allowed. Finally, we intend to work hard on establishing a fast-turnaround review process, focusing in particular on shorter-form submissions that are common at machine learning conferences.

As these are all features of conferences like NeurIPS or ICLR, we hope that TMLR will become a welcome and familiar complement to conferences for publishing machine learning research. TMLR will also depart from conferences’ review process in a few key ways.

Anytime submission Being a journal, TMLR will accept submissions throughout the year. For this, we will be implementing a rolling review process which will be executed on a per-paper timeline.

Fast turnaround We are implementing a review timeline that will provide reviews to papers within 4 weeks of submission and decisions within 2 months. To enable this, we will implement a capped workload for action editors (the equivalent of conference area chairs) and reviewers so as to remain lightweight throughout the year, while also requesting a commitment to accept all assignment requests.

Acceptance based on claims Acceptance to TMLR will avoid judgments that are based on more subjective, editorial or speculative elements of typical conference decisions, such as novelty and potential for impact. Instead, the two criteria that will drive our review process will be the answers to the following two questions:

  1. Are the claims made in the submission supported by accurate, convincing and clear evidence?
  2. Would some individuals in TMLR’s audience be interested in the findings of this paper?

The first question therefore asks that we focus the evaluation on whether claims are matched by evidence. If they are not, authors will be asked to either provide new evidence or simply adjust their claims, even if that means the implications of the work are reduced (that’s OK!). The second, though somewhat more subjective, aims at ensuring the journal features work that does contribute additional knowledge to our community. A reviewer that is unsure as to whether a submission satisfies this criterion will be asked to assume that it does.

Certifications This will be a unique feature of TMLR, which is aimed at separating editorial statements on submitted work from their claim-based scientific assessment. An accepted paper will have the opportunity of being tagged with certifications, which are distinctions meant to highlight submissions with additional merit. At launch, we will include the following certifications:

  • Outstanding Certification, for papers deemed to be of exceptionally high quality and broadly significant for the field (along the lines of a best paper award at a top-tier conference).
  • Featured Certification, for papers judged to be of very high quality, along the lines of a conference paper selected for an oral or spotlight.
  • Reproducibility Certification, for papers whose primary purpose is reproduction of other published work and that contribute significant added value through additional baselines, analysis, ablations, or insights.
  • Survey Certification, for papers that not only meet the criteria for acceptance but also provide an exceptionally thorough or insightful survey of the topic or approach.

6 Tools

Figure 2: Tom Gauld, Suggested methods of presenting your findings

See also the academic reading workflow for tips geared towards readers.

  • researchers.one: [TODO clarify]

    A platform for scholarly publishing and peer review that empowers researchers with the

    • Autonomy to pursue their passions,
    • Authority to develop and disseminate their work, and
    • Access to engage with the international community of scholars.
  • Unpaywall:

    Millions of research papers are available for free on government and university web servers, legally uploaded by the authors themselves, with the express permission of publishers. Unpaywall automatically harvests these freely shared papers from thousands of legal institutional repositories, preprint servers, and publishers, making them all available to you as you read.

  • Zenodo “is an open, dependable home for the long tail of science, enabling researchers to share and preserve research outputs of any size or format, and from any scientific discipline.”

    • Research. Shared. — all research outputs from across all fields of science are welcome!
    • Citeable. Discoverable. — uploads get a Digital Object Identifier (DOI) to make them easily and uniquely citeable…
    • Flexible licensing — because not everything is under Creative Commons.
    • Safe — your research output is stored safely for the future in the same cloud infrastructure as research data from CERN’s Large Hadron Collider.

    A major win is we can easily assign DOIs to data and code for reproducible research — for free; there’s a rough sketch of the deposit API after this list.

  • Open Conference Systems (OCS)

    is a free Web publishing tool that will create a complete Web presence for your scholarly conference. OCS will allow you to:

    • create a conference Web site
    • compose and send a call for papers
    • electronically accept paper and abstract submissions
    • allow paper submitters to edit their work
    • post conference proceedings and papers in a searchable format
    • post, if you wish, the original data sets
    • register participants
    • integrate post-conference online discussions
  • Peeriodicals

    A peeriodical is a lightweight virtual journal with you as the Editor-in-chief, giving you complete freedom in setting editorial policy to select the most interesting and useful manuscripts for your readers.

    I didn’t find that explanation as useful as the interview the creators gave.

  • The Winnower

    is an open access online scholarly publishing platform that employs open post-publication peer review. You guessed it! We think transparency from start to finish is critical in scientific communication. […]

  • Retraction Watch is a watchdog blog that has somehow become a well-regarded source of gatekeeping and exposure, at least for sufficiently high-profile research.
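To make the Zenodo point above concrete, here is a rough sketch of minting a DOI for a dataset via Zenodo’s REST deposit API. This is from my memory of their developer documentation rather than a tested recipe, so treat the endpoints and metadata fields as assumptions to check against the current docs; the token, filename, and metadata values are placeholders.

```python
import requests

# Sketch only: endpoints and fields recalled from the Zenodo developer docs,
# not verified against the live API. Token, filename, and metadata are placeholders.
ZENODO = "https://zenodo.org/api"
TOKEN = "YOUR-ZENODO-ACCESS-TOKEN"

# 1. Create an empty deposition.
dep = requests.post(
    f"{ZENODO}/deposit/depositions",
    params={"access_token": TOKEN},
    json={},
).json()

# 2. Upload a file into the deposition's bucket.
with open("results.csv", "rb") as fp:
    requests.put(
        f"{dep['links']['bucket']}/results.csv",
        params={"access_token": TOKEN},
        data=fp,
    )

# 3. Attach minimal metadata.
metadata = {
    "metadata": {
        "title": "Example dataset",
        "upload_type": "dataset",
        "description": "Toy example of a Zenodo deposit.",
        "creators": [{"name": "Doe, Jane"}],
    }
}
requests.put(
    f"{ZENODO}/deposit/depositions/{dep['id']}",
    params={"access_token": TOKEN},
    json=metadata,
)

# 4. Publish; the response includes the minted DOI.
published = requests.post(
    f"{ZENODO}/deposit/depositions/{dep['id']}/actions/publish",
    params={"access_token": TOKEN},
).json()
print(published.get("doi"))
```

In practice the GitHub integration is even easier for code: connect the repository on the Zenodo site and each release gets archived and assigned a DOI automatically, with no API calls at all.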

7 Peer review

See Peer review.

8 Open access


Various open-access (and occasionally open-source) journals try to disrupt incumbent publishers with business models built on the low cost of online infrastructure. As with legacy journals, they have varying degrees of success.

One cute boutique example:

Open Journals

Open Journals is a collection of open source, open access journals. We currently have four main publications: […]

All of our journals run on open source software, which is available under our GitHub organization profile: github.com/openjournals.

Creative Commons Licence All of our journals are open access publications with content licensed under a Creative Commons Attribution 4.0 International License. Copyright remains with the submitting authors.

9 Handy software

Want to run our own journal?

  • PubPub recently open-sourced itself after vanishing for a while.

    PubPub is a free and open tool for collaborative editing, instant publishing, continuous review, and grassroots journals.

    Looks like some kind of TypeScript + Firebase stack? Not documented well enough for me to tell at a glance.

  • Janeway (Python stack) seems pretty popular; see documentation and source: openlibhums/janeway

  • Open Journals has leaned all the way into the GitHub ecosystem, e.g. reviews happen via GitHub Issues. There’s still a bunch of Ruby code and a database involved for some reason; see openjournals/joss for an example of the stack in action.

  • Kotahi sounds modern, although I got some worrying signs that they might not be super stable. In the 24 hours between first hearing about Kotahi and writing this, their site went down, and all landing page information seemed to vanish into oblivion. They are now back. The source code is notionally available but it’s a real mess: there’s no single visible canonical repository, just many odd forks (some specializing in, e.g., bat virus spillover research). (JS stack)

  • The venerable Open Journal Systems (PHP stack) is well-tested and widely used. See their site. The most famous OJS-based journal I’m aware of is the Journal of Statistical Software.

10 Incoming

  • The Strain on Scientific Publishing (Hanson et al. 2024)

    Scientists are increasingly overwhelmed by the volume of articles being published. The total number of articles indexed in Scopus and Web of Science has grown exponentially in recent years; in 2022 the article total was ∼47% higher than in 2016, which has outpaced the limited growth—if any—in the number of practicing scientists. Thus, publication workload per scientist has increased dramatically. We define this problem as “the strain on scientific publishing.” To analyze this strain, we present five data-driven metrics showing publisher growth, processing times, and citation behaviors. We draw these data from web scrapes, and from publishers through their websites or upon request. Specific groups have disproportionately grown in their articles published per year, contributing to this strain. Some publishers enabled this growth by hosting “special issues” with reduced turnaround times. Given pressures on researchers to “publish or perish” to compete for funding, this strain was likely amplified by these offers to publish more articles. We also observed widespread year-over-year inflation of journal impact factors coinciding with this strain, which risks confusing quality signals. Such exponential growth cannot be sustained. The metrics we define here should enable this evolving conversation to reach actionable solutions to address the strain on scientific publishing.
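A back-of-envelope check on that headline number (my arithmetic, not the paper’s): a roughly 47% increase over the six years from 2016 to 2022 implies

$$
1.47^{1/6} \approx 1.066,
$$

i.e. compound growth of about 6.6% in indexed articles per year.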

Content:

  • Scientific Publishing: Enough is Enough – by Seemay Chou
  • What methods work for evaluating the impact of public investments in RD&I
  • Is Frontiers predatory?

tl;dr: The community perceives Frontiers’ review process as suspect, and the journal’s impact factor is generally decreasing.

11 References

Aczel, Szaszi, and Holcombe. 2021. “A Billion-Dollar Donation: Estimating the Cost of Researchers’ Time Spent on Peer Review.” Research Integrity and Peer Review.
Afonso. 2014. “How Academia Resembles a Drug Gang.” SSRN Scholarly Paper.
Björk, and Solomon. 2013. “The Publishing Delay in Scholarly Peer-Reviewed Journals.” Journal of Informetrics.
Bogich, Balleseteros, Berjon, et al. n.d. “On the Marginal Cost of Scholarly Communication.”
Hanson, Barreiro, Crosetto, et al. 2024. “The Strain on Scientific Publishing.” Quantitative Science Studies.
Heckman, and Moktan. 2020. “Publishing and Promotion in Economics: The Tyranny of the Top Five.” Journal of Economic Literature.
Himmelstein, Rubinetti, Slochower, et al. 2019. “Open Collaborative Writing with Manubot.” Edited by Dina Schneidman-Duhovny. PLOS Computational Biology.
Ioannidis, Klavans, and Boyack. 2018. “Thousands of Scientists Publish a Paper Every Five Days.” Nature.
Karaganis, ed. 2018. Shadow Libraries: Access to Knowledge in Global Higher Education. International Development Research Centre.
Keller. 2024. “On the Difference Between Conferences and Journals in Artificial Intelligence and Computer Security.”
Krikorian, and Kapczynski. 2010. Access to knowledge in the age of intellectual property.
Pensky, Richardson, Serrano, et al. 2021. “Disrupt and Demystify the Unwritten Rules of Graduate School.” Nature Geoscience.
Potts, Hartley, Montgomery, et al. 2016. “A Journal Is a Club: A New Economic Model for Scholarly Publishing.” SSRN Scholarly Paper.
Schimmer, Geschuhn, and Vogler. 2015. “Disrupting the subscription journals’ business model for the necessary large-scale transformation to open access.”
Sever. 2023. “Biomedical Publishing: Past Historic, Present Continuous, Future Conditional.” PLOS Biology.
van Noorden. 2013. “Open Access: The True Cost of Science Publishing.” Nature.
Wagenmakers, Sarafoglou, and Aczel. 2022. “One Statistical Analysis Must Not Rule Them All.” Nature.