Disseminating science
Journals and preprint servers etc
2015-07-07 — 2025-12-11
Wherein the mechanics of scholarly communication are surveyed, with pirate shadow libraries described as sustaining access in some regions, ML conference deadlines like aideadlin.es described as structuring research timing, and DOIs assigned via Zenodo.
Some notes on publishing science. Publication connects reproducibility, scholarly discovery, intellectual property, peer review, academic business models, and more.
1 Economics of publishing
To some extent, the production of academic knowledge is a public goods problem. Because academic publishing is part of that production, it inherits those oddities.
Cameron Neylon has built a cottage industry producing pragmatic critiques of publishing from an institutional economics perspective:
e.g. The Marginal Costs of Article Publishing or A Journal is a Club:
We’d been talking about communities, cultures, economics, “public-making,” but it was the word ‘club’ and its associated concepts, both pejorative and positive, that crystallized everything. We were talking about the clubbishness of making knowledge — the term “Knowledge Clubs” emerged quickly — but also the benefits that such a club might gain in choosing to invest in wider sharing.
Working paper: Potts et al. (2016). Alternatively, see Afonso (2014): “How academia resembles a drug gang”.
In the business setting, this often leads incumbent publishers to a kind of spluttering defense of the value they create while simultaneously complaining that the customer doesn’t appreciate their work. Flip the target slightly, and we’d call this “missing the new market opportunity” or “failing to express the value offering clearly.” […]
Lingua, […] has gone from one of the most important journals in analytical linguistics to no longer being in the field and seems well on its way to becoming irrelevant. How does a company as competent in its business strategy as Elsevier let this happen? I would argue, as I did at the time that the former editorial board of Lingua resigned to form Glossa, that it was a failure to understand the assets.
The neoliberal analysis of Lingua showed an asset generating good revenues, with good analytics and a positive ROI. The capitalist analysis focused on the fixed assets and trademarks. But it turns out these weren’t what was creating value. What was creating value was the community, built around an editorial board and the goodwill associated with that.
Also, see Pushing costs downstream.
Here’s something I wish were phrased a bit better, but I think it’s important: An Adversarial Review of “Adversarial Generation of Natural Language”. The argument is that, although it’s nice that arXiv avoids some problems of traditional publishing, it incurs others that traditional publishing tries to avoid. No free lunches.
2 Publishing as performance indicator
The classic academic problem. Journal rank and journal impact factors, etc. We all know and copiously complain that this system is broken: it wastes money, heavily subsidizes private publishers, and does little to ensure quality compared with the price we pay.
But until — or unless — we get so famous we have nothing to prove, those of us aspiring to academic careers need to play the game. Funders care, against our advice, but whatever — they have the money, so we need to care if we want them to keep funding us.
La Trobe explains it. SCImago Journal Rank is the Google PageRank-inspired, slightly hipper journal ranking. Their search tool is probably what you want. Impact factors date from the ’60s and are still around; the h-index is also a thing. journalrank might be a factor too?
According to La Trobe, here are some indices and a partial list of their weaknesses.
2.1 h-Index
Hirsch index: The largest number [h] such that [h] articles in a journal have each received at least [h] citations over a citation period. The same definition is applied to the articles of an individual researcher.
Weaknesses:
- Editors can manipulate by requiring contributors to add citations from their journals
- Increases with age so bias towards researchers with long publication records
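The definition above can be sketched in a few lines of Python. This is a minimal illustration, not any index provider's official method; the function name `h_index` and the citation counts are hypothetical.

```python
def h_index(citations):
    """Return the largest h such that h items have at least h citations each."""
    # Rank items from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` items all have at least `rank` citations
        else:
            break
    return h


# Hypothetical citation counts for six articles.
example_citations = [10, 8, 5, 4, 3, 0]
print(h_index(example_citations))  # 4
```

Citations only accumulate, so the value computed this way can never go down over time, which is the age bias noted in the list above.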
2.2 JIF
Journal Impact Factor: Citations to a journal in the JCR year to items published in the previous two years, divided by the total number of citable items (articles and reviews) published in the journal in the previous two years.
Weaknesses:
- Limited to journals within Web of Science
- Cannot be used to compare journals across different subject categories
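As a worked example of the ratio described above, here is a small sketch. The function name and the numbers are invented for illustration; real values come from Journal Citation Reports.

```python
def journal_impact_factor(citations_in_jcr_year, citable_items_prev_two_years):
    """Citations in the JCR year to items from the previous two years,
    divided by the number of citable items published in those two years."""
    if citable_items_prev_two_years <= 0:
        raise ValueError("need at least one citable item in the two-year window")
    return citations_in_jcr_year / citable_items_prev_two_years


# Hypothetical journal: 240 citable items published in the previous two years,
# which together received 600 citations in the JCR year.
jif = journal_impact_factor(600, 240)
print(jif)  # 2.5
```

The result is a mean over a journal's articles, so it says little about how any single article is cited.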
2.3 SJR
SCImago Journal Rank: Average number of weighted citations received in a year, by articles published in a journal in the previous 3 years.
Weaknesses are that it’s “complicated” and that the numbers are small.
So I guess if we must do a journal ranking, this is the least bad method?
2.4 CORE/ICORE conference ratings
The ICORE rankings rank conferences by some notion of importance within fields. For example, in my workplace we can expect to travel only to “A*” conferences, which optimizes the organization’s prestige and helps defend our funding. Are they the best conferences to attend in terms of outcomes? I don’t know.
3 Other methods
Conference Ranks aggregates many measures at once.
4 Shadow libraries
Shadow libraries are “online databases of readily available content that is normally obscured or otherwise not readily accessible. Such content may be inaccessible for a number of reasons, including the use of paywalls, copyright controls, or other barriers to accessibility placed upon the content by its original owners.”
The biggest phenomenon in open access, as far as I can tell, is the massive pirate infrastructure that makes journals freely available. See also Copyright activism.
Anecdotally, for example, no work in Indonesian academia would be possible without access to shadow libraries. They seem to be legal in some jurisdictions, but not in others. Check your local laws before accessing them. For some speculation and developments in the legality of shadow libraries, see
Jonathan Basile’s essay on AAARG, Who’s Afraid of AAARG?
Karaganis (2018):
From the top down, Shadow Libraries explores the institutions that shape the provision of educational materials, from the formal sector of universities and publishers to the broadly informal ones organized by faculty, copy shops, student unions, and students themselves. It looks at the history of policy battles over access to education in the post—World War II era and at the narrower versions that have played out in relation to research and textbooks, from library policies to book subsidies to, more recently, the several “open” publication models that have emerged in the higher education sector.
From the bottom up, Shadow Libraries explores how, simply, students get the materials they need. It maps the ubiquitous practice of photocopying and what are—in many cases—the more marginal ones of buying books, visiting libraries, and downloading from unauthorized sources. It looks at the informal networks that emerge in many contexts to share materials, from face-to-face student networks to Facebook groups, and at the processes that lead to the consolidation of some of those efforts into more organized archives that circulate offline and sometimes online—the shadow libraries of the title. If Alexandra Elbakyan’s Sci-Hub is the largest of these efforts to date, the more characteristic part of her story is the prologue: the personal struggle to participate in global scientific and educational communities, and the recourse to a wide array of ad hoc strategies and networks when formal, authorized means are lacking. If Elbakyan’s story has struck a chord, it is in part because it brings this contradiction in the academic project into sharp relief—universalist in principle and unequal in practice. Shadow Libraries is a study of that tension in the digital era.
Here are some popular shadow libraries:
Anna’s Archive/SciDB is a meta-index of sites that unpaywall journals and other paywalled content.
📚 The largest truly open library in human history. ⭐️ We mirror Sci-Hub and LibGen. We scrape and open-source Z-Lib, DuXiu, and more. 📈 30,453,135 books, 100,357,111 papers — preserved forever. All our code and data are completely open source.
See Wikipedia and TorrentFreak coverage of these institutions.
Sci-Hub is the pirate site that has been the most successful in providing free access to academic papers. It may be shut down due to an Indian court case. (The geopolitics of this are fascinating!) It has also been quasi-continued as 🧬 SciDB.
5 The ML conference ecosystem
A lot of the ML community is organized around conferences, with a radically different design and culture than traditional journals. Much has been written on it. I should probably write some more and express my own personal takes on the pros and cons of these conferences.
But not right now, because I am struggling to meet some publication deadlines. That in itself tells us a lot about the cycle of such conferences.
5.1 Deadlines
The main question about ML conferences is: when is the next one? For many years, a site called aideadlin.es was the community’s frantic, shared clock, the digital heartbeat timing the research cycle. It appears to be down now, but we keep the link here as a small monument to this vital piece of infrastructure. Its spirit lives on in successors like:
5.2 Experiments
Hugo Larochelle, in Announcing the Transactions on Machine Learning Research, describes the new journal in terms of the niche it fills, rather than assuming it’s complete in itself.
[…] we’re happy to announce that we are founding a new journal, the Transactions on Machine Learning Research (TMLR). This journal is a sister journal of the existing, well-known Journal of Machine Learning Research (JMLR), along with the Proceedings of Machine Learning Research (PMLR) and JMLR Machine Learning Open Source Software (MLOSS). However, it departs from JMLR in a few key ways, which we hope will complement our community’s publication needs. Notably, TMLR’s review process will be hosted by OpenReview, and therefore will be open and transparent to the community. Another differentiation from JMLR will be the use of double-blind reviewing, the consequence being that the submission of previously published research, even with extension, will not be allowed. Finally, we intend to work hard on establishing a fast-turnaround review process, focusing in particular on shorter-form submissions that are common at machine learning conferences.
As these are all features of conferences like NeurIPS or ICLR, we hope that TMLR will become a welcome and familiar complement to conferences for publishing machine learning research. TMLR will also depart from conferences’ review process in a few key ways.
Anytime submission Being a journal, TMLR will accept submissions throughout the year. For this, we will be implementing a rolling review process which will be executed on a per-paper timeline.
Fast turnaround We are implementing a review timeline that will provide reviews to papers within 4 weeks of submission and decisions within 2 months. To enable this, we will implement a capped workload for action editors (the equivalent of conference area chairs) and reviewers so as to remain lightweight throughout the year, while also requesting a commitment to accept all assignment requests.
Acceptance based on claims Acceptance to TMLR will avoid judgments that are based on more subjective, editorial or speculative elements of typical conference decisions, such as novelty and potential for impact. Instead, the two criteria that will drive our review process will be the answers to the following two questions:
- Are the claims made in the submission supported by accurate, convincing and clear evidence?
- Would some individuals in TMLR’s audience be interested in the findings of this paper?
The first question therefore asks that we focus the evaluation on whether claims are matched by evidence. If they are not, authors will be asked to either provide new evidence or simply adjust their claims, even if that means the implications of the work are reduced (that’s OK!). The second, though somewhat more subjective, aims at ensuring the journal features work that does contribute additional knowledge to our community. A reviewer that is unsure as to whether a submission satisfies this criterion will be asked to assume that it does.
Certifications This will be a unique feature of TMLR, which is aimed at separating editorial statements on submitted work from their claim-based scientific assessment. An accepted paper will have the opportunity of being tagged with certifications, which are distinctions meant to highlight submissions with additional merit. At launch, we will include the following certifications:
- Outstanding Certification, for papers deemed to be of exceptionally high quality and broadly significant for the field (along the lines of a best paper award at a top-tier conference).
- Featured Certification, for papers judged to be of very high quality, along the lines of a conference paper selected for an oral or spotlight.
- Reproducibility Certification, for papers whose primary purpose is reproduction of other published work and that contribute significant added value through additional baselines, analysis, ablations, or insights.
- Survey Certification, for papers that not only meet the criteria for acceptance but also provide an exceptionally thorough or insightful survey of the topic or approach.
6 Tools
See also academic reading workflow for tips geared towards readers.
-
A platform for scholarly publishing and peer review that empowers researchers with the
- Autonomy to pursue their passions,
- Authority to develop and disseminate their work, and
- Access to engage with the international community of scholars.
-
Millions of research papers are available for free on government and university web servers, legally uploaded by the authors themselves, with the express permission of publishers. Unpaywall automatically harvests these freely shared papers from thousands of legal institutional repositories, preprint servers, and publishers, making them all available to you as you read.
Zenodo “is an open, dependable home for the long tail of science, enabling researchers to share and preserve research outputs of any size or format, and from any scientific discipline.”
- Research. Shared. — all research outputs from across all fields of science are welcome!
- Citeable. Discoverable. — uploads get a Digital Object Identifier (DOI) to make them easily and uniquely citeable…
- Flexible licensing — because not everything is under Creative Commons.
- Safe — your research output is stored safely for the future in the same cloud infrastructure as research data from CERN’s Large Hadron Collider.
A major win is that we can easily assign DOIs to data and code for reproducible research — for free.
-
Open Conference Systems (OCS) is a free Web publishing tool that will create a complete Web presence for your scholarly conference. OCS will allow you to:
- create a conference Web site
- compose and send a call for papers
- electronically accept paper and abstract submissions
- allow paper submitters to edit their work
- post conference proceedings and papers in a searchable format
- post, if you wish, the original data sets
- register participants
- integrate post-conference online discussions
-
A peeriodical is a lightweight virtual journal with you as the Editor-in-chief, giving you complete freedom in setting editorial policy to select the most interesting and useful manuscripts for your readers.
I didn’t find that explanation as useful as the interview the creators gave.
-
is an open access online scholarly publishing platform that employs open post-publication peer review. You guessed it! We think transparency from start to finish is critical in scientific communication. […]
Retraction Watch is a watchdog blog that, for sufficiently high-profile research, has somehow become a well-regarded source of gatekeeping and exposure.
7 Peer review
See Peer review.
8 Open access
Various open-access (and occasionally open-source) journals try to disrupt incumbent publishers with business models built around the low cost of online infrastructure. As with legacy journals, they have varying degrees of success.
One cute, boutique example:
Open Journals is a collection of open source, open access journals. We currently have four main publications:
- The Journal of Open Source Software
- The Journal of Open Source Education
- The Open Journal of Astrophysics
- The Journal of Brief Ideas
All of our journals run on open source software, which is available under our GitHub organization profile: github.com/openjournals.
All of our journals are open access publications with content licensed under a Creative Commons Attribution 4.0 International License. Copyright remains with the submitting authors.
9 Handy software
Want to run your own journal?
- Janeway (Python stack) seems popular; see documentation and source: openlibhums/janeway
- Kotahi (JS stack) is an option. They sound modern, but maybe in the sense of ephemeral? In the 24 hours between my first hearing about them and writing this, their site went down, and all landing page information seemed to vanish into oblivion. The source code is notionally available but it’s a real mess: there’s no single visible canonical repository, just many odd forks (some specializing in, e.g., bat virus spillover research). We should probably regard this as unsupported.
- The venerable Open Journal Systems (PHP stack) is well-tested and widely used. See their site. The most famous OJS-based journal I’m aware of is the Journal of Statistical Software.
10 Incoming
The Strain on Scientific Publishing (Hanson et al. 2024)
Scientists are increasingly overwhelmed by the volume of articles being published. The total number of articles indexed in Scopus and Web of Science has grown exponentially in recent years; in 2022 the article total was ∼47% higher than in 2016, which has outpaced the limited growth—if any—in the number of practicing scientists. Thus, publication workload per scientist has increased dramatically. We define this problem as “the strain on scientific publishing.” To analyze this strain, we present five data-driven metrics showing publisher growth, processing times, and citation behaviors. We draw these data from web scrapes, and from publishers through their websites or upon request. Specific groups have disproportionately grown in their articles published per year, contributing to this strain. Some publishers enabled this growth by hosting “special issues” with reduced turnaround times. Given pressures on researchers to “publish or perish” to compete for funding, this strain was likely amplified by these offers to publish more articles. We also observed widespread year-over-year inflation of journal impact factors coinciding with this strain, which risks confusing quality signals. Such exponential growth cannot be sustained. The metrics we define here should enable this evolving conversation to reach actionable solutions to address the strain on scientific publishing.
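To put the quoted ∼47% figure in annual terms, here is a back-of-envelope sketch. It assumes constant geometric growth between 2016 and 2022, which is my simplification and not a claim from the paper; the function name is hypothetical.

```python
import math


def annualized_growth(total_ratio, years):
    """Constant yearly growth rate that compounds to `total_ratio` over `years`."""
    return total_ratio ** (1 / years) - 1


# 2022 total is about 1.47 times the 2016 total (figure quoted in the abstract).
rate = annualized_growth(1.47, 2022 - 2016)
doubling_time = math.log(2) / math.log(1 + rate)

print(f"{rate:.1%} per year")                       # about 6.6% per year
print(f"doubles in about {doubling_time:.1f} years")  # about 10.8 years
```

Under that assumption the article count would double roughly every eleven years.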
What methods work for evaluating the impact of public investments in RD&I
Is Frontiers predatory?
- Is Frontiers a potential predatory publisher? – For Better Science
- Predatory reports: Is Frontiers Media a Predatory Publisher?
tl;dr: The community perceives their review process as suspect, and their journals’ impact factors are generally decreasing.
PNAS is Not a Good Journal (and Other Hard Truths about Journal Prestige). At this point, most people in science realize that journals are behaving increasingly parasitically. They’re sustained by reputational capital from self-fulfilling prophecies of prestige, but they’re burning through that capital. The most generous interpretation of academia’s response is that we’re too busy trying to buttress public trust in science to reform these undeservedly respected institutions. The least generous is that we are symbiotic parasites together with the journals, using them for our own reputations at the cost of public trust in science. In between is the story that we’re all too busy meeting deadlines to solve the collective action problem of boycotting journals.
Identify trusted publishers for your research • Think. Check. Submit.
“Journal Evaluation Tool” by Shilpa Rele, Marie Kennedy et al.
Felix Schönbrodt, My personal reviewing policy: No more billion-dollar donations.
Étienne Fortier-Dubois, Why Is ‘Nature’ Prestigious?
Time for a Change: How Scientific Publishing is Changing For The Better



