Validating and reproducing science
The systems that grind the chaos of noise into good results. Peer review, academic incentives, credentials, publishing…
2020-05-16 — 2025-09-30
Wherein the mechanisms for validating scientific claims are catalogued, the replication crisis and publication bias are examined, and reforms such as pre‑registration and registered reports are outlined.
Empirical knowledge generation for universals, i.e. science, is a complicated, multifaceted socio-technical process. For the sake of tractability, we can model it in three phases: the generation of hypotheses, the dissemination of findings, and the validation of those findings. This post focuses on the last phase, validation: how does the scientific community assess, critique, and falsify scientific claims?
That is to ask, how do we, collectively, know what is true? This question has long been the domain of the history and philosophy of science, which examines how scientific knowledge is justified and progresses—whether through steady accumulation, rigorous falsification (Popper), or dramatic paradigm shifts (Kuhn). In practice, the modern scientific enterprise relies on a complex socio-technical machinery to separate signal from noise. This machinery includes peer review, funding bodies, reputation systems, and incentive mechanisms.
There are maaaaaany challenges we need to solve to align people with truth-seeking, which may not come naturally to us: designing reputation systems, collective decision-making, groupthink management, and the other mechanisms of trustworthy science (a.k.a. our collective knowledge of reality itself). Because I’m a kind of meta-nerd, I think about this as a problem of epistemic community design, where we try to structure a community with the norms, conventions and economic incentives that lead to the best possible collective knowledge outcomes.
Ideally, this system should function as a filter, ensuring that only robust, reproducible findings enter, persist, and propagate in the canon of collective knowledge, while falsities are weeded out.
It’s not at all clear how well this works in the world as it is. Science loves studying itself, but it’s not great at overcoming inertia to self-correct its institutional failings. The “Replication Crisis”—the realization that many published findings, particularly in psychology, medicine, and economics, cannot be independently reproduced—has exposed flaws in how science validates itself.
This post sketches the landscape of scientific validation. We examine the institutions designed to ensure trustworthy science, explore why they fail (addressing issues like P-hacking and publication bias), and look at proposed reforms. The goal is to understand the gap between the ideal of truth-seeking and the reality of modern academic incentives.
1 The Stakes are Public Trust
Let’s anchor on the ideal that most scientists seem to have in mind — the Mertonian norms:
The four Mertonian norms (often abbreviated as the CUDO-norms) can be summarised as:
- communism: all scientists should have common ownership of scientific goods (intellectual property), to promote collective collaboration; secrecy is the opposite of this norm.
- universalism: scientific validity is independent of the sociopolitical status/personal attributes of its participants.
- disinterestedness: scientific institutions act for the benefit of a common scientific enterprise, rather than for specific outcomes or the resulting personal gain of individuals within them.
- organized skepticism: scientific claims should be exposed to critical scrutiny before being accepted, both through methodological standards and through institutional codes of conduct.
The integrity of scientific validation is not purely an internal academic concern confined to the ivory tower. Science aims to inform critical decisions in public policy, healthcare, and technology, and to educate the public. This role implies a social contract: society provides funding and autonomy in exchange for reliable, vetted knowledge.
However, this trust isn’t automatic or unconditional; it must be continuously earned — or fought for. When the validation machinery fails—when findings are irreproducible, fraud occurs, or the literature is distorted by bias—it undermines the credibility of the entire enterprise. The erosion of trust is often a rational response to observed institutional behaviour, rather than simply public ignorance.
1.1 The Rational Basis for Skepticism
It is essential to acknowledge that public skepticism toward scientific institutions is often well-founded, rooted in historical and ongoing failures.
The influence of corporate interests on research outcomes is a significant and salient concern. As critics such as Goldacre (2012) argue, the pharmaceutical industry often controls the design, execution, and publication of research on its own products. This leads to pervasive publication bias, in which negative results are suppressed, distorting the evidence base.
The example people most often point to at the moment is the opioid crisis: pharmaceutical companies deceptively marketed opioids as non-addictive, citing flawed or misrepresented literature, which demonstrates how industry interests can corrupt the validation process with catastrophic societal consequences. As the historians Oreskes and Conway (2022) argued, the strategy of manufacturing uncertainty to delay regulation, first perfected by the tobacco industry, remains a standard playbook for industries facing inconvenient scientific truths.
Historical abuses have also left scars, particularly among marginalized communities. For example, the infamous “Tuskegee Study of Untreated Syphilis” (1932–1972), where U.S. Public Health Service researchers withheld known treatments from Black men, remains a symbol of institutional betrayal. Research suggests the disclosure of this study contributed to lasting medical mistrust and measurable negative health outcomes (Alsan and Wanamaker 2018). (cf Nazi experiments, Henrietta Lacks, etc.)
These examples highlight that skepticism is often directed not at the scientific method itself, but at the institutions charged with implementing it.
1.2 Failures of Science Communication
The credibility problem is exacerbated by the way science is communicated. The incentives driving academic careers (novelty, impact, funding) often bleed into communication strategies, prioritizing hype over substance.
This is reinforced by the internal dynamics of science itself. A recent study found that papers that failed to replicate are cited significantly more often than those that succeeded (Serra-Garcia and Gneezy 2021). This suggests the system favours “interesting” or surprising results, regardless of their validity.
This tendency toward novelty creates a “hype pipeline” (Caulfield and Condit 2012). Academic pressures lead to exaggerated claims in papers, university press offices amplify those claims to attract media attention, and journalists prioritize sensational findings. Studies have shown that press releases frequently exaggerate the importance of findings while minimizing limitations (Sumner et al. 2014). This constant stream of “breakthroughs” sets the public up for disillusionment.
Furthermore, institutions sometimes prioritize messaging that encourages desired public behaviour over complete transparency. As sociologist Zeynep Tufekci argued, when public health authorities appear opaque or misleading (as seen in debates during the COVID-19 pandemic), it damages institutional credibility. Paternalistic messaging often backfires: the public recognizes the inconsistency and interprets it as incompetence or deception.
1.3 Funding pressure
TBD — I really need to write this one up. It’s of general interest; one day I’ll be permitted to mention my personal experiences with industry pressure.
1.4 Do Your Own Research
In this environment of justified skepticism and communication failures, the rise of the “Do Your Own Research” (DYOR) phenomenon is understandable. It reflects a collapse of faith in established mediating institutions.
While the impulse behind DYOR frequently stems from legitimate critiques—the recognition that published research can be biased and that experts can be conflicted—the method doesn’t fix the institutional failures of mainstream science.
DYOR mistakes access to information (e.g., reading preprints or isolated studies) for the rigorous, collective process required to evaluate and synthesize it. An individual cannot solve the problems of P-hacking or publication bias alone; these require institutional reform.
Indeed, someone close to me died needlessly and slowly because they trusted their own research over medical advice. This shit is serious. At the same time, other people close to me have been ignored or shunned for questioning established narratives when they have direct evidence of adverse effects from medical treatments.
As researcher danah boyd argues, we are living through a crisis of epistemology—a disagreement about how we know whether something is true. In the digital age, DYOR often involves navigating an information landscape strategically polluted by media manipulation. Boyd warns that simplistic approaches to critical thinking—teaching people to doubt without providing robust frameworks for understanding—can backfire, making individuals more vulnerable to conspiratorial thinking.
The DYOR movement seems great at surfacing flaws in existing practice. It’s less good at producing calibrated criticism and synthesis, and it doesn’t offer a scalable mechanism for reliable scientific self-correction. It often leads individuals into alternative epistemic communities that lack rigorous validation processes.
The challenge for the scientific community is not to demand blind trust, but to earn it by reforming the internal mechanisms of validation and communication, making the enterprise demonstrably rigorous, transparent, and accountable.
2 The Machinery of Validation
The primary mechanism for validating scientific claims today is peer review, organized by academic journals and funding bodies.
2.1 The Evolution and Limits of Peer Review
It is often assumed that peer review is a timeless feature of the scientific process, but its current dominance is a relatively recent invention.
Baldwin (2018) notes:
Throughout the nineteenth century and into much of the twentieth, external referee reports were considered an optional part of journal editing or grant making. The idea that refereeing is a requirement for scientific legitimacy seems to have arisen first in the Cold War United States. In the 1970s, in the wake of a series of attacks on scientific funding, American scientists faced a dilemma: there was increasing pressure for science to be accountable to those who funded it, but scientists wanted to ensure their continuing influence over funding decisions. Scientists and their supporters cast expert refereeing—or “peer review,” as it was increasingly called—as the crucial process that ensured the credibility of science as a whole.
This history suggests peer review evolved as much to maintain professional autonomy as to ensure empirical rigour. Today its effectiveness is hotly debated: critics say it’s slow, biased, and often fails to detect errors. Adam Mastroianni argues in The rise and fall of peer review that the system often fails to deliver on its promises.
For a deeper dive, see peer review.
- Related question: How do we discover research to send for peer review?
2.2 Gatekeeping and Institutional Culture
The validation process inherently involves gatekeeping. Institutions decide who can participate and which standards must be met. That helps explain why we’re perennially dissatisfied with academic culture.
Thomas Basbøll observes the tension between innovation and tradition:
It is commonplace today to talk about “knowledge production” and the university as a site of innovation. But the institution was never designed to “produce” something nor even to be especially innovative. Its function was to conserve what we know. It just happens to be in the nature of knowledge that it cannot be conserved if it does not grow.
This conservatism can make the validation process hostile to radically new ideas. (See Andrew Marzoni, Academia is a cult).
Friction in the traditional system leads some researchers to bypass that system entirely. Stephen Wolfram’s recent approach provides a telling example, as detailed by Adam Becker (Wolfram’s latest positioning):
So why did Wolfram announce his ideas this way? Why not go the traditional route? “I don’t really believe in anonymous peer review,” he says. “I think it’s corrupt. It’s all a giant story of somewhat corrupt gaming, I would say. I think it’s sort of inevitable that happens with these very large systems. It’s a pity”.
2.3 Grants (TODO)
How does the funding mechanism shape what gets validated? Does the grant review process prioritize novelty over rigour?
3 Systemic Failures: The Replication Crisis
The replication crisis is the clearest sign that the validation machinery is broken. If peer review and statistical significance testing were working correctly, we’d expect most published findings to be robust. That hasn’t been the case.
Oliver Traldi reviews Stuart Ritchie’s book, Science Fictions, which outlines the crisis, particularly in social psychology:
Serious though this is, there is also something more specifically pernicious about the replication crisis in psychology. We saw that the bias in psychological research is in favour of publishing exciting results. An exciting result in psychology is one that tells us that something has a large effect on people’s behaviour. […] Think of the studies I mentioned above: a mess makes people more prejudiced; a random assignment of roles makes people sadistic; a list of words makes people walk at a different speed; a strange pose makes people more confident. And so on.
This crisis has structural causes: statistical malpractice and misaligned incentives.
See also: Sanjay Srivastava, Everything is fucked, the syllabus.
3.1 Publication Bias and the File Drawer
Journals prioritize novel, statistically significant findings. This creates a “publication sieve”.
Null results, or studies that fail to find an effect, are often relegated to the “file drawer”—they’re never published.
This leads to a distorted view of reality in the published literature. If 20 researchers each test a hypothesis, and only the one who (by chance) achieves \(P\leq 0.05\) publishes their result, the literature will suggest a strong effect where none exists. This is essentially multiple testing across an entire scientific field.
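To make that arithmetic concrete, here is a minimal simulation sketch (my own toy numbers, not drawn from any particular study): twenty labs test an effect that is truly zero, and only results with \(p \leq 0.05\) get “published”. Under these assumptions the literature contains a spurious positive finding about 64% of the time (that is, \(1 - 0.95^{20}\)), and the published effect sizes are inflated because only the lucky draws survive the sieve.

```python
# Toy simulation of the file-drawer effect described above (illustrative
# numbers only): 20 labs test a hypothesis whose true effect is zero, and
# only results with p <= 0.05 get "published".
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_labs, n_per_group, n_worlds = 20, 30, 2000

spurious_literatures = 0
published_effects = []
for _ in range(n_worlds):
    found = False
    for _ in range(n_labs):
        treatment = rng.normal(0.0, 1.0, n_per_group)  # true effect = 0
        control = rng.normal(0.0, 1.0, n_per_group)
        _, p = stats.ttest_ind(treatment, control)
        if p <= 0.05:
            found = True
            published_effects.append(abs(treatment.mean() - control.mean()))
    spurious_literatures += found

print(f"Worlds where at least one lab 'finds' the effect: "
      f"{spurious_literatures / n_worlds:.0%}")  # ~64%, i.e. 1 - 0.95**20
print(f"Mean |effect| among published results: "
      f"{np.mean(published_effects):.2f} (true effect is 0)")
```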
3.2 P-Hacking and Researcher Degrees of Freedom
For individual researchers, the pressure to publish significant results incentivizes “researcher degrees of freedom”—the many choices researchers make during data analysis that can be exploited (consciously or unconsciously) to push a result below the significance threshold. This is often called P-hacking.
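Here is a toy sketch of that mechanism (assumed numbers, and a simplification rather than any published analysis): each study measures four independent outcomes under a true null, and the researcher reports whichever comparison clears the threshold. The false-positive rate rises from the nominal 5% to roughly 19% (\(1 - 0.95^{4}\)).

```python
# Minimal sketch of "researcher degrees of freedom": under a true null,
# picking the best of several outcome measures inflates false positives.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_studies, n_outcomes, n_per_group = 5000, 4, 30

false_positives = 0
for _ in range(n_studies):
    p_values = []
    for _ in range(n_outcomes):
        a = rng.normal(size=n_per_group)  # no true effect anywhere
        b = rng.normal(size=n_per_group)
        p_values.append(stats.ttest_ind(a, b).pvalue)
    if min(p_values) <= 0.05:             # report the "best" outcome
        false_positives += 1

print("Nominal false-positive rate: 5%")
print(f"Actual rate with {n_outcomes} outcomes to choose from: "
      f"{false_positives / n_studies:.0%}")  # roughly 1 - 0.95**4, about 19%
```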
We can detect the signature of systematic p-hacking in the literature. See Uri Simonsohn’s work on the p-curve.
4 The Incentive Landscape
Why do these bad practices persist? Often they’re the rational response to the incentive structure of academic careers. Success (tenure, grants, acclaim) is often judged by the quantity and impact of publications, not by their rigour or reproducibility.
As Ed Hagen argues, academic success is either a crapshoot or a scam. The core problem is that rigorous science is slow and often produces null results, while academic careers push us to produce frequent, high-impact papers.
The problem, in a nutshell, is that empirical researchers have placed the fates of their careers in the hands of nature instead of themselves. […]
the minimum acceptable number of pubs per year for a researcher with aspirations for tenure and promotion is about three. This means that, each year, I must discover three important new things about the world. […]
Let’s say I choose to run 3 studies that each has a 50% chance of getting a sexy result. If I run 3 great studies, mother nature will reward me with 3 sexy results only 12.5% of the time. I would have to run 9 studies to have about a 90% chance that at least 3 would be sexy enough to publish in a prestigious journal.
I do not have the time or money to run 9 new studies every year.
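The arithmetic in the quote checks out; here is a quick binomial check, a sketch using only Python’s standard library:

```python
# Check of the arithmetic quoted above: with a 50% chance of a "sexy" result
# per study, how many studies are needed to get at least three publishable ones?
from math import comb

def p_at_least(k, n, p=0.5):
    """Probability of at least k successes in n independent studies."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

print(f"3 studies, all 3 sexy: {p_at_least(3, 3):.1%}")       # 12.5%
print(f"9 studies, at least 3 sexy: {p_at_least(3, 9):.1%}")  # ~91%
```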
This pressure creates a “natural selection of bad science”. Cheap methods that are more likely to produce publishable — that is, statistically significant — results thrive, regardless of their validity.
Julia Rohrer elaborates on how practices thrive in today’s ecosystem:
So, when does a certain practice […] “succeed” and start to dominate journals?
It must be capable of surviving a multi-stage selection procedure:
- Implementation must be sufficiently affordable so that researchers can actually give it a shot
- Once the authors have added it to a manuscript, it must be retained until submission
- The resulting manuscript must enter the peer-review process and survive it (without the implementation of the practice getting dropped on the way)
- The resulting publication needs to attract enough attention post-publication so that readers will feel inspired to implement it themselves, fueling the eternally turning wheel of ~~Samsara~~ publication-oriented science
Furthermore, Smaldino and O’Connor (2020) highlight how self-preferential biases can shelter poor methodology within scientific communities, making it hard for better methods to spread.
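To see how such selection pressure plays out, here is a deliberately crude agent-based sketch of the “natural selection of bad science” dynamic (my own caricature with made-up parameters, not the model from Smaldino and O’Connor (2020)): labs that cut methodological corners run more studies and report more positive results, new labs imitate the most-published labs, and mean rigour drifts downward without anyone intending it.

```python
# Toy agent-based sketch of cultural selection for low-rigour methods
# (illustrative parameters only): low-effort labs publish more, and
# new labs copy the most-published labs, so average effort declines.
import numpy as np

rng = np.random.default_rng(2)
n_labs, n_generations = 100, 50
base_rate = 0.1  # fraction of tested hypotheses that are actually true

effort = rng.uniform(0.1, 1.0, n_labs)  # high effort = fewer, more rigorous studies

for gen in range(n_generations):
    studies_run = (10 * (1.1 - effort)).astype(int) + 1  # low effort => more, cheaper studies
    false_pos_rate = 0.05 + 0.4 * (1 - effort)           # sloppier methods => more false positives
    power = 0.8 * effort + 0.1
    # expected publishable (positive) results per lab
    pubs = studies_run * (base_rate * power + (1 - base_rate) * false_pos_rate)
    # selection: replace the 10 least-published labs with noisy copies of the 10 most-published
    order = np.argsort(pubs)
    effort[order[:10]] = np.clip(effort[order[-10:]] + rng.normal(0, 0.02, 10), 0.05, 1.0)
    if gen % 10 == 0:
        print(f"generation {gen:2d}: mean effort {effort.mean():.2f}")
```

Even in this cartoon version, publication-based imitation alone is enough to erode methodological effort over a few dozen generations.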
5 Reforming the System
In response to these systemic failures, we’ve seen various reforms proposed to improve the reliability of science. We can group them into methodological and structural reforms.
5.1 Transparency and Pre-registration
A major push is toward greater transparency to reduce researchers’ degrees of freedom. This includes open notebook science.
A key intervention is pre-registration, in which researchers specify their hypotheses, methods, and analysis plan before collecting data.
Tom Stafford’s 2-minute guide to experiment pre-registration explains the options:
- you can use the Open Science Framework to timestamp and archive a pre-registration, so you can prove you made a prediction ahead of time.
- you can visit AsPredicted.org which provides a form to complete…
- “Registered Reports”: more and more journals are committing to publishing pre-registered studies. They review the method and analysis plan before data collection and agree to publish once the results are in (however they turn out).
Registered reports are particularly useful for decoupling the publication decision from the results, potentially solving the incentive problem. However, they require more upfront work and may slow down the research process, which can clash with existing career incentives.
5.2 Review
5.2.1 Post-Publication Review
Traditional peer review is often slow, opaque, and unreliable. We’re seeing a shift toward post-publication peer review (PPPR), moving validation from a single gatekeeping event to a continuous process.
Andrew Gelman argues that post-publication review should be much more efficient than the current system. He also notes that the adversarial nature of traditional review trains scientists to be defensive rather than open to critique.
Platforms like PubPeer facilitate this by providing a forum for public commentary and critique of published papers. PubPeer commentary has prompted several high-profile retractions, demonstrating the power of decentralized scrutiny. Independent projects like Data Colada also play important roles in post-publication review, often uncovering errors and misconduct that traditional peer review misses. How sustainable is that, though? How much are the people behind these projects rewarded for their work? And how much of their time do they spend fending off lawsuits over their work?
The story behind the “Data Falsificada” scandal is super juicy, by the way.
5.2.2 Institutional and Funding Reform
If the root cause is the incentive structure, the ultimate solution must involve changing how science is funded and how careers are evaluated. We’re seeing growing interest in structures that explicitly prioritize rigour over high-impact publications.
6 Emerging Tools (TODO)
Discuss potential roles for AI/ML in validation, e.g., The Black Spatula Project (verifying scientific papers using LLMs).
7 Incoming
Agent-Based Modeling in the Philosophy of Science (Stanford Encyclopedia of Philosophy)
CS Paper Reviews — a machine to review your paper and increase your odds of acceptance
Saloni Dattani, Real peer review has never been tried
Matt Clancy, What does peer review know?
Adam Mastroianni, The rise and fall of peer review
Étienne Fortier-Dubois, Why Is ‘Nature’ Prestigious?
Science and the Dumpster Fire | Elements of Evolutionary Anthropology
F1000Research | Open Access Publishing Platform | Beyond a Research Journal
F1000Research is an Open Research publishing platform for scientists, scholars and clinicians offering rapid publication of articles and other research outputs without editorial bias. All articles benefit from transparent peer review and editorial guidance on making all source data openly available.
Reviewing is a Contract - Rieck on the social expectations around reviewing and chairing.
Jocelynn Pearl proposes some fun ideas — including blockchain-y ones — in Time for a Change: How Scientific Publishing is Changing For The Better.
What methods work for evaluating the impact of public investments in RD&I
Smaldino and O’Connor (2020):
Why do bad methods persist in some academic disciplines, even when they have been clearly rejected in others? What factors allow good methodological advances to spread across disciplines? In this paper, we investigate some key features determining the success and failure of methodological spread between the sciences. We introduce a formal model that considers factors like methodological competence and reviewer bias towards one’s own methods. We show how self-preferential biases can protect poor methodology within scientific communities, and lack of reviewer competence can contribute to failures to adopt better methods. We then use a second model to further argue that input from outside disciplines, especially in the form of peer review and other credit assignment mechanisms, can help break down barriers to methodological improvement. This work therefore presents an underappreciated benefit of interdisciplinarity.
Which Kind of Science Reform | Elements of Evolutionary Anthropology