AI Agents for scientific knowledge discovery and generation

Outsourcing knowledge of base reality to bots

2023-01-22 — 2025-11-12

Wherein is presented the emergence of agentic systems for scientific inquiry; a retrieval‑backed datastore of 45 million papers is described as grounding literature synthesis with citation traceability.

academe
collective knowledge
faster pussycat
how do science
institutions
mind
networks
provenance
sociology
wonk

A list of attempts and approaches to make knowledge management and discovery work with generative AI, for science in particular.

1 FutureHouse / Edison

Fresh off the rack, and it looks interesting: it synthesizes existing literature and identifies research gaps and areas of comparative advantage.

UPDATE: The FutureHouse platform was spun off into Edison Scientific. Their new model, Kosmos, sounds interesting: Kosmos: An AI Scientist for Autonomous Discovery:

Previous generations of AI Scientists, like FutureHouse’s Robin, have been limited primarily in their ability to synthesize large amounts of information. The finite context length of language models has meant that an AI Scientist can only take so many steps or make so many logical leaps before it runs out of road, limiting the complexity of the discoveries it can make. The core innovation in Kosmos is our use of structured world models, which allow us to efficiently incorporate information extracted over hundreds of agent trajectories and maintain coherence towards a specific research objective over tens of millions of tokens. An individual Kosmos run involves reading 1500 papers and running 42,000 lines of analysis code, far more than any other agent we are aware of.
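
To make the world-model idea concrete, here is a toy sketch of the pattern I understand the announcement to be describing; everything below is hypothetical, not Edison’s implementation. The point is that each trajectory writes compressed, structured findings into a shared store, so the next trajectory starts from a bounded summary rather than the full multi-million-token history.

```python
# Toy sketch of a "structured world model"; hypothetical, not Edison's design.
world = {"objective": "identify the mechanism of X", "findings": [], "open": []}

def trajectory(world: dict, step: int) -> dict:
    # Stand-in for one agent episode (reading papers, running analysis code).
    return {"finding": f"observation {step}", "question": f"follow-up {step}"}

for step in range(5):  # a real Kosmos run spans hundreds of trajectories
    out = trajectory(world, step)
    world["findings"].append(out["finding"])
    world["open"].append(out["question"])
    world["findings"] = world["findings"][-100:]  # keep the shared context bounded

print(world)
```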

2 Ai2

Figure 2: Overview of the OpenScholar pipeline: OpenScholar leverages a large-scale datastore consisting of 45 million papers and uses a custom-trained retriever, reranker, and 8B parameter language model to answer questions based on up-to-date scientific literature (from Asai et al. 2024)

Ai2 OpenScholar: Scientific literature synthesis with retrieval-augmented language models provides infrastructure for search via vector embeddings specialized for science papers.

Dissatisfied with the crappy search for ICLR, I built a quick semantic search using Ai2’s system. It was easy and worked really well! I put it here: danmackinlay/openreview_finder. It takes about 10 minutes to download and index the ICLR 2025 papers.
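
For the curious, the core of such a semantic index is small. A minimal sketch, assuming the SPECTER embedding model that Ai2 distributes via sentence-transformers; the paper list and query are placeholders, not the actual openreview_finder code:

```python
# Minimal semantic paper search; "sentence-transformers/allenai-specter"
# is Ai2's SPECTER model, and the papers below are placeholders.
import numpy as np
from sentence_transformers import SentenceTransformer

papers = [
    {"title": "Scaling Laws for Neural Language Models",
     "abstract": "We study empirical scaling laws for language models ..."},
    {"title": "Denoising Diffusion Probabilistic Models",
     "abstract": "We present high quality image synthesis results ..."},
]

model = SentenceTransformer("sentence-transformers/allenai-specter")
# SPECTER's convention: embed the title and abstract joined by [SEP].
corpus = [p["title"] + "[SEP]" + p["abstract"] for p in papers]
emb = model.encode(corpus, normalize_embeddings=True)

def search(query: str, k: int = 5) -> list[tuple[str, float]]:
    q = model.encode(query, normalize_embeddings=True)
    scores = emb @ q  # cosine similarity, since embeddings are unit-norm
    top = np.argsort(-scores)[:k]
    return [(papers[i]["title"], float(scores[i])) for i in top]

print(search("generative models for images"))
```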

Ai2’s current stack for scientific workflows seems to centre on Asta and OpenScholar, two complementary (?) efforts that rely on large scholarly corpora to keep answers grounded and citable. AFAICT, Asta provides the broader scaffolding (agents, benchmarks, standards, developer resources) for trustworthy scientific AI, while OpenScholar is a specialized literature-synthesis system with strong evaluation results that plugs into that ecosystem. Both emphasize citing sources, enable reproducible, inspectable reasoning steps, and provide useful open tools for researchers and developers.

2.1 Asta

  • Agentic research ecosystem: an AI research assistant, the AstaBench suite, and developer resources designed for transparency, reproducibility, and open development (blog post).
  • Trust-first framing: launched as a “standard for trustworthy AI agents in science,” emphasizing verifiable, source-traceable outputs and an open framework.
  • Practical capabilities: the assistant offers LLM-powered paper discovery, literature synthesis with clickable citations, and early-stage data analysis (in beta for partners).
  • Benchmarks with receipts: AstaBench provides rigorous, holistic evaluation of scientific agents, aiming to clarify performance beyond anecdotes.
  • Scholar QA has been folded into Asta (now “Summarize literature”), signalling a single, unified assistant surface.

2.2 OpenScholar

(Asai et al. 2024)

  • Purpose-built RAG for science: OpenScholar answers research questions by retrieving passages from a very large scientific datastore and synthesizing citation-backed responses, with an iterative self-feedback loop to improve quality (sketched in code after this list).
  • Scale of sources: The OpenScholar DataStore (OSDS) spans tens of millions of open-access papers and hundreds of millions of passage embeddings, enabling broad coverage across domains.
  • Evaluation signal: OpenScholar reports higher correctness than larger closed-source models on ScholarQABench while keeping citation accuracy at expert-like levels and reducing fabricated references.
  • Open ecosystem: The code, models, and datastore are released openly, along with a public demo, which makes it straightforward to inspect and extend the full pipeline.
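
A minimal sketch of that retrieve-generate-critique loop, with toy stand-ins for the OSDS retriever, the reranker, and the 8B generator; the real components are described in Asai et al. (2024):

```python
# Schematic of OpenScholar-style RAG with self-feedback; retrieve(),
# rerank(), and llm() are toy stand-ins, not the real components.
PASSAGES = ["passage A about drug X ...", "passage B about phenomenon Y ..."]

def retrieve(query: str, k: int) -> list[str]:
    return PASSAGES[:k]  # real system: dense retrieval over ~45M papers

def rerank(query: str, passages: list[str]) -> list[str]:
    return passages  # real system: a custom-trained reranker

def llm(prompt: str) -> str:
    return f"[answer generated from a {len(prompt)}-char prompt]"  # real: 8B LM

def answer(question: str, rounds: int = 2) -> str:
    ctx = rerank(question, retrieve(question, k=10))
    draft = llm(f"Cite sources.\nContext: {ctx}\nQ: {question}")
    for _ in range(rounds):
        # Self-feedback: critique the draft, retrieve extra evidence to
        # patch the gaps it identifies, then revise.
        critique = llm(f"Critique, noting missing evidence:\n{draft}")
        ctx = ctx + rerank(critique, retrieve(critique, k=5))
        draft = llm(f"Revise.\nCritique: {critique}\nContext: {ctx}\nDraft: {draft}")
    return draft

print(answer("What do we know about drug X and phenomenon Y?"))
```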

3 Denario

Meet Denario — an AI assistant for every step of the scientific process

The new tool, developed by scientists at the Flatiron Institute, Cambridge University, the Autonomous University of Barcelona and others, leverages large language models to help scientists with tasks from developing new hypotheses to summarizing results. The team hopes Denario will make the research process faster, more dynamic and more interdisciplinary.

AstroPilot-AI/Denario: Modular Multi-Agent System for Scientific Research Assistance (Villaescusa-Navarro et al. 2025):

Denario is a multiagent system designed to be a scientific research assistant. Denario implements AI agents with AG2 and LangGraph, using cmbagent as the research analysis backend.
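
Since Denario builds on LangGraph, a minimal sketch of wiring two stages of such a pipeline may help; this assumes LangGraph’s StateGraph API, and the state fields and node logic are hypothetical, not Denario’s actual graph:

```python
# Hypothetical two-stage hypothesize -> analyze pipeline in LangGraph,
# in the spirit of Denario's agent graph; the node logic is a stand-in.
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class State(TypedDict):
    data_description: str
    hypothesis: str
    analysis: str

def hypothesize(state: State) -> dict:
    # Real system: an LLM agent proposes ideas from the data description.
    return {"hypothesis": f"a hypothesis about {state['data_description']}"}

def analyze(state: State) -> dict:
    # Real system: cmbagent plans and executes the analysis.
    return {"analysis": f"tested: {state['hypothesis']}"}

graph = StateGraph(State)
graph.add_node("hypothesize", hypothesize)
graph.add_node("analyze", analyze)
graph.add_edge(START, "hypothesize")
graph.add_edge("hypothesize", "analyze")
graph.add_edge("analyze", END)

app = graph.compile()
print(app.invoke({"data_description": "a galaxy survey",
                  "hypothesis": "", "analysis": ""}))
```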

Cute update:

October 9, 2025 - A paper fully generated with Denario has been accepted for publication in the Open Conference of AI Agents for Science 2025, the 1st open conference with AI as primary authors.

4 Elicit

Elicit: The AI Research Assistant addresses this problem with large language models:

Elicit uses language models to help you automate research workflows, like parts of literature review.

Elicit can find relevant papers without perfect keyword match, summarise takeaways from the paper specific to your question, and extract key information from the papers.

While answering questions with research is the main focus of Elicit, there are also other research tasks that help with brainstorming, summarisation, and text classification.

5 Consensus

Search - Consensus: AI Search Engine for Research

Consensus is the AI-powered academic search engine

Search & analyze 200M+ peer-reviewed research papers

5.1 OpenAI Deep Research

Announcing Deep Research from OpenAI.

An agent that uses reasoning to synthesise large amounts of online information and complete multi-step research tasks for you.

5.2 Perplexity AI

Perplexity AI also offers a deep research feature.

6 ToolUniverse

Once the setup is complete, the AI scientist operates as follows: given a user instruction or task, it formulates a plan or hypothesis, employs the tool finder in ToolUniverse to identify relevant tools, and iteratively applies these tools to gather information, conduct experiments, verify hypotheses, and request human feedback when necessary. For each required tool call, the AI scientist generates arguments that conform to the ToolUniverse protocol, after which ToolUniverse executes the tool and returns the results for further reasoning.
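
That loop is generic enough to sketch in a few lines. The helpers below are hypothetical stand-ins; I have not verified ToolUniverse’s actual API or protocol:

```python
# Hypothetical rendering of the plan -> find tools -> call -> observe
# cycle described above; find_tools() and execute() are stand-ins.
def find_tools(task: str) -> list[dict]:
    # Real system: ToolUniverse's tool finder retrieves relevant tools.
    return [{"name": "literature_search", "args": {"query": task}}]

def execute(tool: dict) -> str:
    # Real system: arguments are validated against the ToolUniverse
    # protocol, the tool runs, and results return for further reasoning.
    return f"{tool['name']} result for {tool['args']}"

def scientist_loop(task: str, max_steps: int = 3) -> list[str]:
    observations = []
    for _ in range(max_steps):
        for tool in find_tools(task):
            observations.append(execute(tool))
        # Real agent: revise the plan or hypothesis from observations,
        # requesting human feedback when necessary.
    return observations

print(scientist_loop("does drug X cause phenomenon Y?"))
```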

7 SciLire

SciLire: Science Literature Review AI Tool from CSIRO.

I just saw a tech demo from my colleagues. It looks promising for high-speed, AI-augmented literature reviews. On the other hand, CSIRO — like most Australian tech projects — isn’t well resourced, so I’m cautiously optimistic.

7.1 ResearchRabbit

ResearchRabbit:

  • Spotify for Papers: Just like in Spotify, you can add papers to collections. ResearchRabbit learns what you love and improves its recommendations!
  • Personalised Digests: Keep up with the latest papers related to your collections! If we’re not confident something’s relevant, we don’t email you—no spam!
  • Interactive Visualisations: Visualise networks of papers and co-authorships. Use graphs as new “jumping off points” to dive even deeper!
  • Explore Together: Collaborate on collections, or help kickstart someone’s search process! And leave comments as well!

7.2 scite

scite: See how research has been cited

Citations are classified by a deep learning model that is trained to identify three categories of citation statements: those that provide supporting evidence for the cited work, those that provide contrasting evidence, and those that merely mention the cited study without providing evidence for its validity. Citations are classified by rhetorical function, not positive or negative sentiment.

  • Citations are not classified as supporting or contrasting by positive or negative keywords.
  • A Supporting citation can have a negative sentiment and a Contrasting citation can have a positive sentiment. Sentiment and rhetorical function are not correlated.
  • Supporting and Contrasting citations do not necessarily indicate that the exact set of experiments was performed. For example, if a paper finds that drug X causes phenomenon Y in mice and a subsequent paper finds that drug X causes phenomenon Y in yeast but both come to this conclusion with different experiments—this would be classified as a supporting citation, even though identical experiments were not performed.
  • Citations that simply use the same method, reagent, or software are not classified as supporting. To identify methods citations, you can filter by the section.

For full technical details, including exactly how we do classification and what classifications and classification confidence mean, please read our recent publication describing how scite was built: (Nicholson et al. 2021).
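
For intuition, here is a toy approximation of rhetorical-function classification using an off-the-shelf zero-shot model; this is emphatically not scite’s classifier, which is a custom deep network described in Nicholson et al. (2021):

```python
# Toy citation-intent classification via zero-shot NLI; illustrative
# only, not scite's model.
from transformers import pipeline

clf = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

citation = ("Consistent with Smith et al. (2020), we also observe "
            "phenomenon Y after administering drug X.")
labels = ["provides supporting evidence", "provides contrasting evidence",
          "merely mentions the cited work"]

result = clf(citation, candidate_labels=labels)
print(result["labels"][0])  # the highest-scoring rhetorical function
```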

8 Incoming

9 References

Asai, He, Shao, et al. 2024. “OpenScholar: Synthesizing Scientific Literature with Retrieval-Augmented LMs.”
Beltagy, Lo, and Cohan. 2019. “SciBERT: A Pretrained Language Model for Scientific Text.”
Borondo, Borondo, Rodriguez-Sickert, et al. 2014. “To Each According to Its Degree: The Meritocracy and Topocracy of Embedded Markets.” Scientific Reports.
Channing, and Ghosh. 2025. “AI for Scientific Discovery Is a Social Problem.”
Cohan, Feldman, Beltagy, et al. 2020. “SPECTER: Document-Level Representation Learning Using Citation-Informed Transformers.” Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics.
Coscia, and Vandeweerdt. 2022. “Posts on Central Websites Need Less Originality to Be Noticed.” Scientific Reports.
Gao, Ju, Jiang, et al. 2025. “A Semantic Search Engine for Mathlib4.”
Kang, Zhang, Jiang, et al. 2024. “Taxonomy-Guided Semantic Indexing for Academic Paper Search.”
Nicholson, Mordaunt, Lopez, et al. 2021. “Scite: A Smart Citation Index That Displays the Context of Citations and Classifies Their Intent Using Deep Learning.” Quantitative Science Studies.
Shen, Lin, Zhang, et al. 2023. “RTVis: Research Trend Visualization Toolkit.”
Singh, D’Arcy, Cohan, et al. 2022. “SciRepEval: A Multi-Format Benchmark for Scientific Document Representations.”
Villaescusa-Navarro, Bolliet, Villanueva-Domingo, et al. 2025. “The Denario Project: Deep Knowledge AI Agents for Scientific Discovery.”
Wang, Fu, Du, et al. 2023. “Scientific Discovery in the Age of Artificial Intelligence.” Nature.