AI Agents for scientific knowledge discovery and generation
Outsourcing knowledge of base reality to bots
2023-01-22 — 2025-10-19
Wherein a survey of AI agents for scientific discovery is presented, and OpenScholar’s 45 million‑paper datastore with citation‑backed retrieval is noted as enabling evidence‑grounded synthesis.
A list of attempts and approaches to make knowledge management and discovery work via generative AI, for scientific research in particular.
1 FutureHouse Platform
Fresh off the rack — it looks interesting. It synthesizes existing literature and identifies research gaps and areas of comparative advantage.
2 AI2
Ai2 OpenScholar (“Scientific literature synthesis with retrieval-augmented language models”) provides infrastructure for search via vector embeddings specialized for science papers.
Dissatisfied with the crappy search for ICLR, I built a quick semantic search using Ai2’s system. It was really easy and worked really well! You can find it here: danmackinlay/openreview_finder. It takes about 10 minutes to download and index the ICLR 2025 papers.
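For flavour, here is a minimal sketch of the same idea: embed abstracts with a sentence-embedding model and rank them by cosine similarity against a query. The model name and the toy records are illustrative assumptions, not openreview_finder’s actual configuration; the real tool indexes the ICLR corpus via Ai2’s system.

```python
# Minimal semantic search over paper abstracts: embed once, query by cosine similarity.
# The model and the toy records below are illustrative stand-ins, not openreview_finder's
# actual configuration.
import numpy as np
from sentence_transformers import SentenceTransformer

papers = [
    {"title": "Scaling laws for sparse models",
     "abstract": "We study how sparsity interacts with model and data scale."},
    {"title": "Diffusion models for graphs",
     "abstract": "We adapt score-based generative models to graph-structured data."},
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # small general-purpose embedder
doc_vecs = model.encode([p["abstract"] for p in papers], normalize_embeddings=True)

def search(query: str, k: int = 5):
    """Return the top-k papers ranked by cosine similarity to the query."""
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = doc_vecs @ q  # cosine similarity, since all vectors are unit-normalised
    top = np.argsort(-scores)[:k]
    return [(papers[i]["title"], float(scores[i])) for i in top]

print(search("generative models on graph-structured data"))
```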
Ai2’s current stack for scientific workflows seems to centre on Asta and OpenScholar, two complementary efforts that lean on large scholarly corpora to keep answers grounded and citable. Asta acts as a broader platform (agents, benchmarks, developer resources), while OpenScholar focuses on retrieval-augmented, citation-backed literature synthesis with strong evaluation results.
2.1 Asta: USPs in practice
- Agentic research ecosystem: an AI research assistant, the AstaBench suite, and developer resources designed for transparency, reproducibility, and open development (blog post).
- Trust-first framing: launched as a “standard for trustworthy AI agents in science,” emphasizing verifiable, source-traceable outputs and an open framework.
- Practical capabilities: the assistant offers LLM-powered paper discovery, literature synthesis with clickable citations, and early-stage data analysis (in beta for partners).
- Benchmarks with receipts: AstaBench provides rigorous, holistic evaluation of scientific agents, aiming to clarify performance beyond anecdotes.
- Scholar QA has been folded into Asta (now “Summarize literature”), signalling a single, unified assistant surface.
2.2 OpenScholar: USPs in practice
- Purpose-built RAG for science: OpenScholar answers research questions by retrieving passages from a very large scientific datastore and synthesizing citation-backed responses, with an iterative self-feedback loop to improve quality (a structural sketch follows this list).
- Scale of sources: The OpenScholar DataStore (OSDS) spans tens of millions of open-access papers and hundreds of millions of passage embeddings, enabling broad coverage across domains.
- Evaluation signal: As far as we can tell, OpenScholar reports higher correctness than larger closed-source models on ScholarQABench while keeping citation accuracy at expert-like levels and reducing fabricated references.
- Open ecosystem: The code, models, datastore, and a public demo are open-sourced, which makes it straightforward to inspect and extend the full pipeline.
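To make that concrete, here is a structural sketch of a retrieve, synthesise-with-citations, and self-feedback loop, with toy stand-ins for the retriever and the generator. None of this is OpenScholar’s actual code or API; it only mirrors the shape of the pipeline described above.

```python
# Structural sketch of a citation-backed RAG loop with one self-feedback pass.
# The corpus, retriever, and "generator" below are toy stand-ins, not OpenScholar's
# datastore or trained models.
from dataclasses import dataclass

@dataclass
class Passage:
    paper_id: str
    text: str

CORPUS = [
    Passage("smith2021", "Drug X reduces phenomenon Y in mice."),
    Passage("lee2023", "Drug X shows a similar effect on phenomenon Y in yeast."),
    Passage("park2022", "Phenomenon Y is implicated in disease Z progression."),
]

def retrieve(query: str, k: int = 2) -> list[Passage]:
    # Stand-in for dense passage retrieval: rank by word overlap with the query.
    words = set(query.lower().split())
    return sorted(CORPUS, key=lambda p: -len(words & set(p.text.lower().split())))[:k]

def synthesise(query: str, evidence: list[Passage]) -> str:
    # Stand-in for the generator: every claim carries an inline citation key.
    cited = "; ".join(f"{p.text.rstrip('.')} [{p.paper_id}]" for p in evidence)
    return f"{query} Evidence: {cited}."

def answer_with_feedback(query: str) -> str:
    evidence = retrieve(query)
    _draft = synthesise(query, evidence)  # first draft, then one feedback pass
    # Stand-in for the self-feedback step: widen retrieval, merge evidence, regenerate.
    extra = retrieve(query, k=3)
    merged = list({p.paper_id: p for p in evidence + extra}.values())
    return synthesise(query, merged)

print(answer_with_feedback("Does drug X affect phenomenon Y?"))
```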
2.3 How they fit together
Asta provides the broader scaffolding (agents, benchmarks, standards) for trustworthy scientific AI, while OpenScholar is a specialised literature-synthesis system that plugs into that ecosystem and supports evidence-grounded workflows. Both emphasize citing sources, enabling reproducible, inspectable reasoning steps, and providing useful open tech-stack tools for researchers and developers.
- Migration note: Scholar QA has moved into Asta (asta.allen.ai/synthesize).
3 Elicit
Elicit: The AI Research Assistant applies large language models to the literature-review problem. From their description:
Elicit uses language models to help you automate research workflows, like parts of literature review.
Elicit can find relevant papers without perfect keyword matches, summarise each paper’s takeaways specific to your question, and extract key information from the papers.
While answering questions with research is Elicit’s main focus, it also supports other research tasks such as brainstorming, summarisation, and text classification.
4 Consensus
Consensus: AI Search Engine for Research
Consensus is an AI-powered academic search engine that searches and analyzes 200M+ peer-reviewed research papers.
4.1 OpenAI Deep Research
Introducing Deep Research from OpenAI.
An agent that uses reasoning to synthesise large amounts of online information and complete multi-step research tasks for you.
4.2 Perplexity AI
Perplexity also offers a deep research feature.
5 Tool Universe
Once the setup is complete, the AI scientist operates as follows: given a user instruction or task, it formulates a plan or hypothesis, employs the tool finder in ToolUniverse to identify relevant tools, and iteratively applies these tools to gather information, conduct experiments, verify hypotheses, and request human feedback when necessary. For each required tool call, the AI scientist generates arguments that conform to the ToolUniverse protocol, after which ToolUniverse executes the tool and returns the results for further reasoning.
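Schematically, the loop looks something like the sketch below. ToolRegistry, ToolCall, find_tools, and execute are hypothetical stand-ins for illustration, not ToolUniverse’s actual classes or protocol; see its documentation for the real interface.

```python
# Schematic sketch of the plan -> find tools -> call -> observe loop described above.
# All names here (ToolRegistry, ToolCall, find_tools, execute) are hypothetical
# stand-ins, not ToolUniverse's actual API.
from dataclasses import dataclass, field

@dataclass
class ToolCall:
    name: str
    arguments: dict  # in the real protocol these must match the tool's declared schema

@dataclass
class ToolRegistry:
    tools: dict = field(default_factory=dict)  # tool name -> callable

    def find_tools(self, task: str) -> list[str]:
        # Stand-in for the tool finder: naive keyword match against tool names.
        words = task.lower().split()
        return [name for name in self.tools if any(w in name for w in words)]

    def execute(self, call: ToolCall):
        return self.tools[call.name](**call.arguments)

def agent_step(task: str, registry: ToolRegistry) -> list:
    # One reasoning step: identify relevant tools, generate arguments for each call,
    # execute, and collect the observations for the next round of reasoning.
    observations = []
    for name in registry.find_tools(task):
        call = ToolCall(name, {"query": task})
        observations.append(registry.execute(call))
    return observations

registry = ToolRegistry(tools={"literature_search": lambda query: f"3 hits for '{query}'"})
print(agent_step("literature search for drug X targets", registry))
```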
6 SciLire
SciLire: Science Literature Review AI Tool from CSIRO.
I just saw a tech demo from my colleagues. It looks promising for high-speed, AI-augmented literature review. On the other hand, CSIRO — like most Australian tech projects — isn’t well resourced, so my optimism is tempered.
6.1 ResearchRabbit
- Spotify for Papers: Just like in Spotify, you can add papers to collections. ResearchRabbit learns what you love and improves its recommendations!
- Personalised Digests: Keep up with the latest papers related to your collections! If we’re not confident something’s relevant, we don’t email you—no spam!
- Interactive Visualisations: Visualise networks of papers and co-authorships. Use graphs as new “jumping off points” to dive even deeper!
- Explore Together: Collaborate on collections, or help kickstart someone’s search process! And leave comments as well!
6.2 scite
scite: See how research has been cited
Citations are classified by a deep learning model trained to identify three categories of citation statement: supporting, contrasting, and mentioning (statements that cite a study without providing evidence for or against its validity). Citations are classified by rhetorical function, not by positive or negative sentiment (a toy sketch of the task framing follows the list below).
- Citations are not classified as supporting or contrasting by positive or negative keywords.
- A Supporting citation can have a negative sentiment and a Contrasting citation can have a positive sentiment. Sentiment and rhetorical function are not correlated.
- Supporting and Contrasting citations do not necessarily indicate that the exact set of experiments was performed. For example, if a paper finds that drug X causes phenomenon Y in mice and a subsequent paper finds that drug X causes phenomenon Y in yeast but both come to this conclusion with different experiments—this would be classified as a supporting citation, even though identical experiments were not performed.
- Citations that simply use the same method, reagent, or software are not classified as supporting. To identify methods citations, filter by paper section instead.
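As a toy illustration of the task framing only (not scite’s model, data, or training setup), a generic zero-shot classifier can be pointed at the same three labels:

```python
# Toy framing of the citation-statement classification task with a generic zero-shot
# model. This is not scite's classifier; the labels are scite's categories, while the
# model and example statement are illustrative.
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

statement = ("In contrast to Smith et al. (2019), we observe no effect of drug X "
             "on phenomenon Y in our replication cohort.")
labels = ["supporting", "contrasting", "mentioning"]

result = classifier(statement, candidate_labels=labels)
print(list(zip(result["labels"], [round(s, 2) for s in result["scores"]])))
```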
For full technical details, including exactly how classification is done and what the classification labels and confidence scores mean, see the publication describing how scite was built (Nicholson et al. 2021).