AI System Automates Coding for Scientific Research

Empirical Research Assistance outperforms software written by experts

By Anne J. Manning

May 19, 2026

Key Takeaways

A new AI tool called Empirical Research Assistance (ERA) can automatically write high-performance scientific software.
ERA could significantly accelerate scientific discovery across many domains.

A research team at Google co-led by Michael Brenner, Catalyst Professor of Applied Mathematics and Physics at the Harvard John A. Paulson School of Engineering and Applied Sciences (SEAS) and Google research scientist, has produced a new artificial intelligence system that can automatically write scientific software programs that surpass the performance of human-written programs.

Described in a paper published in Nature, the system is called Empirical Research Assistance (ERA), and the project was co-led by Brenner and Shibl Mourad from Google DeepMind. Harvard Ph.D. students Qian-Ze Zhu, Ryan Krueger, and Sarah Martinson contributed as Google student researchers while working in Brenner’s group. The research was done in Brenner’s capacity as a Catalyst Professor, a position established by the University to enhance relationships between academia and the private sector by supporting senior faculty in research roles at external companies.

Across modern science, customized software is constantly used to test specific hypotheses or interpret complex data. The authors refer to this type of computer program as “empirical software”: a program whose sole purpose is to maximize how well it does on a scientific task, like making weather predictions or forecasting hospitalizations during a disease outbreak. Any problem for which the quality of a solution can be expressed as a numerical value, its “score,” is called a scorable task.
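To make the idea of a scorable task concrete, here is a minimal sketch in Python. It is an illustration only, not code from the ERA project: the forecasting rule, the scoring function, and the admission numbers are all made up for the example. The "empirical software" is a toy forecaster, and the score is the negative mean absolute error of its one-step-ahead predictions, so a higher score means a better program.

```python
from typing import Callable, Sequence


def persistence_forecast(history: Sequence[float]) -> float:
    """Toy 'empirical software': predict that next week equals last week."""
    return history[-1]


def score(model: Callable[[Sequence[float]], float],
          series: Sequence[float]) -> float:
    """Negative mean absolute error of one-step-ahead forecasts.

    Higher is better; a perfect forecaster would score 0.
    """
    errors = [abs(model(series[:t]) - series[t]) for t in range(1, len(series))]
    return -sum(errors) / len(errors)


# Hypothetical weekly hospital admissions (made-up numbers).
weekly_admissions = [10.0, 12.0, 15.0, 14.0, 18.0]

print(score(persistence_forecast, weekly_admissions))  # prints -2.5
```

Any program that can be dropped in place of `persistence_forecast` and graded by the same `score` function is a candidate solution to the same scorable task.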

Empirical software for solving such scorable tasks underpins major advances across many fields, including three recent chemistry Nobel prizes. But building the specialized, custom software needed for these tasks is labor-intensive, requiring a human to test and refine code many times over.

The new ERA system removes this bottleneck by essentially automating the full cycle of scientific software design and refinement, a process that can normally take human experts months or even years.

The system combines the Google Gemini large language model with a search strategy to explore and refine thousands of pieces of code, far faster and with greater breadth than a human could.

Starting with a baseline piece of code aimed at a specific problem, the new AI system proposes modifications, such as adding new components or switching out algorithms, with the goal of improving a predefined quality score. For example: How accurately can this model predict the spread of a disease based on past hospitalization numbers? How well does this model predict the shape of proteins based on these amino acid sequences?

The system uses a method called tree search, also used in game-playing systems like AlphaGo, to decide which promising ideas to pursue and which to discard as it gets better at the task at hand, whether that is predicting hospitalization numbers or predicting protein shapes.

Schematic of the algorithm that feeds a scorable task and research ideas to an LLM, which generates evaluation code in a sandbox. This code is then used in a tree search, where new nodes are created and iteratively improved using the LLM. Credit: Google
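The propose-score-select loop described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not ERA's actual algorithm: the language model's rewrite step is replaced by a hypothetical `propose` function that randomly perturbs a list of numbers, the "program" is just that list, the `score` function and its target values are invented, and the selection rule (usually expand the best node, occasionally a random one) is a stand-in for whatever strategy the real system uses.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    """One candidate solution in the search tree."""
    params: list[float]
    score: float
    children: list["Node"] = field(default_factory=list)


def score(params: list[float]) -> float:
    """Hypothetical quality score: higher is better, 0 at the target."""
    target = [3.0, -1.0]
    return -sum((p - t) ** 2 for p, t in zip(params, target))


def propose(params: list[float], rng: random.Random) -> list[float]:
    """Stand-in for the LLM rewrite step: perturb one parameter."""
    new = list(params)
    i = rng.randrange(len(new))
    new[i] += rng.gauss(0.0, 0.5)
    return new


def tree_search(baseline: list[float], iterations: int = 200,
                explore: float = 0.2, seed: int = 0) -> tuple[Node, Node]:
    """Grow a tree of candidates and return (root, best node found)."""
    rng = random.Random(seed)
    root = Node(baseline, score(baseline))
    nodes = [root]
    for _ in range(iterations):
        # Usually expand the best-scoring node; sometimes try a random one
        # so that less promising branches still get a chance.
        if rng.random() < explore:
            parent = rng.choice(nodes)
        else:
            parent = max(nodes, key=lambda n: n.score)
        child_params = propose(parent.params, rng)
        child = Node(child_params, score(child_params))
        parent.children.append(child)
        nodes.append(child)
    return root, max(nodes, key=lambda n: n.score)


root, best = tree_search([0.0, 0.0])
print(f"baseline score: {root.score:.3f}, best score: {best.score:.3f}")
```

Because every candidate is kept in the tree, the search can return to an earlier branch if a later line of modifications stops improving the score.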

The AI does not work in isolation. In the process, it can be guided by research ideas in papers or textbooks. These ideas can be provided directly by a user or retrieved automatically and incorporated into later versions of the code.

“This ability to integrate and recombine research ideas enables the system to find ‘needle-in-a-haystack’ solutions that human research might never get to test,” Brenner said.

To prove it, the Harvard and Google team applied the ERA system to a diverse set of scientific problems. Zhu’s role in the project was to use ERA to predict the activity of more than 70,000 neurons in the brain of a zebrafish and compare it against actual neural data.

In one experiment, the team prompted ERA to use an existing neuron-modeling library to build more physically accurate simulations of neural activity. Doing this by hand would have required Zhu to spend weeks or months learning a new software package, but ERA could assemble and tune the models automatically.

“This new system is going to accelerate scientific discovery by allowing you to explore a lot of ideas at the same time,” Zhu said. “Previously it might take you a week to implement some specific methods, but now you can just run them in parallel in a few hours.”

In one test, the ERA system generated 14 models for predicting COVID-19 hospitalizations that outperformed the best U.S. Centers for Disease Control models used during the pandemic.

In another experiment, ERA discovered four new methods for integrating single-cell RNA sequencing datasets, beating top human-designed approaches.

By reducing the time required to explore a set of ideas from months to hours or days, the new system could free up significant time for scientists to focus on “truly creative and critical challenges, and to continue to define and prioritize the fundamental research questions and societal challenges that scientific research can help address,” according to a Google blog post about the breakthrough.