The authors gratefully acknowledge the support of those who were instrumental in the
development of this report, including the many expert interviewees who generously shared their
time and expertise. We would like to thank James Diggans at Twist Bioscience, Justin Farlow at
Serotiny, Ryan Ritterson at Gryphon Scientific, and colleagues at Google DeepMind for providing
particularly thoughtful comments in reviewing this report. Discussions with NTI Co-Chair and
CEO Ernest Moniz and NTI President and COO Joan Rohlfing provided valuable insights in shaping
the report recommendations. We would also like to thank Rachel Staley Grant for managing the
production of this report and Hayley Severance for her assistance with managing this project.
We are also deeply grateful to Fidelity Charitable for supporting this work.
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
The views expressed in this publication do not necessarily reflect those of the NTI Board of Directors or the institutions
with which they are associated.
Contents
Biosecurity Implications
Recommendations
Introduction
Biodesign Tools
Automated Science
Biosecurity Implications
AI Biodesign Tools
Automated Science
Conclusion
Appendix A: Participants
Endnotes
The Convergence of Artificial Intelligence and the Life Sciences
Executive Summary
Rapid scientific and technological advances are fueling a 21st-century
biotechnology revolution. Accelerating developments in the life sciences and
in technologies such as artificial intelligence (AI), automation, and robotics are
enhancing scientists’ abilities to engineer living systems for a broad range of
purposes. These groundbreaking advances are critical to building a more productive,
sustainable, and healthy future for humans, animals, and the environment.
These tools could expand access to knowledge and capabilities for producing well-known toxins, pathogens, or other biological agents. Soon, some AI-bio capabilities also could be exploited by malicious actors to develop agents that are new or more harmful than those that may evolve naturally. Given the rapid development and proliferation of these capabilities, leaders in government, bioscience research, industry, and the biosecurity community must work quickly to anticipate emerging risks on the horizon and proactively address them by developing strategies to protect against misuse.

To address the pressing need to govern AI-bio capabilities, this report explores three key questions:

• What are current and anticipated AI capabilities for engineering living systems?

• What are the biosecurity implications of these developments?

• What are the most promising options for governing this important technology that will effectively guard against misuse while enabling beneficial applications?

To answer these questions, this report presents key findings informed by interviews with more than 30 individuals with expertise in AI, biosecurity, bioscience research, biotechnology, and governance of emerging technologies. Building on these findings, the report includes recommendations from the authors on the path toward developing more robust governance approaches for AI-bio capabilities to reduce biological risks without unduly hindering scientific advances.
and carrying out experiments using laboratory robotics, analyzing results, and forming updated hypotheses. These capabilities have the potential to speed up the scientific process in a number of ways, including by scaling up and outsourcing work, reducing the number of experiments that need to be performed, and removing time constraints and errors inherent in human labor. Although some chemistry research has been completely automated in this way, automation of work involving living systems has proven more challenging, and only parts of the process can currently be automated. It is unclear if or when more AI advances will make it possible to fully automate systems to support life science research.

AI biodesign tools, in contrast, may be able to generate toxin or pathogen designs that are not found in nature. Some of these could be more harmful than versions that may evolve naturally. Using these types of tools currently requires some expertise, but they will likely become easier to use in the near future. Significant uncertainty remains about if or when AI biodesign tools might be able to generate reliable designs for biological agents that are as complex as pathogens, and there are major barriers to converting a digital design into biological reality, including generating, testing, and deploying these agents.

Notwithstanding the risks, AI-bio capabilities will also benefit society and bolster biosecurity and pandemic preparedness. In addition to broadly
requires the development of novel solutions and represents a significant and urgent challenge.

Many developers of natural language LLMs already are implementing methods to safeguard their models against misuse. Current technical safeguards include training AI models to refuse to engage on particular topics and employing other methods to prevent them from outputting potentially harmful information. To assess the robustness of these methods, it is essential to evaluate models, for example with “red-teaming” exercises to determine their potential for misuse. The success of these technical safeguards also requires that AI model developers control access to their models. This can be challenging, particularly because some smaller AI models, including many AI biodesign tools, are developed as open-source resources. Other potential guardrails for AI models include controlling access to the computational infrastructure needed to train powerful models or to potentially harmful data, but there are open questions about the effectiveness of these approaches that will be important to resolve. To further develop guardrails, AI model developers will need to work collaboratively with biosecurity experts to understand the biosecurity risks posed by their models, develop best practices, and refine and update approaches.

In addition to developing AI model guardrails, there are opportunities to improve biosecurity oversight at the interface where digital biological designs become physical reality. For example, many providers of synthetic DNA conduct biosecurity screening to ensure that pathogen or toxin DNA is not sold to customers who lack a legitimate use for it. These practices are currently largely voluntary, but governments could put in place more effective incentives or legal requirements. Improved screening tools would allow these providers to keep pace with the increasing number of novel designs generated by AI biodesign tools by screening DNA sequences on the basis of their potential encoded functions rather than just their similarity to known sequences. Other types of life science vendors and organizations also could bolster biosecurity by screening for customer legitimacy. These vendors and organizations include contract research organizations, academic core facilities, and providers of cloud laboratory services, robotics, and other life sciences products and services.

While more effective guardrails can offer significant risk reduction benefits, it is unlikely that they will eliminate all biosecurity risks that may arise at the intersection of AI and the life sciences. Therefore, resilient public health systems and strong pandemic preparedness and response capabilities will remain key safeguards; these capabilities can be substantially improved through AI-enabled advances.
Recommendations
Establish an international “AI-Bio Forum” to develop AI model guardrails that reduce biological risks

The Forum should be composed of key stakeholders and experts, including AI model developers in industry and academia and biosecurity experts within government and civil society. It should serve as a venue for developing and sharing best practices for implementing effective AI-bio guardrails, identifying emerging biological risks associated with ongoing AI advances, and developing shared resources to manage these risks. It should inform efforts by AI model developers in industry and academia, governments, and the broader biosecurity community, and it should establish global norms for biosecurity best practices in these communities.

Implement promising AI model guardrails at scale

AI model developers should implement the most promising already developed guardrails that reduce biological risks without unduly limiting beneficial uses. They should collaborate with other entities, including the AI-Bio Forum described above, to establish best practices and develop resources to support broader implementation. Governments, biosecurity organizations, and others should explore opportunities to scale up these solutions nationally and internationally, through funding, regulations, and other incentives for adoption. Existing guardrails that should be broadly implemented include AI model evaluations, methods for users to proactively report hazards, technical safeguards to limit harmful outputs, and access controls for AI models.
Strengthen biosecurity controls at the interface between digital design tools and physical biological systems

• Tool developers in industry, academia, and non-governmental organizations should develop new AI tools to strengthen DNA sequence screening approaches to capture novel threats and improve the robustness of current approaches.

• Governments, international bodies, and other key players should work to strengthen DNA synthesis screening frameworks, including by legally requiring screening practices.

• Governments and others should expand available tools, requirements, and incentives for customer screening to a wide range of providers of life science products, infrastructure, and services.

Use AI tools to build next-generation pandemic preparedness and response capabilities

Governments, development banks, and other funders should dramatically increase investment in pandemic preparedness and response, including by supporting development of next-generation AI tools for early detection and rapid response.

The convergence of AI and the life sciences marks a new era for biosecurity and offers tremendous potential benefits, including for pandemic preparedness and response. Yet, these rapidly developing capabilities also shift the biological risk landscape in ways that are difficult to predict and have the potential to cause a global biological catastrophe. The recommendations in this report provide a proposed path forward for taking action to address biological risks associated with rapid advances in AI-bio capabilities. Effectively implementing them will require creativity, agility, and sustained cycles of experimentation, learning, and refinement.

The world faces significant uncertainty about the future of AI and the life sciences, but it is clear that addressing these risks requires urgent action, unprecedented collaboration, a layered defense, and international engagement. Taking a proactive approach will help policymakers and others anticipate future technological advances on the horizon, address risks before they fully materialize, and ultimately foster a safer and more secure future.
Introduction
Modern bioscience and biotechnology are critical to building a more
productive, sustainable, and healthy future for people, animals, and the
environment. Rapid advances in these fields will have transformative
effects on manufacturing, agriculture, energy production, and medicine.
Recent progress in artificial intelligence (AI) technologies is steadily
converging with the life sciences, building on decades of research
and data collection, and will further accelerate these developments.
The convergence of AI with biology will undoubtedly offer significant
benefits, but it also poses new and poorly understood risks. This report
describes this intersection, including AI tools and capabilities that enable
engineering of living systems, the biosecurity implications of these
developments, and opportunities to reduce risks.
While AI-bio capabilities can offer important benefits, biosecurity experts warn that they could also cause harm through accidental or intentional misuse. Malicious actors could exploit these tools to develop novel or more harmful toxins, increasingly dangerous pathogens, or other engineered biological agents. Given the rapid development and proliferation of AI-bio capabilities, it is critical to quickly identify potential risks and begin to implement strategies to protect against their misuse.

AI is intersecting with biology in a wide variety of contexts, with tools developed for a broad range of purposes (see box 1). LLMs developed by OpenAI, Meta, and Google are not specifically designed to improve our understanding of biological systems, but they have important intersections with the biosciences. AI biodesign tools are trained on biological data, such as DNA and protein sequences, and are often used by specialists working to design biological systems. Scientists use these tools for a wide range of practical purposes, such as for designing vaccines and understanding the mechanisms of disease transmission.1 Automated science is incorporating AI into many steps in the scientific process, from the generation of hypotheses to the improvement of robotic experimentation, to data analytics.2 These growing capabilities have the potential to enable the testing of more hypotheses and accelerate the pace of scientific discovery.
This report uses multiple terms to describe AI tools that intersect with the life sciences.
AI-bio capabilities refers to the full range of AI tools, models, and technologies that contribute to advances in the life sciences and bioengineering.
In this report, LLM refers to large language models trained on natural language (i.e., human
language) as well as the associated applications built on top of them, such as chatbots that
respond to text-based queries. LLMs can also be used to model other types of data, such as
images, audio, and biological sequences, but unless otherwise specified, this report focuses on
natural language LLMs. Other reports or analyses may refer to these models as “foundation”
models if they are trained on large amounts of data and can be repurposed for more specific
tasks, or as “frontier” models if they are close to the leading edge of AI capabilities.
Biodesign tool refers to any AI model that is used to design biological parts, systems, or
organisms according to desired characteristics defined by the user. Some AI biodesign tools
are LLMs that are trained on biological sequences rather than on natural languages, and this
report considers them as biodesign tools. Some analysis of biodesign tools also draws on AI
models, such as AlphaFold2, that are trained on biological data and provide insight into biology
but do not provide biological designs.
Automated science refers to a range of AI tools and capabilities that can automate one or more
steps in the scientific discovery process.
Each of these types of tools changes the risk landscape in unique ways. For example, LLMs may be helpful to users with less scientific expertise who seek to learn more about pathogens, pathogen engineering, or laboratory techniques. Effectively using AI biodesign tools requires more expertise but could generate a wide variety of designs for toxins or, further into the future, pathogens or other biological agents with desired characteristics. AI automation may enable larger-scale testing of biological designs, allowing better optimization of desired characteristics. It is likely that different types of AI-bio capabilities will increasingly be combined in the future. For example, future AI tools could use LLMs to interpret a user’s text-based prompts, and use a biodesign tool to generate a design that satisfies the user’s request,3 and AI-enabled automated science systems could help experimentally evaluate AI-generated biological designs.

Experts remain uncertain about how LLMs, AI biodesign tools, and AI-bio capabilities for automating science will change in the near future, when developments or breakthroughs will occur, and how new biosecurity risks will materialize. This report aims to provide as much clarity as possible about anticipated risks and opportunities posed by AI and to provide recommendations on the path forward. It is imperative that AI model developers, policymakers, and biosecurity experts acknowledge and plan for unanticipated capabilities and risks that will emerge as AI continues to intersect with biology in new ways.

Context and Methodology

The Nuclear Threat Initiative (NTI) is a nonprofit, nonpartisan global security organization focused on reducing nuclear and biological threats imperiling humanity. Within NTI, the Global Biological Policy and Programs team (NTI | bio) works with governments, industry, academia, international organizations, and civil society to prevent catastrophic biological events, including through its work to strengthen biotechnology governance. NTI | bio is advancing this work through the Biosecurity Innovation and Risk Reduction Initiative,4 which focuses on addressing emerging biological risks associated with rapid technological advances. Under this initiative, NTI | bio has worked to bolster safeguards for DNA synthesis technologies5 and to strengthen biosecurity governance worldwide, specifically through the establishment of the International Biosecurity and Biosafety Initiative for Science.6

This report stems from the recognition that developing effective guardrails will be a critical element of broader efforts to safeguard the tools of modern bioscience and biotechnology against accidental or deliberate misuse. It draws on structured interviews with more than 30 experts in AI, biosecurity, bioscience research, synthetic biology and biotechnology, and governance of emerging technologies. The authors also convened a virtual workshop in August 2023 with interviewees and additional experts to discuss preliminary findings and recommendations that emerged from the interview process (for the list of participants, see appendix A). The first three sections of this report draw heavily on the expert opinions and perspectives that were gathered over the course of this project, though no attempt was made to generate consensus among this group. The final section of the report includes recommendations that build on the key findings but were developed by the authors alone and do not necessarily reflect the views of these experts.
What important unsolved problems in the life sciences and engineering living
systems might AI and/or machine learning tools solve in the next two to five
years?
What existing problems might AI or machine learning tools solve faster or with
less expertise required from the user? Will AI or machine learning tools lower
the barriers to entry for engineering living systems? If so, how?
The application of AI to bioscience and biotechnology is not a recent development. Initial AI tools were limited by the amount of data available to train them;7 however, a recent explosion of data has catalyzed rapid progress, driving major advances. A wide range of data—including text-based data, protein structures, DNA sequences, and other experimental results—has contributed to our understanding of biology and has provided fertile ground for the emergence of new AI-bio capabilities.

Recently, AI-bio capabilities have transitioned from the prediction of outcomes to the active generation of content, which marks a significant inflection point and changes the broader landscape of AI’s impact on bioscience and biotechnology. In this report, we discuss three types of AI-bio capabilities: LLMs, biodesign tools, and automated science (see box 1).

Conversational LLM applications, such as ChatGPT, have captured the public imagination by generating text responses to user prompts.8 For the life sciences, these models will enable people to conduct basic research, engineer biology, or simply satisfy curiosity by bringing together and synthesizing large amounts of information. Models specific to biological applications have also advanced rapidly,9 and many other breakthroughs are on the horizon. These AI biodesign tools will enable scientists to design new proteins and other features of biological systems much more rapidly for a wide range of applications in medicine, fuels, foods, materials, and other fields.

A variety of AI tools and capabilities are coming together to improve the automation of science—from literature searches and AI-driven robotic experimentation to interpretation of results—increasing the pace of scientific advancement. In addition to significant benefits for the life sciences more broadly, all three of these types of AI-bio capabilities will contribute in important ways to public health and pandemic preparedness and response.
Application programming interface (API): A set of protocols and tools that allow different software applications to interact with each other and share data in a standardized way, enabling developers to create new applications without having to start from scratch.

Digital-physical interface: The point at which digital biological designs begin to be constructed into physical biology. Biological designs will first be constructed in software, after which physical molecules will need to be assembled to make the design into a biological system. The clearest example of the digital-physical interface is DNA synthesis.

Frontier model: A foundation model that is close to, or exceeds, the capabilities currently present in the most advanced models but differs with respect to its scale, design, or capabilities.

Generative model: A model that can generate new content rather than produce predictions or evaluations of existing content.

Open source: A model of software development in which the source code is made freely available to the public, allowing anyone to view, use, modify, and distribute it. The open-source movement emphasizes collaboration and community-driven development, with the goal of creating high-quality software that benefits everyone.

Dynamic vs. static models: A static model is trained once on a defined set of data. A dynamic model continually updates by training on new information.
a standard online search engine or watching a video of the work being carried out. In addition, the engineering biology capabilities that LLMs provide are likely to be limited to information and tasks that are well specified and publicly described, and users would still require some fundamental understanding of science and practical laboratory skills to verify that their experiments have yielded the desired results.

Another limitation of LLMs is that they may “hallucinate,” producing incorrect information that they convincingly present as true. Novices may find it challenging to identify these false statements and could easily be misled. This drawback is widely acknowledged, and some, though not all, AI experts are optimistic that LLM developers will significantly reduce this problem over the next five years. A further limitation of LLMs is their rudimentary reasoning abilities, which are prone to fail often, especially when performing tasks that require several sequential steps or logical leaps.16 These capabilities are likely to improve over time, and developers are working specifically to enhance the reasoning abilities of these models, for example, with chain-of-thought prompting.17

Biodesign Tools

AI biodesign tools are mostly used by specialists working to design biological systems, and as such are trained on biological data, such as DNA or protein sequences. They typically require more skill and expertise to operate than general-purpose LLMs. Compared with methods that do not use AI, these tools can increase the likelihood of producing successful designs, allowing scientists to achieve their goals more quickly and with fewer resources and experiments.

Advances in LLMs have contributed to advances in AI biodesign tools. Early in the history of computational biology, researchers recognized that DNA and protein sequences resembled human language in their sequential nature and in the overarching “grammar” that determines their structure and function. Some advances in AI biodesign tools are therefore supported by advances in LLMs, but achieving more significant advances in biodesign tools faces additional challenges owing to limitations in the volume of data available to train the tools.

AI-Enabled Protein Design Tools

Among the available AI-enabled tools for biological design, the capabilities of protein design tools have advanced most rapidly over the last few years. In 2020, AlphaFold 2, an AI-enabled protein structure prediction tool developed by Google DeepMind, garnered significant attention from scientists and the general public when it accurately predicted the three-dimensional structure of approximately 90 percent of the protein sequences it was tested on, a vast improvement over previous methods.18 AlphaFold 2 is just one tool among many developed in recent years for protein structure prediction and protein design (for a list of biodesign tools, see appendix B). Many existing AI tools that are trained on biological data, like AlphaFold 2, do not generate biological designs, but have provided valuable data for training and refining biodesign tools.

Scientists can use protein design tools for a range of beneficial applications, including antibody and vaccine design and novel therapeutics, as
well as foods, materials, and improved enzymes for biomanufacturing and other applications. Scientists currently use AI protein design tools such as RFDiffusion19 and ProteinMPNN20 to generate new protein sequences with desired characteristics related to structure, ability to bind to another molecule, and stability. The landscape of possible protein sequences is vast, and AI can generate and refine promising candidates. (For more details, see box 3.) These AI protein design tools are typically open source and may be available through platforms such as Google Colaboratory or Hugging Face so individuals can use them without installing any software or acquiring their own computing infrastructure.
AI has tremendously affected the field of protein design in the past few years.21 The
protein design process often involves combining multiple design tools to optimize multiple
characteristics, such as protein structure, binding characteristics, and solubility. Candidate
designs are tested in the laboratory to confirm whether they have the predicted properties
and often require further optimization through experimentation. Before the introduction of AI,
directed evolution was the primary approach used to design proteins.22 This approach begins
with choosing a natural sequence close to the desired design, subsequently mutating it to
generate many variants, then selecting those with better properties, and repeating this process
until finding a satisfactory result. This approach tests only a small subset of possible variants
of the original sequence and thus likely leads to a suboptimal solution.
Generative AI tools can improve protein design in two ways. First, they can generate entirely
new sequences that have desired properties, potentially providing a more promising starting
point than directed evolution. Second, AI can help select the best variants for experimental
testing to understand how sequence affects the properties of interest and thus improve them.
Experts familiar with current AI protein design tools reported success rates of 20 to 50 percent
for their most successful design tasks. The applications with the highest success rates are
likely to be those that require high precision and specificity, but are not intended to affect
complex biological systems, such as cells, more broadly.
Although AI prediction of protein structure is relatively mature, there are classes of proteins
for which little data exist as a consequence of their fundamental characteristics—for example,
being disordered or hard to crystallize, or existing in complexes—and for which existing
methods fail. Two types of proteins that pose particular challenges are peptides (i.e., short protein sequences, often comprising about 20 amino acids or fewer) and proteins that incorporate non-natural amino acids. Many models struggle to design longer sequences, including
sequence lengths that are common for natural proteins. For example, 200 amino acids is the
limit for xTrimo-PGLM,23 a leading AI protein design tool, whereas many naturally occurring
proteins contain more than 300. Proteins that form complexes with DNA, RNA, or small
molecules have also proven challenging to design.
Some model developers speculate that if we used the same amount of computational
resources as LLMs when training protein language models, we could significantly improve
their performance. However, there is significantly more natural language data available than
carefully annotated biological sequence data. Furthermore, although many biological sequences are available online, they add less new information than their number suggests, because many of them are highly related; for example, many are variants of the same protein, including non-functional variants. Larger protein design models therefore often train on a subset of this data, removing highly similar sequences to improve the model’s performance.
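The deduplication step mentioned above can be sketched in a few lines of Python. The identity measure and the 90 percent threshold below are illustrative assumptions; real pipelines rely on dedicated clustering tools and alignment-based similarity, so this is only a conceptual sketch of how near-duplicate sequences can be collapsed to a single representative.

def identity(a, b):
    # Fraction of matching positions over the shorter sequence (a crude proxy
    # for the alignment-based identity used by real clustering tools).
    n = min(len(a), len(b))
    return sum(a[i] == b[i] for i in range(n)) / n if n else 0.0

def deduplicate(sequences, threshold=0.9):
    # Keep one representative for every group of near-identical sequences.
    representatives = []
    for seq in sequences:
        if all(identity(seq, rep) < threshold for rep in representatives):
            representatives.append(seq)
    return representatives

seqs = ["MKTAYIAKQR", "MKTAYIAKQK", "GGSGGSGGSG"]
print(deduplicate(seqs))  # the two near-identical sequences collapse to one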
How to determine a protein’s function from its sequence remains a fundamental question
in the life sciences. Researchers have found that as AI models are trained on more data,
structure prediction improves at roughly twice the rate of function prediction, reflecting the
greater difficulty of predicting biological function.24 AI models are capable of predicting which
mutations will disrupt the function of a protein or non-coding sequence but cannot predict if a
mutation will result in a new function. Existing functional prediction benchmarks are typically
limited to a small number of cases for which many data points are available and can predict
only a narrow range of functions. Therefore, although AI models are likely to improve our
understanding of the links between protein structure and function, much more work is needed
to make these predictions broadly reliable.
Interest in engineering biology has grown over the past 20 years, with the vision of using biological systems to provide a wide range of products and to address difficult challenges such as achieving carbon sequestration and preventing environmental degradation.25 As the field has progressed, bioengineering researchers and practitioners have worked hard to make it more of an engineering discipline than one requiring bespoke designs. Standardized languages, such as the Synthetic Biology Open Language,26 and standardized biological components have enabled the application of design thinking to biological systems. Many experts pointed to significant investment in these areas, which they believe will

That being said, AI biodesign tools for applications beyond protein design face significant challenges, and most are not yet mature. Tools such as ExpressionGAN can design sequences of DNA to better control the timing, conditions, and amount of protein production in a cell.27 Other DNA design tools can generate DNA sequences that take on a specific three-dimensional shape—known as DNA origami—or bind tightly to targets to act as biosensors or antibodies.28 Researchers have also developed LLMs that use DNA sequence data instead of natural language as foundation models that can be fine-tuned for specific tasks, for example, predicting sequences of DNA that will regulate gene expression or protein production.
There is also significant interest in AI biodesign tools to design metabolic pathways—genetic circuits in bacteria or yeast that can produce a range of small molecules—which are important for biomanufacturing. For example, AI tools such as novoStoic and RetroPath229 can help choose efficient pathways for producing small molecules, optimize the genetic components of a pathway, and design cells that will grow in large bioreactors (vessels used to manufacture biomolecules at scale).30 Companies are likely to make substantial investments in generating data to improve these types of tools because significant economic drivers exist for these advances.31 However, current work in this area predominantly focuses on specific strains of bacteria and yeast and does not transfer well to new species, limiting the applicability of these advances to pathogens, human cells, or other living systems.

Some experts believe that AI biodesign tools will expand the frontiers of what is biologically possible, allowing the design of sequences and functions that are unlike those found in nature. Experts are divided on when this will be achieved; many believe that progress will concentrate in areas that receive large amounts of funding and in which it is possible to quickly generate large amounts of data. Still, some believe that these capabilities will emerge within the next five years as a result of the acceleration of design-build-test cycles, driven by greater testing throughput, improved design accuracy, and automated measurement of results, which will help generate data for training AI biodesign tools.

Limitations of Biodesign Tools

Experts pointed to several limitations in the capabilities and use of AI biodesign tools. A major limiting factor, as noted, is the availability of training data. Models generally perform well where large, labeled data sets exist (e.g., for protein structure) and poorly outside of these specific areas. Technical areas with strong economic drivers, such as metabolic engineering for industrial processes and high-throughput assays for measuring protein characteristics, will likely produce this type of large data set. Government-supported efforts, such as national strategies for genomic pathogen surveillance, will also boost data availability. However, experts repeatedly pointed out that biological systems are complex and that our ability to measure and generate reliable data about biological functions is limited. Progress may therefore be slow.

Experts in AI biodesign tools reported that only a fraction of designs generated by a given tool are successful, and that a large number of designs need to be created and experimentally tested to select the best candidates for further work. In addition to requiring the laboratory infrastructure and know-how to conduct these experiments, this limits designs to characteristics that can be evaluated efficiently in a laboratory. Furthermore, mistakes made by biodesign tools compound, so the more biological parts and desired characteristics the design tool needs to consider, the lower the likelihood of a successful design. Researchers are exploring ways to improve these models by linking experimental outputs directly back into the models to enable iterative learning.

The utility of AI biodesign tools is currently limited by users’ ability to express what they want in a language that the models can interpret. This requires expertise. For example, a biodesign tool that designs proteins for improved binding to a target molecule may require a user to input parameters that are based on the user’s detailed technical knowledge about the location of atoms at specific places in three-dimensional protein structures. Some experts believe that in the future, these tools will enable users to design proteins that bind a wide range of targets without having detailed knowledge, such as understanding the details of their molecular structures. Experts point out that the integration of chatbots with these cutting-edge tools could facilitate this communication in natural language, thereby making biodesign tools more accessible to those with considerably less expertise.
Automated Science
The term “automated science” refers to the use of AI to automate steps in scientific discovery or the transfer of the entire process to AI. One of the challenges in scientific discovery is the vast number of possible experiments that could be conducted, making systematic exploration of all options by humans impracticable. AI has the potential to revolutionize scientific discovery by automating this exploration and intelligently choosing the scientific questions that are likely to be the most informative and useful to explore. These models can simulate larger systems than humans can—for instance, the interactions of millions of particles.32 However, AI struggles to capture the rules that govern complex interacting systems with existing data. The ability to make targeted changes to biological systems and measure their effects will improve the ability of AI models to interpret these causal relationships.

AI tools have been used for all steps in the scientific process: researching literature, generating hypotheses, designing experiments, writing software, programming instructions for robotics platforms, collecting data, and analyzing and interpreting results (for more details, see box 4). Some experts believe that more steps will become automated in the near future and that it may become difficult to avoid interacting with AI when carrying out some types of scientific research.

As an example of automated science advances, in 2009, scientists developed a robot scientist, called Adam, that discovered the functions of poorly characterized genes in yeast and only required human assistance with replenishing experimental reagents and removing waste.33 Eve, developed by the same group of scientists in 2015, automated early-stage drug discovery to identify new drugs for treating neglected tropical diseases.34 In 2020, a team in Liverpool developed a free-roaming laboratory robot that could autonomously search for catalysts to initiate a desired chemical reaction.35

A more recent approach to automated science is the development of autonomous AI agents that can interact with multiple AI tools to coordinate the completion of a complex task. Examples include AutoGPT, which chains together “thoughts” generated by an LLM to autonomously achieve a goal, aided by its ability to search the Internet and interact with available applications, ranging from simple calculators to advanced AI biodesign tools. An example in chemistry is ChemCrow, which enables the design of chemical synthesis processes using natural language requests, such as “synthesize ibuprofen.”36 Recently, researchers used ChatGPT to write a scientific paper from scratch.37 Provided with a data set, ChatGPT formulated a question, wrote code to perform the analysis, described its methods, and interpreted the findings. Its initial attempts at coding contained mistakes, and parts of the paper contained fabricated information, but additional human prompting corrected these errors.

Some experts expressed concerns about automated science because most users of AI
tools do not have a strong understanding of how they work, which could lead to an overestimation of their abilities and blind trust in their outputs. Users could also assume that algorithms process information in the same way that humans do, leading to surprises when AI fails in ways that a human would not. AI models often work as “black boxes,” making it difficult to understand and validate the scientific insights that they generate.
AI is already contributing to many steps in scientific research, and it is likely that AI tools for
automated science will become more integrated in the future to provide a more comprehensive
AI-enabled scientific discovery process.
Literature research
AI has substantially improved tools that aid background research. Tools such as scite and Elicit
use LLMs to query, interpret, and summarize scientific literature, as well as to allow claims to
be checked against original source material to ensure the accuracy of the information they
provide. ResearchRabbit builds networks of related research papers based on citations.
Hypothesis generation
Natural language processing of scientific literature can capture complex concepts and make
accurate scientific predictions. These models group words that occur in similar contexts,
allowing the identification of relationships between words, such as “cat is to kitten as dog is to puppy.” These capabilities have been demonstrated to recommend promising hypotheses,
such as suggesting materials for functional applications in materials science years before their
discovery.38 Limitations for identifying good hypotheses include inaccuracies in published
scientific findings,39 LLM hallucinations of incorrect information, and poor ability to judge the
novelty of a hypothesis.40 However, tools designed specifically to seek novelty do not face such
limitations.
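The analogy behavior described above can be illustrated with a toy vector space. The three-dimensional vectors below are invented for illustration only; real systems learn embeddings with hundreds of dimensions from large text corpora.

import numpy as np

# Invented toy embeddings; real word vectors are learned from large corpora.
vectors = {
    "cat":     np.array([0.9, 0.1, 0.3]),
    "kitten":  np.array([0.9, 0.1, 0.8]),
    "dog":     np.array([0.1, 0.9, 0.3]),
    "puppy":   np.array([0.1, 0.9, 0.8]),
    "protein": np.array([0.5, 0.5, 0.1]),
}

def closest(vec, exclude):
    # Return the vocabulary word whose embedding is most similar (by cosine
    # similarity) to `vec`, ignoring the words used to form the analogy.
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max((w for w in vectors if w not in exclude),
               key=lambda w: cos(vectors[w], vec))

# "cat is to kitten as dog is to ?"  ->  kitten - cat + dog
answer = closest(vectors["kitten"] - vectors["cat"] + vectors["dog"],
                 exclude={"cat", "kitten", "dog"})
print(answer)  # "puppy" in this toy example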
Experimental design
Experimentation is often an iterative process, which can be time-consuming and inefficient.
AI can collect existing data on a problem—for example, how mutations in an enzyme change
its efficiency—and form a map of promising mutation spaces. AI can then select subsequent
experiments that explore areas with little data and exploit areas that produce promising
results,41 reducing the total number of experiments required and producing better outcomes.
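A minimal sketch of this explore-and-exploit idea, using a simple upper-confidence-bound rule to pick the next experiment, is shown below. The candidate conditions, noise model, and scoring constant are invented for illustration, and the run_experiment function stands in for a real assay.

import random
import statistics

random.seed(1)

# Hypothetical candidate conditions (e.g., enzyme variants or growth media).
candidates = ["A", "B", "C", "D"]
true_yield = {"A": 0.2, "B": 0.5, "C": 0.8, "D": 0.4}  # unknown to the algorithm

def run_experiment(candidate):
    # Stand-in for a noisy laboratory measurement.
    return true_yield[candidate] + random.gauss(0, 0.1)

results = {c: [run_experiment(c)] for c in candidates}  # one pilot run each

def ucb(candidate):
    # Upper-confidence-bound score: favor conditions with high average results
    # (exploit) and conditions with few measurements so far (explore).
    n = len(results[candidate])
    return statistics.mean(results[candidate]) + 0.5 / n ** 0.5

for _ in range(20):
    chosen = max(candidates, key=ucb)
    results[chosen].append(run_experiment(chosen))

best = max(candidates, key=lambda c: statistics.mean(results[c]))
print(best, round(statistics.mean(results[best]), 2))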
LLMs are currently poor at formulating complex, strategic plans. They tend to have a short
“memory,” meaning they forget the start of a plan as they progress through it. However, AI
“agents” with different interacting modules have shown more promise in achieving this long-
term planning. These agents can describe high-level plans while other more specific models or
tools fill in the details of how to accomplish each step.
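The agent pattern described here, in which a high-level planner delegates steps to more specialized tools, can be sketched as follows. The planner and tools are trivial stand-ins; in real systems the planner is an LLM and the tools are external models, search engines, or laboratory software.

# Trivial "tools" standing in for external models or software an agent can call.
def literature_search(topic):
    return f"[3 hypothetical papers about {topic}]"

def data_analysis(description):
    return f"[summary statistics for {description}]"

TOOLS = {"search": literature_search, "analyze": data_analysis}

def plan(goal):
    # Stand-in for an LLM planner: break a goal into (tool, argument) steps.
    return [("search", goal), ("analyze", f"pilot data on {goal}")]

def run_agent(goal):
    # Execute the plan step by step, passing each tool its argument and
    # collecting results, much as agent frameworks chain LLM "thoughts" to tools.
    notes = []
    for tool_name, argument in plan(goal):
        notes.append(TOOLS[tool_name](argument))
    return notes

print(run_agent("enzyme stability"))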
Writing software
General-purpose LLMs can successfully write software code to carry out various tasks
but struggle with generating code for more complex or specialized tasks that are not well
represented in the training data. More specialist tools, such as GitHub Copilot, interact with
users as they are programming, detecting what the user intends to write and automatically
filling it in, which can result in completing projects more quickly.42
Data collection
Machine learning can be incredibly data intensive. Historically collected data may not be
suitable for machine learning if such data were not collected in a standardized way. Some
experts stated that the automation of data production would be key to continuing advances
in our ability to design biology with AI. In some settings, data collection is already routinely
automated—for example, in high-throughput genome sequencing of pathogens by public
health agencies—but this automation is limited in its sophistication.
Biosecurity Implications
The same AI-bio capabilities that will provide significant benefits may
also empower malicious actors to misuse biology to cause harm, with
potentially catastrophic global consequences. Key questions about the
biosecurity implications of these models include:
What are the main concerns about biosecurity risks in the application of AI
or machine learning methods to the life sciences and to engineering living
systems, if any?
What types of AI models and AI biodesign tools carry the biggest risk of
misuse?
LLMs, AI biodesign tools, and AI-enabled automated science are likely to change the landscape of biosecurity risks in different ways, depending on the number and type of actors that may use them and the types of capabilities they confer (see Figure 1). Many experts believe that LLMs could expand the number of people able to cause harm with biology. They could help malicious actors become familiar with a range of known biological agents and could provide resources that help them obtain, construct, or otherwise develop these agents. However, experts disagree about the implications that this may have for biosecurity, as some believe that the information provided by LLMs can already be obtained in other ways and that malicious actors would need additional skills and resources beyond what an LLM could offer.

FIGURE 1 (axis label: Magnitude of harm)

AI biodesign tools generally focus on narrow scientific questions, and they are used by researchers and others with significant scientific expertise. Although fewer people are likely to use these tools, some experts believe that they are more likely than LLMs to generate biological designs for novel toxins, pathogens, or other agents that could be more harmful than those found in nature. However, biodesign tools are currently limited in the types of designs they can reliably generate, and there is uncertainty about how quickly this will change.

Any malicious actor hoping to engineer a biological agent will face significant hurdles beyond obtaining a design, including access to biological components, laboratory infrastructure, and laboratory training sufficient to build, test, and deploy the designed agent. Experts in AI biodesign tools also cautioned that the designs created by these tools require validation. Users need time and expertise to evaluate and optimize the many candidate designs that these models produce. For all AI-bio capabilities, the biosecurity implications depend both on the characteristics of the AI tools and the resources and abilities of the actors who
to at-risk individuals.45 For biosecurity-specific hazards, experts raised many different types of concerns. Some believe that LLMs could raise awareness about potential routes to misuse biology to cause economic or environmental damage, for example, by targeting agriculture or vulnerable ecosystems. LLMs could also exacerbate or create opportunities for misinformation or disinformation, which could intersect with biosecurity by undermining public confidence in public health efforts, injecting false information into pathogen surveillance or response systems, or incorrectly assigning blame for causing an epidemic or pandemic.

Beyond these broad biosecurity concerns, nearly all experts pointed to the possibility that a malicious actor could use an LLM to obtain information on how to use a toxin, pathogen, or other biological agent to cause harm. LLMs could also direct such a person to additional resources or tools helpful for obtaining biological components, such as pathogen DNA, and getting up to speed on simple laboratory techniques. However, experts are divided on how useful this information might be. Some argue that although LLMs can gather information more quickly, they add very little to what has long been possible by searching the Internet for publicly available information. In addition, LLMs may “hallucinate” incorrect information and present it as true, and individuals without expertise may be unable to recognize this misdirection.

Experts also disagree about the level of tacit knowledge about laboratory techniques that LLMs can provide and how much of this knowledge is necessary to generate, scale up, and deliver a harmful biological agent. As described in the previous section, LLMs may be helpful for inexperienced scientists by providing information about laboratory techniques and suggestions when an experiment fails. Future LLMs may be able to provide more extensive and accurate feedback that incorporates recorded videos of experimental procedures and other types of inputs. Still, many experts in laboratory bioscience believe that a malicious actor would likely face hurdles to generating a pathogen that would require significant resources, infrastructure, and multifaceted expertise to overcome (box 5).

To get around the challenge of developing laboratory skills and tacit knowledge, LLMs have directed users to opportunities for outsourcing laboratory experiments and infrastructure to contract research organizations and other vendors.46 The extent to which a malicious actor could successfully contract with such external vendors to facilitate the construction and scale-up of a harmful biological agent remains unclear. Still, several experts highlighted this type of LLM behavior in calling for additional biosecurity oversight among providers of life sciences products and services (see Risk Reduction Opportunities).

Experts believe that LLMs will also help people with expertise in one area develop expertise in related fields. For example, LLMs could help someone with some training in molecular biology quickly find relevant literature and important information about virology, including how to generate infectious agents from non-infectious components. These
users may already have tacit knowledge related to laboratory techniques and access to laboratory infrastructure, and they may be better able to distinguish useful information from an LLM’s incorrect hallucination. These medium- to high-skilled users could obtain information related to biological agents without access to an LLM, but LLMs may facilitate the process.

Because current LLMs draw on information that is readily available, most experts believe that they are unlikely to generate designs of biological agents that are outside of already established risks. A few experts believe that LLMs could already or soon will be able to generate ideas for simple variants of existing pathogens that could be more harmful than those that occur naturally, drawing on published research and other sources. Some experts also believe that LLMs will soon be able to access more specialized, open-source AI biodesign tools and successfully use them to generate a wide range of potential biological designs. In this way, the biosecurity implications of LLMs are linked with the capabilities of AI biodesign tools.
A malicious actor or small group would face several technical barriers in trying to generate an
infectious agent. A previous NTI report on another type of enabling biotechnology, Benchtop
DNA Synthesis Devices: Capabilities, Biosecurity Implications, and Governance,47 detailed these
hurdles, which include synthesizing pathogen genomes, “booting up” infectious agents from
DNA, and designing successful alterations to pathogens.
LLMs may be able to provide guidance that would help reduce barriers related to synthesizing
pathogen genomes by providing details on DNA sequences to order and simple instructions
on how to combine them into longer stretches of DNA using standard molecular biology
techniques. However, for most pathogens, generating an entire genome would still require
significant expertise and troubleshooting abilities.
Most viral pathogen genomes are not infectious on their own, and making them into viable
pathogens requires laboratory infrastructure and knowledge about how to generate infectious
agents from their genomes. LLMs could help reduce this barrier by bringing together
information on necessary reagents, biological components, and published protocols on how to
do this. One expert believes that recent advances in virology have further reduced this barrier.
AI Biodesign Tools
Current AI biodesign tools generally focus on narrow scientific questions, are trained on scientific data, and require significant scientific expertise to use and to interpret their outputs. Some experts believe that in the coming years, these tools will be able to provide a layer of abstraction for biological engineering, reducing the need for user expertise. As noted earlier, open-source AI biodesign tools could be made more accessible by LLMs, which can help users understand how they work. Some companies are developing AI biodesign tools with access controls for commercial and competitive purposes, but most AI biodesign tools are currently developed in academic settings and either are openly available or have open-access equivalents.

AI biodesign tools provide insight into biological systems that would be very difficult for humans to generate on their own, and many experts believe that they could be misused by someone aiming to design toxins, pathogens, or other biological agents to cause harm. The potential for misuse of these tools is often closely related to their intended, benign use. For example, if a design tool is capable of minimizing the possibility of harmful interactions in the human body, it is equally capable of maximizing that possibility. Tools used for public health purposes to predict which pathogens and their variants have the greatest pandemic potential could also create a shortlist of promising candidates for a biological weapon. Some experts believe that AI biodesign tools are more likely than LLMs to generate biological designs for toxins, pathogens, or other biological agents that are unlikely to evolve naturally or that may be more harmful than those found in nature.

Currently, the most advanced AI biodesign tools are tools for designing individual proteins, which experts believe could be misused to design novel toxins; protein domains, to target specific tissues within the body with toxic elements; or other harmful proteins such as prions. A few experts noted that these tools might not be as useful for the more complex task of engineering pathogens in more fundamental ways. Although other types of biodesign tools are not yet as reliable as protein design tools, they could also be misused. For example, tools that facilitate the design or optimization of genomes or metabolic pathways could help design or scale up the growth of bacteria that produce harmful substances. Some experts raised the concern that AI biodesign tools could be fine-tuned in a variety of ways, trained on pathogen data, or used in combinations to generate designs for pathogens that are more robust, more harmful, or more easily scaled to large quantities. Further into the future, some experts believe it will be possible for AI models to facilitate the design of new types of biological agents, expanding the boundaries of what can be accomplished with biology and raising the ceiling of potential harms.

Notwithstanding these risks, as discussed earlier, there are serious limitations to what AI biodesign tools and their users can accomplish. Experts familiar with these tools reported that they provide many different candidate designs that need experimental evaluation, which requires time, resources, and expertise. For example, experts reported that protein design tools are considered successful if 20 to 50 percent of their designs meet their intended design criteria and that further work is needed for optimization.

The complexities of living systems further limit the ability of these tools to predict the biological consequences of different designs and variations. For example, it will be relatively easy to predict the effect of a genetic mutation in a single protein on its interaction with another single molecule, as compared to the more complex downstream effects that may arise through a cascade of interactions involving many genes and proteins in the context of a whole cell. Predicting the consequences of biological designs becomes
www.nti.org 27
The Convergence of Artificial Intelligence and the Life Sciences
even more challenging when adding complexities biosecurity oversight of ordered DNA. Many of
related to interactions with genetically diverse these vendors currently screen customers and DNA
human hosts, transmissibility within large orders to reduce the risk of providing pathogen
populations, and other features of potential or toxin DNA to customers who lack a legitimate
biological weapons (box 6). For this reason, some use for it or inadvertently selling the building
experts are less concerned about the possibility of blocks of dangerous pathogens to malicious
an AI biodesign tool successfully designing wholly actors. Current DNA sequence screening methods
new types of biological agents. Designs for simpler evaluate how similar ordered DNA sequences are
alterations to existing proteins, pathogens, and to known pathogen or toxin DNA. However, new
agents are likely to be more reliable, at least in the AI protein design tools can design proteins that
near term. have very little similarity to known pathogen or
toxin sequences but have the same functions and
Many experts pointed to a near-term and specific
pose the same risks. These tools could allow the
risk: AI protein design tools will make it more
redesign of existing hazards, thus evading DNA
difficult for DNA providers to conduct effective
sequence screening.
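To make the screening gap concrete, the following sketch shows a toy k-mer overlap check of an ordered sequence against a hypothetical list of sequences of concern. It is a minimal illustration only: real providers rely on curated databases and alignment-based pipelines, and the threshold, sequence names, and functions here are assumptions rather than any vendor's actual method. A function-preserving redesign with little sequence similarity to a known hazard would pass exactly this kind of similarity test, which is the weakness described above.

```python
# Toy illustration of similarity-based DNA screening (not a real screening pipeline).
# The sequences of concern, k-mer length, and threshold are hypothetical placeholders.

def kmers(seq: str, k: int = 20) -> set:
    """Return the set of k-length substrings of a DNA sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def screen_order(order_seq: str, sequences_of_concern: dict, k: int = 20,
                 threshold: float = 0.1) -> list:
    """Flag database entries whose k-mer overlap with the order exceeds a threshold."""
    order_kmers = kmers(order_seq, k)
    flagged = []
    for name, ref_seq in sequences_of_concern.items():
        ref_kmers = kmers(ref_seq, k)
        if not ref_kmers:
            continue
        shared_fraction = len(order_kmers & ref_kmers) / len(ref_kmers)
        if shared_fraction >= threshold:
            flagged.append(name)
    return flagged
```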
Experts in biological weapons pointed out that much of the challenge in developing an effective
weapon is anticipating how it will interact with the complex world it is released into. AI tools are
far from being capable of this level of complex and conceptual analysis.
COVID-19 provides an example. The genome sequence of the SARS-CoV-2 virus became available
early in the pandemic, but this sequence provided insufficient information to enable scientists
to predict transmission routes, pathogenicity, or transmissibility. These traits are determined by
multiple interacting genes belonging to the virus, as well as environmental conditions, genetics,
and immunological characteristics of possible host populations, and a variety of other factors.
Scientists also struggled to predict the course of the pandemic because it depended on public
health responses, the behavior of populations, and other complex social interactions.
Predicting pathogen transmissibility, host range, and virulence from genome sequences using
AI is an active area of research. However, AI tools struggle to generalize to new strains and
assume all sequences come from viable pathogens, which may not be true of a newly designed
strain. They are also data limited; the number of variables the models need to fit is many
orders of magnitude larger than the number of available examples to learn from, so the tools
are limited in what they can successfully infer. Infection outcomes are difficult to measure,
particularly in humans, and laboratory experimental models for mimicking human infection are
currently rudimentary. Efforts to generate data for AI-enabled risk prediction are ongoing, and
high-throughput systems for characterizing the risk posed by viral variants, coupled with AI
analysis of the results, are in development.48
AI tools are already contributing to biosecurity and pandemic preparedness and are likely to
become more integrated into these capabilities in the future.
Biosurveillance
Outbreak reporting tools such as NATHNAC and PulseNet provide data on outbreaks worldwide
to medical professionals and public health teams that could be processed more efficiently
using AI. Public health laboratories are automating their data analysis processes to flag only
concerning trends for human review.50 By analyzing data from returning travelers, AI tools could
model the frequency of infectious disease importation and trace its origins. When combined
with genome sequencing, this approach could be instrumental in identifying outbreaks as well
as uncovering enduring reservoirs of pathogens.51
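As a minimal illustration of this kind of automated triage, the sketch below flags a day's case count for human review only when it sits well above a simple historical baseline. The rule, threshold, and function name are hypothetical simplifications, not a description of the systems cited above.

```python
# Toy "flag only concerning trends" rule: hypothetical, for illustration only.
from statistics import mean, stdev

def flag_for_review(history: list, today: int, z_cut: float = 3.0) -> bool:
    """Return True when today's case count is far above the historical baseline."""
    baseline, spread = mean(history), stdev(history)
    return today > baseline + z_cut * max(spread, 1.0)
```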
Many efforts focus on predicting the risk posed by new pathogen strains. AI tools to rapidly
analyze the DNA of pathogens will enable scientists to identify potential pandemic pathogens
and high-risk variants before they spread widely. This knowledge guides the proactive design of
medical countermeasures. AI is also being used to design DNA, RNA,52 and protein53 sequences
that can act as biosensors for detecting dangerous pathogens or toxins.
Medical countermeasures
Many experts who work with AI protein design tools believe they will substantially improve
vaccine and antibody design over the coming years. One expert estimated that these tools could
enable the design of new antibodies based on a pathogen’s genome within days and allow them
to be produced within weeks. Older methods require months to produce antibodies and require
access to patient samples. Experts also estimated that mRNA vaccines could be designed and
deployed in as little as two to three weeks, rapidly stemming outbreaks. AI models can also design
novel antimicrobial drugs, phage therapies, and protective probiotics, though development of
these models is more challenging and these tools are not mature enough to scale widely.
Attribution
Recent promising results suggest that AI tools can identify genetically engineered organisms
and attribute them to their lab of origin.54 These types of tools could help identify actors who
design harmful biological agents and act as a deterrent. However, attribution tools will be less
effective if actors can make design choices that allow them to evade attribution.
AI watermarking is another avenue for attribution, in which models place subtle signatures
in AI model outputs to mark that they were generated by AI. Models could also potentially
place watermarks unique to the model or user, further facilitating attribution. Watermarking
technologies have been considered for biological designs for the purpose of protecting
intellectual property.55 For the scientific community to adopt this approach for marking DNA
or protein sequences, watermarks would need to preserve the biological activity of the desired
product.
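To illustrate why a watermark need not alter the biological activity of a designed product, the sketch below hides a short bit string in synonymous codon choices, which leave the translated amino acid sequence unchanged. The codon table, helper names, and encoding are a hypothetical toy, not one of the watermarking approaches cited above; a practical scheme would also need to survive resynthesis, mutation, and codon optimization.

```python
# Toy codon-choice watermark (hypothetical). Synonymous codons encode the same
# amino acid, so rewriting them does not change the encoded protein.

BIT_CODONS = {
    "GCT": ("GCT", "GCC"),  # alanine: index 0 carries bit 0, index 1 carries bit 1
    "GCC": ("GCT", "GCC"),
    "GTT": ("GTT", "GTC"),  # valine
    "GTC": ("GTT", "GTC"),
}

def embed_watermark(cds: str, bits: list) -> str:
    """Rewrite carrier codons in a coding sequence so they spell out the bit string."""
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    bit_iter = iter(bits)
    out = []
    for codon in codons:
        pair = BIT_CODONS.get(codon)
        bit = next(bit_iter, None) if pair else None
        out.append(pair[bit] if pair is not None and bit is not None else codon)
    return "".join(out)

def read_watermark(cds: str, n_bits: int) -> list:
    """Recover up to n_bits from the carrier codons of a coding sequence."""
    bits = []
    for i in range(0, len(cds), 3):
        pair = BIT_CODONS.get(cds[i:i + 3])
        if pair:
            bits.append(pair.index(cds[i:i + 3]))
            if len(bits) == n_bits:
                break
    return bits
```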
What risk reduction measures should we consider that will offer meaningful
protections against the worst risks without unduly hindering scientific
advances and innovation?
What are the most promising options for safeguarding AI-bio capabilities?
What approaches are most likely to work?
There are many opportunities to reduce the risk that AI-bio capabilities could be misused to cause harm. Some of these are specific to the AI models themselves, including a range of options and suggestions for “guardrails” that describe how AI models could be developed or controlled to minimize the risk. Many experts also believe that it will be critical to bolster biosecurity oversight at the interface where digital designs become physical biological systems, for example, by strengthening biosecurity frameworks for DNA synthesis providers and other life sciences vendors. Some proposed solutions are very broad, including the suggestion to invest further in overarching pandemic preparedness and response capabilities. These different ideas are not mutually exclusive, and an all-of-the-above, layered defense may be needed to reduce risks most effectively.

For each approach, it will be important to balance the need to reduce risks with the need to ensure that AI-bio capabilities can be used for beneficial purposes. As previously mentioned, many experts believe that AI will bring significant benefits for the life sciences broadly, and for biosecurity and pandemic preparedness specifically. Many experts also pointed out that no solutions exist that will eliminate all risks related to AI-bio capabilities. Each of the approaches described in this section should be understood as a hurdle that decreases the chances that a malicious actor will successfully misuse AI tools to cause biological harm and that this misuse will lead to a global biological catastrophe.

Guardrails for AI Models

Experts raised many ideas for guardrails that are already being implemented or that could be further explored to reduce the risk that AI-bio capabilities are exploited to cause harm. These include safeguards built into the models themselves, biosecurity evaluations of the models and their technical safeguards, as well as ways to control access to the models, to computational infrastructure, or to the data needed to train models. Given how rapidly AI-bio capabilities are being developed and the significant uncertainty about how they will evolve, many experts believe that it will be important to have ongoing opportunities for feedback and iterative refinement on how guardrails are implemented. A few experts noted that guardrails for AI biodesign tools in particular are lacking and that this is an important area for further development.

Several experts pointed to methods to ensure that AI models have appropriate oversight and incorporate technical safeguards to limit their potential for misuse. Developers of AI models, including companies and academic researchers, could have an institutional review process to ensure that dual-use and ethical considerations inform the development and deployment of AI models. Such oversight mechanisms are already established or are under active development in many companies that produce LLMs, but this approach has not yet been incorporated into the development of AI biodesign tools.

Incorporating Technical Safeguards into AI Models

Experts described several technical safeguards that some LLM developers are actively implementing to reduce a wide variety of risks, including those related to biosecurity (box 8). For example, developers can train models using adversarial approaches to discourage the incorporation of harmful concepts into a model. Models can also evaluate outputs for harmful content before they are shown to the user or refuse to answer user requests on specific topics. Developers can run safety checks at multiple stages during the training process to ensure a model is safe during its development. It is not yet clear which methods will be most effective, and this is an active area of inquiry. Although technical safeguards have been developed for a wide range of AI models, built-in solutions for AI biodesign tools are still lacking.
To implement and test many of the technical safeguards discussed here, model developers need to understand the types of biosecurity risks that they should guard against, which may require detailed information about biological agents and vulnerabilities. Yet, this can also pose a significant challenge because distributing this type of information may be hazardous.

In addition, experts frequently point out that technical safeguards incorporated into models are only feasible for models with access controls, for example, through an application programming interface (API). Malicious actors or others seeking to circumvent these types of safeguards could easily strip any safeguards incorporated into open-source models.
Intervention point | Technical safeguard | Description
Model behavior after deployment | Refusals and blacklisting | Refusals are when AI models refuse to follow user requests, typically because the information or action requested may be harmful. Alternatively, blacklisting can prevent models from using specific words or phrases in their outputs.
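A minimal sketch of how refusals and output blacklisting might be layered around a deployed model is shown below. The blocked topics, blacklisted terms, and the generate() callable are hypothetical placeholders; production systems use trained safety classifiers and multiple review stages rather than keyword lists.

```python
# Toy post-deployment filter combining refusals and blacklisting (hypothetical).

BLOCKED_TOPIC_PHRASES = ["example restricted topic"]   # placeholder prompt triggers
BLACKLISTED_TERMS = ["example-restricted-term"]        # placeholder output terms
REFUSAL_MESSAGE = "I can't help with that request."

def moderated_reply(prompt: str, generate) -> str:
    """Refuse flagged prompts; redact blacklisted terms from allowed outputs."""
    lowered = prompt.lower()
    if any(phrase in lowered for phrase in BLOCKED_TOPIC_PHRASES):
        return REFUSAL_MESSAGE                      # refusal at the request stage
    reply = generate(prompt)                        # call the underlying model
    for term in BLACKLISTED_TERMS:
        reply = reply.replace(term, "[withheld]")   # blacklisting at the output stage
    return reply
```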
Because models can hallucinate incorrect information and provide information with factual errors, expertise is needed to evaluate the outputs of models.

In assessing AI’s capabilities and risks, it is important to note that most AI models to date are static. These models are trained once and rely on that historical data to answer queries. Dynamic models can be continually retrained or augment training with an ability to search the web and provide information that is more current. The behavior of models that rely only on their original training data is easier to anticipate, whereas the behavior of models that change over time or that retrieve information on an ongoing basis can be more difficult to anticipate and may limit the effectiveness of evaluation.

Experts expressed uncertainty and a range of opinions about how these evaluations should be done and what would constitute a “safe” AI model. It is also unclear to what extent red-teaming should test the outputs of these models to determine whether they are genuine risks or whether the ideas or designs provided will ultimately fail. An in-depth evaluation of the risks posed by these tools could run into legal and ethical barriers. For example, assessing whether an LLM can acquire a controlled chemical would be illegal, and using a benign chemical as a proxy would not fully answer the most relevant question. Similarly, evaluators can test whether LLMs and AI biodesign tools will volunteer suggestions for novel hazardous biological agents, but testing whether the designs work might be irresponsible.

Testing may involve building an adversarial agent or set of requests, which would require knowledge of potential hazards. Experts raised the concern that a resource of this type could include information hazards (see page 39) both because it may contain specific, high-consequence risks and because it could act as a roadmap for those seeking to bypass model safeguards. One technical solution could be to map sensitive information to less sensitive proxies, and then test the model and implement safeguards on the basis of these proxies. However, some experts cautioned that model safeguards could perform well on proxy data without fully capturing the risks that would exist with the actual data.

Monitoring Models

Experts generally believe that it will be important to evaluate AI models for their misuse potential prior to their release; however, they have low confidence that they can capture all forms of misuse, particularly because model capabilities are not fully explored prior to their release. Therefore, a few experts suggested mechanisms to monitor the behavior of AI models after their release. If a model is run on a developer’s computing infrastructure and made available through an API, then developers can directly monitor the model’s outputs in response to prompts to determine whether the model is providing potentially harmful information. To make monitoring of AI models easier, AI oversight models could monitor model outputs and flag concerning results for human review.

Absent direct control of AI models, AI model developers or others could implement reporting mechanisms for users to flag concerning behaviors or outputs. For example, systems could be established to support public reporting of cases in which an AI model has resulted in harm61 or reporting of risks directly to the developers or a third party. As an incentive to report potential risks, “bug bounties” could provide financial rewards to the reporter.62
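The sketch below illustrates the API-side monitoring pattern described above: each prompt and reply is logged, scored by a stand-in oversight model, and flagged for human review above a threshold. The classify_risk() placeholder, log format, and threshold are assumptions for illustration, not any developer's actual pipeline.

```python
# Toy API monitoring loop (hypothetical): log traffic and flag outputs for review.
import json
import time

def classify_risk(text: str) -> float:
    """Placeholder oversight model returning a risk score in [0, 1]."""
    return 0.0  # a real deployment would call a trained classifier here

def monitored_completion(user_id: str, prompt: str, generate,
                         log_path: str = "api_log.jsonl",
                         review_threshold: float = 0.8) -> str:
    """Generate a reply, log the exchange, and mark high-risk exchanges for review."""
    reply = generate(prompt)
    record = {
        "time": time.time(),
        "user": user_id,
        "prompt": prompt,
        "reply": reply,
        "risk": classify_risk(prompt + "\n" + reply),
    }
    record["needs_human_review"] = record["risk"] >= review_threshold
    with open(log_path, "a") as log_file:
        log_file.write(json.dumps(record) + "\n")
    return reply
```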
Controlling Access to Models

Many experts believe that controlling access to AI models is a fundamental strategy to prevent their misuse. They point out that any technical safeguards that are incorporated into a model to reduce its potential for misuse can be stripped out if the full model is released as an open-source tool. Some LLMs and many AI biodesign tools are fully open source.
Many larger LLMs, including GPT-4, Claude, and PaLM, use APIs that allow the model developer to keep the model itself closed while enabling users to enter queries and receive outputs. These APIs enable the model developer to monitor user prompts and restrict outputs of potentially harmful information. They also enable the developer to maintain control over the model, including through any of the technical safeguards described earlier, and to update the model when needed.

Some LLM developers restrict access to their models to a small group of beta testers early in their development as part of a staged release.63 This type of staging provides an opportunity for evaluation and the mitigation of potential risks in advance of broad distribution.

Many experts believe that more developers of AI biodesign tools should consider access controls to reduce the risk of misuse; however, many of these tools are developed in the academic community, where strong cultural norms support open-source resources. Often, tools are developed collaboratively across loosely affiliated groups of people, and requirements for publication include the need for peer review, which often includes access to published AI models. The expectation for many AI biodesign tools is that they will be used as a foundation for future work, including alterations of the tool itself to improve it and to apply it in novel ways. Furthermore, one knowledgeable expert pointed out that maintaining control of a model and establishing APIs requires institutional infrastructure that may not be available to many academics.

Several experts in academia believe that cultural norms supporting open-source AI biodesign tools could change, but this change would require awareness-raising and engagement across the academic community, including with publishers and funders. These experts emphasized that the benefits and drawbacks of restricting access to these tools would have to be carefully weighed to ensure that this approach does not limit further study, collaboration, or scientific progress. Because there is a wide range of biodesign tools, it will be particularly important to evaluate them for misuse potential and to ensure that any access restrictions are commensurate with the risks that they pose. Additionally, restrictions could disproportionately affect researchers in low-income countries, raising important equity considerations.

Some experts suggested that access controls for AI biodesign tools could incorporate customer screening or “Know Your Customer” requirements. For example, developers of these tools could restrict model access to individuals who have institutional affiliations and reasonable use cases. It is unclear how such a system would be implemented by the wide range of model developers, many of whom are in academia and are not currently equipped to screen users. A centralized credentialing system that verifies users (as described on page 41) could help in this regard.

AI models could also monitor the use of AI-bio capabilities and identify concerning behavior
by users. Several experts were optimistic about the ability of AI to analyze patterns of behavior, such as gathering information from an LLM on specific topics combined with purchasing life sciences products, to identify customers with potentially malicious intent. A similar project has demonstrated the value of this type of monitoring of publicly available data for detecting high-risk or illicit nuclear trade.64

A few experts raised the possibility that governments could control access to AI models by implementing export controls on models that meet specific requirements. Owing to the challenge of restricting access to software tools, experts see this approach primarily as a way to slow the spread of these tools rather than a means to prevent their use. Also, it may be difficult to implement export controls on tools developed as open-source resources, including many biodesign tools.

Controlling Access to Computing Infrastructure

A small number of experts raised the possibility of controlling (or monitoring) access to high-performance computing infrastructure to ensure that powerful AI models are developed only by responsible users. This infrastructure includes resources provided by large cloud computing vendors such as Amazon and Microsoft, as well as government-funded infrastructure provided for national research and development efforts. Because training state-of-the-art AI models, particularly LLMs, requires large amounts of computational power, access to computational infrastructure may provide an opportunity for overseeing or restricting the training of large AI models. For example, to ensure that model developers are legitimate, access to these resources could require a license. Cloud computing providers could require staged safety checks of AI models or other safeguards as part of their usage agreements. Chip manufacturers could also impose limits on the hardware itself, similar to how some graphics cards limit the speed at which they can be used to mine cryptocurrency; the feasibility of this method is unclear and warrants further review.65

However, many experts were broadly pessimistic about controlling access to computing infrastructure as a means to reduce biosecurity risks. They did not believe that high-performance computing infrastructure was needed in order to build an LLM capable of being misused to cause harm. Many of the largest LLMs are trained on supercomputers, but techniques have been developed to fine-tune large models on modest computing resources, such as a personal laptop.66 Many AI biodesign tools can also be trained with modest computing resources. Furthermore, model developers are actively pursuing methods to decrease the amount of computational infrastructure that is needed. Therefore, these experts believe that computational infrastructure may not provide a meaningful opportunity for oversight in the future.

Controlling Access to Data

A few experts believe that restricting access to specialized or particularly harmful data could help reduce potentially harmful outputs from AI models and could prevent bad actors from training their own models. Experts listed a wide range of data, including, for example, pharmaceutical company databases on protein and chemical toxicity, publicly available pathogen genomes, gain-of-function research, and information related to historical bioweapons programs. They disagree about what types of data should be restricted, and many are skeptical about the effectiveness of controlling access to data for biosecurity purposes. Much of the data described are already publicly and redundantly available on the Internet, and it would be very difficult to prevent some types of models, including LLMs, from accessing such data. One solution could be for AI developers to agree not to use publicly available data to train models. A suite of resources is now available to verify whether sensitive data were used to
train a model,67 allowing verification of these commitments.
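One family of such verification tools relies on membership-inference tests. The sketch below shows a deliberately simple loss-threshold variant: records on which the model's loss is unusually low, relative to known non-members, are flagged as likely training data. The model_loss() callable, reference set, and cutoff are hypothetical, and published auditing methods are considerably more rigorous.

```python
# Toy loss-based membership-inference check (hypothetical, for illustration only).
from statistics import mean, stdev

def likely_training_members(candidates, known_nonmembers, model_loss, z_cut: float = -2.0):
    """Flag candidate records whose loss is unusually low versus known non-members."""
    reference_losses = [model_loss(record) for record in known_nonmembers]
    baseline = mean(reference_losses)
    spread = max(stdev(reference_losses), 1e-9)
    flagged = []
    for record in candidates:
        z_score = (model_loss(record) - baseline) / spread
        if z_score <= z_cut:    # far lower loss than expected suggests memorization
            flagged.append(record)
    return flagged
```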
A few experts believe that restricting access to pathogen genome data in particular would unduly hinder legitimate scientific research, public health, and biosecurity efforts. In addition to affecting research on pathogens, removing pathogen data from more general biological data sets would substantially reduce those data sets because an outsized proportion of DNA and protein sequence records in public databases originate from pathogens. As a result, the removal of access to these data could hamper broader efforts, such as development of AI protein design tools or protein structure prediction. Other challenges to controlling data for this purpose are more systematic. For example, many experts believe that it would be difficult to decide which data should be restricted and who should be responsible for controlling access. Depending on the type of data, legitimate model developers would also need exceptions or ways to access the restricted data. All of these questions raise important issues of equity and access.

Coordinating Efforts in AI Guardrail Development

Many experts, particularly those familiar with LLMs, pointed to a need for collaboration and diverse expertise to effectively identify and reduce biosecurity risks that might arise from AI models. Accurately identifying meaningful risks will require collaboration with a range of experts in synthetic biology, infectious disease, biosecurity, national security, and other fields—in addition to model developers and others familiar with advances in AI. Risk assessments of this type are very new, and multidisciplinary efforts will likely be needed to understand and track how the risk landscape is changing over time.

Some LLM developers have already begun to collaborate with biosecurity experts and others to evaluate and reduce biosecurity risks related to the misuse of their models. However, these efforts are ad hoc, and few opportunities exist for model developers to learn from others’ experiences. Furthermore, awareness and understanding of risks and potential solutions vary widely across developers, and the development and dissemination of best practices could benefit the entire community. A few experts pointed out that very little information exists about how developers of smaller models and those in other parts of the world are approaching these issues.

Experts disagree to some extent about how open collaboration among model developers and others should be. A more open process would best ensure participation and engagement of a wide range of model developers, including those from non-Western countries, and from a broader set of experts in biosecurity and related fields. However, the need to successfully develop and adopt biosecurity safeguards would have to be balanced with the need to limit information hazards. Furthermore, it is not clear how much AI model developers, particularly those developing LLMs, will be willing to share about their technical safeguards because the details of how these methods are implemented might reveal proprietary information or raise intellectual property concerns.
Expanding Customer Screening

An important part of protecting the digital-physical interface for biology is screening customers to ensure that they have a legitimate use for life science products and services. As mentioned earlier, many DNA providers conduct customer screening as part of their biosecurity oversight,72 but few, if any, guidelines or requirements exist for customer screening by other vendors of life sciences products, services, or infrastructure. Furthermore, LLMs have been shown to flag opportunities for outsourcing of laboratory skills and infrastructure, for example, to contract research organizations,73 which could unknowingly facilitate the development of a harmful biological agent. It will be important for these types of organizations to take biosecurity precautions.

Several experts highlighted the need to expand customer screening practices to parts of the life science supply chain beyond DNA providers.74 Other types of providers—such as academic core facilities, cloud labs, and contract research organizations—could also adopt customer screening practices. This would help ensure that they do not provide equipment, tools, or services to illegitimate users who may wish to cause harm. Some experts recommended identifying which vendors provide the materials and equipment most needed for the development of dangerous biological agents and focusing particular attention on those vendors. A few also mentioned strengthening frameworks for sharing of materials and products among researchers to ensure that products purchased for legitimate purposes were not obtained and misused by third parties.

Expanding the number and type of institutions and vendors expected to conduct customer screening will be challenging. Customer screening by DNA providers, for example, is not universal, and current methods are burdensome, inconsistent, inefficient, and ad hoc. To solve this problem, a few experts pointed to methods used by other sectors that have implemented “Know Your Customer” approaches that could be adapted for the life sciences.75 Others suggested a centralized customer verification framework that would give consumers credentials that they could take to a range of life sciences and AI model providers.76 Centralizing screening would allow a single organization to perform effective screening rather than depending on many providers to do so independently. Furthermore, a centralized system also could make it possible to track a constellation of behaviors and purchases made using provided credentials, providing an opportunity to identify concerning patterns indicative of malicious intent. For example, repeated attempts to evade DNA synthesis screening could be logged, allowing for flagging of penetration testing done in attempts to access restricted materials. In the future, AI models could be developed to automate many aspects of customer screening.

It is worth noting that implementing a credentialing process or centralized screening system for life sciences practitioners would require significant outreach to those communities and could face resistance in a culture that has been dedicated to expanding access to the tools of engineering biology. Although many scientists who work with pathogens or in public health may be attuned to the risks and willing to participate, those who work on broader engineering biology pursuits may not.
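In simplified form, the sketch below shows how a centralized verification system could aggregate events tied to a customer credential and flag patterns such as repeated failed DNA-synthesis screenings. The event schema, field names, and threshold are hypothetical stand-ins for whatever a real framework would define.

```python
# Toy cross-provider pattern check keyed on a customer credential (hypothetical).
from collections import defaultdict

def flag_concerning_credentials(events, max_failed_screens: int = 3) -> set:
    """events: iterable of dicts such as {"credential_id": "...", "event": "dna_screen_failed"}."""
    failures = defaultdict(int)
    for event in events:
        if event.get("event") == "dna_screen_failed":
            failures[event["credential_id"]] += 1
    return {cid for cid, count in failures.items() if count >= max_failed_screens}
```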
Both industry and academia are developing cutting-edge AI models, but they differ in their
access to resources and the types of models they produce. Industry produces larger, more
computationally intensive models with commercial applications, while academia produces
smaller, less-expensive models that often aim to contribute to building foundational knowledge
in a field. The kinds of models in academia and industry also differ. Only a small number of AI
labs are able to produce large LLMs, whereas both industry and academia produce biodesign
tools and automated science tools.
Industry labs excel at developing large, computationally expensive models. They are able to
undertake projects that require generating and labeling massive data sets, they can build deep
learning models with billions of parameters and millions of dollars in computing costs, and they
can afford to hire top AI talent. These three factors underpin modern advances in LLMs and
are the reason AI industry leaders like OpenAI, Anthropic, Meta, and Google produce the most
capable LLMs. For biodesign tools, and to a lesser extent LLMs, industry also builds on advances from academia and fine-tunes them for specific commercial projects. However, for competitive
reasons, many companies do not disclose some of their most impressive models and have
strong incentives to keep their methodology and data proprietary. For example, biotech or
pharmaceutical companies are able to generate the proprietary data sets needed to train more
focused biodesign tools and will likely not release data sets that are relevant for commercial
applications. Importantly, the difference in funding between start-ups and established large
companies has a direct effect on the size and scope of the models that each chooses to
construct. Large companies are better positioned to produce large LLMs, and start-ups are
more likely to develop highly specialized, narrow models with specific applications.
Academia can generate a wider range of novel approaches, producing models for expert
domains that are less likely to generate profits. Although academia lags far behind industry
in development of LLMs, many academic AI biodesign tools are on par with or exceed industry
models. Academic models are generally expected to be open source to validate claims about
their performance and make research useful to others.
These differences between large, industry LLM developers and smaller biodesign tool
developers have implications for implementation of guardrails. For example, a few experts
pointed out that government requirements for many of the guardrails listed in this report may
serve the commercial interests of larger LLM developers. At the same time, implementation of
guardrails for specialty biodesign tools is likely to be considered burdensome by developers in
academia and smaller companies.
Many experts believe that developers of AI biodesign tools hold some responsibility for how their tools are used or misused, but potential safeguards, controls, and opportunities for oversight are underdeveloped. Model developers may need to work with funders and broader academic communities to better define their role.

Non-governmental Funders

Non-governmental funders might play several roles in reducing risks related to the intersection of AI and the life sciences. Funders, particularly those who fund academic life sciences research, have the opportunity to shape how AI biodesign tools are developed and the types of guardrails that are incorporated, as many developers of AI models in academia lack awareness, incentives, and resources for implementing safeguards. Funders could also require evaluations of the research that they fund to determine whether it will result in AI models with the potential for misuse or will generate data that could contribute to information hazards. A few experts also saw a key role for non-governmental, philanthropic funding to support international, collaborative efforts to develop resources, best practices, and durable norms for responsible AI model development and dissemination.

• Cloud computing vendors can implement requirements for their use in the development of AI models and could monitor usage of their resources by AI model developers.

• Insurers can evaluate risks and determine whether and how the potential for misuse of AI models might affect liability insurance, which may contribute to the creation of effective incentives to implement safeguards.

• Legal experts can evaluate liability and legal frameworks to determine how they intersect with AI-bio capabilities and implementation of guardrails.

• Institutional review bodies can require evaluations of AI models for their potential for misuse and ask about appropriate guardrails for those that pose risks.

• Publishers and conferences can evaluate whether new AI models should be published in full or whether a less open approach should be taken.

• Civil society can convene multidisciplinary groups to develop resources, best practices, and approaches for reducing risks.
Recommendations:
A Proposed Path Forward for
Governance of AI-Bio Capabilities
The application of AI to engineering living systems will have far-reaching
implications that include major potential benefits across many types of
applications—such as the development of vaccines and therapeutics,
broader advances in pandemic preparedness capabilities, and more
fundamental advances in human health and beyond. However, these
same technologies can also be misused to cause a wide range of harms,
potentially including a global biological catastrophe. The rapid pace
of AI advances coupled with accelerating developments in modern
bioscience and biotechnology requires a radically new approach and
a layered defense to reduce associated emerging biological risks.
Effective governance approaches will require focused engagement
by governments, AI model developers, the scientific community, non-
governmental biosecurity organizations, funders, and international fora.
The findings of this report, as noted earlier, are based on interviews and engagement with a wide
range of experts in AI, the life sciences, biosecurity and pandemic preparedness, and other key
areas. The recommendations provided here build on these findings but were developed by the
authors alone and do not necessarily reflect the views of the experts who participated in this
project.
• The Forum should serve as a venue for developing and sharing best practices for
implementing effective AI-bio guardrails, identifying emerging biological risks associated
with ongoing AI advances, and developing shared resources to manage these risks. It should
inform efforts by AI model developers in industry and academia, governments, and the
broader biosecurity community, and it should establish global norms for biosecurity best
practices in these communities.
• Regular meetings of the Forum should provide opportunities to raise concerns, evaluate new
ideas, and develop solutions on an ongoing basis.
• The Forum should be composed of key stakeholders and experts, including AI model
developers in industry and academia and biosecurity experts within government and civil
society, and it should act in concert with other initiatives focused on governance of AI more
broadly.
• The Forum should develop a strategy for managing potential information hazards and
confidential information associated with this work.
• To address emerging risks associated with rapidly advancing AI-bio capabilities, which
can be difficult to anticipate, national governments should establish agile and adaptive
governance approaches that can monitor AI technology developments and associated
biological risks, incorporate private sector input, and rapidly adjust policy. Traditional
regulatory oversight mechanisms are not equipped to match the exponential rate of change
in this field, and many opportunities for risk reduction will depend on implementation by
AI model developers in industry and academia. Government policymakers should explore
• Governments should plan to try multiple types of approaches because some innovative
governance ideas could fail. In addition, governments should incorporate sunsetting
provisions for experimental governance bodies or processes, proactively evaluate successes
and limitations, and update approaches based on lessons learned.
AI model developers should implement the most promising already developed guardrails that
reduce biological risks without unduly limiting beneficial uses. They should collaborate with other
entities, including the AI-Bio Forum described above, to establish best practices and develop
resources to support broader implementation. Governments, biosecurity organizations, and
others should explore opportunities to scale up these solutions nationally and internationally,
through funding, regulations, and other incentives for adoption. Existing guardrails that should
be broadly implemented include the following:
• Methods for AI model users to proactively report hazards. Model developers should
establish ways for users to report when a model has provided potentially harmful biological
information. These reports should contribute to ongoing efforts to evaluate and update
models with improved safeguards even after the model is widely available.
• Technical safeguards to limit harmful outputs from AI models. The state of the art for these
safeguards is likely to change over time. Current promising approaches include training
models to refuse to engage on particular topics or requiring models to provide outputs based
on a “constitution” or set of rules determined by the developer. These should be evaluated
and updated on an ongoing basis as models advance.
• Access controls for AI models. A promising approach for many types of models is the
use of APIs that allow users to provide inputs and receive outputs without access to the
underlying model. Maintaining control of a model ensures that built-in technical safeguards
are not removed and provides opportunities for ensuring user legitimacy and detecting any
potentially malicious or accidental misuse by users.
AI model developers should work with biosecurity experts in government and civil society
to explore additional options for AI model guardrails on an ongoing basis, experimenting
with new approaches, and working to address key open questions and potential barriers to
implementation. Priority areas for exploration include the following:
» Should access to some types of biodesign tools be limited to legitimate users? What types
of models?
» Are there additional barriers to implementing access controls for biodesign tools (e.g.,
funding, infrastructure, or know-how among model developers)? How should these be
overcome?
» What types of incentives would effectively ensure that vendors of cloud computing and
other services enforce requirements for use of their resources?
• Managing access to data needed to train models. It is possible that limiting the availability
of some types of data from being used to train AI models could reduce biological risks.
However, there are many potential benefits and drawbacks to this approach that depend on
the types of data in question. For example, removing publicly available pathogen genome
data from the Internet may be infeasible and, if pursued, could cause more harm than good
by undermining important, beneficial work, such as bioscience research and biosurveillance.
It may be more feasible and effective to manage access to databases that are currently
privately held because of intellectual property or privacy protection needs, such as private
databases linking protein structure to function or databases that include patient medical
data. Key open questions include:
» Are there specific types of data that should not be used or should be used in limited ways
for incorporation into AI models? What types of data? For what types of models?
» How will legitimate model development that uses restricted data be verified?
• Governments, international bodies, and others should work to strengthen DNA synthesis
screening frameworks. This work should include improving incentives for DNA providers
and others to conduct sequence screening and customer screening through establishment
of regulations, funding requirements, financial support for DNA providers that comply,
provision of resources to make screening easier, and support for international bodies
that support DNA synthesis screening practices such as the International Biosecurity and
Biosafety Initiative for Science.
• Governments and other key players should expand requirements and incentives for
customer screening to a wide range of providers of life sciences products, infrastructure,
and services, including cloud labs, contract research organizations, and academic core
facilities. This could include support for a third-party verification system for life sciences
customers.
RECOMMENDATIONS AND RESPONSIBILITY
Responsible parties: National Governments, AI Model Developers, Life Science Research Community, Biosecurity Organizations, Funders, International AI-Bio Forum

• Implement existing guardrails to reduce the risk of misuse of AI models.
• Perform model evaluations, such as red-teaming, to identify risks before a model is released.
• Implement technical safeguards in AI models to limit harmful outputs.
• Implement access controls for models with potential for misuse.
• Explore additional options for guardrail development.
• Develop additional or more effective guardrails, particularly for biodesign tools.
• Create mechanisms and principles for responsibly and equitably controlling access to AI models.
• Try multiple governance approaches and evaluate their effectiveness.

Strengthen biosecurity controls at the interface between digital design tools and physical biological systems.
• Work together to strengthen DNA synthesis screening frameworks.
• Improve incentives for DNA providers to conduct biosecurity screening of customers and DNA sequences.
• Expand requirements and incentives for customer screening to a wide range of providers of life sciences products and services.
• Increase investment in pandemic preparedness and response, including the development of AI-bio capabilities.
Conclusion
The convergence of AI and the life sciences marks a new era for biosecurity and offers
tremendous potential benefits, including for pandemic preparedness and response. Yet,
these rapidly developing capabilities also shift the biological risk landscape in ways that
are difficult to predict and have the potential to cause a global biological catastrophe.
The recommendations in this report provide a proposed path forward for taking action to
address biological risks associated with rapid advances in AI-bio capabilities. Effectively
implementing them will require creativity, agility, and sustained cycles of experimentation,
learning, and refinement.
The world faces significant uncertainty about the future of AI and the life sciences, but it
is clear that addressing these risks requires urgent action, unprecedented collaboration,
a layered defense, and international engagement. Taking a proactive approach will help
policymakers and others anticipate future technological advances on the horizon, address
risks before they fully materialize, and ultimately foster a safer and more secure future.
Appendix A: Participants
Ms. Tessa Alexanian, Ending Bioweapons Fellow, The Council on Strategic Risks
Dr. Sion Bayliss, Research Fellow, University of Bristol
Dr. Rocco Casagrande, Managing Director, Gryphon Scientific
Dr. Lauren Cowley, Senior Lecturer, Milner Centre for Evolution, University of Bath
Dr. James Diggans, Distinguished Scientist, Bioinformatics and Biosecurity, Twist Bioscience
Dr. Kevin Esvelt, Director, Sculpting Evolution Group, MIT Media Lab
Dr. Rob Fergus, Research Director, Google DeepMind
Dr. Michal Galdzicki, Data Czar, Arzeda
Dr. John Glass, Professor and Leader, Synthetic Biology Group, J. Craig Venter Institute
Dr. Logan Graham, Member of Technical Staff, Anthropic
Dr. Nathan Hillson, Department Head of BioDesign, Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory
Dr. Stefan A. Hoffmann, Research Associate, Manchester Institute of Biotechnology, University of Manchester
Dr. John Lees, Group Leader, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI)
Dr. Alan Lowe, Associate Professor and Turing Fellow, University College London / Alan Turing Institute
Dr. Becky Mackelprang, Associate Director for Security Programs, Engineering Biology Research Consortium
Dr. Jason Matheny, Chief Executive Officer, RAND Corporation
Dr. Greg McKelvey, Assistant Director for Biosecurity, U.S. Office of Science and Technology Policy
Dr. Chuck Merryman, Vice President of Biology, ThinkingNode Life Science
Dr. Michael Montague, Senior Scholar and Research Scientist, Center for Health Security, Johns Hopkins University
Dr. Sella Nevo, Senior Information Scientist, RAND Corporation
Ms. Antonia Paterson, Science Manager, Responsible Development and Innovation, Google DeepMind
Dr. Ryan Ritterson, Executive Vice President of Research, Gryphon Scientific
Mr. Jonas Sandbrink, Researcher in Biosecurity, Oxford University
Dr. Clara Schoeder, Research Group Leader, Institute of Drug Discovery, Leipzig University
Dr. Reed Shabman, Deputy Director, Office of Data Science and Emerging Technologies, U.S. National Institute of Allergy and Infectious Diseases
Dr. Sarah Shoker, Research Scientist, OpenAI
Dr. Lynda Stuart, Executive Director, Institute for Protein Design, University of Washington
GPT-4 | OpenAI | Parameters: unspecified, likely >1 trillion | Open source: No
Pi | Inflection | Parameters: unspecified | Open source: No
MODEL | DESCRIPTION | CITATIONS IN GOOGLE SCHOLAR | OPEN SOURCE?

Protein design
RoseTTAFold | Protein structure prediction | 2,585 | Yes
AlphaFold-2 | Protein structure prediction | 15,717 | No (requires API)
ProteinMPNN | Protein structure prediction | 274 | Yes
ESM-2, ESMFold | Protein structure prediction | 407 | Yes
xTrimoPGLM | Performs 15 design and prediction tasks on protein sequences | 3 | No
ExpressionGAN | Generates DNA sequences to control the expression of proteins | 1,927 | Yes
DeepMEL | Generates DNA sequences to control the expression of proteins | 23 | Yes
novoStoic | Biological pathway design | 82 | Yes
RetroPath2 | Biological pathway design | 190 | Yes
Automatic Recommendation Tool (ART) | Recommends changes in design-build-test-learn cycles to optimize metabolic engineering | 140 | Non-commercial and commercial licenses
MODEL | TASK | DESCRIPTION | OPEN SOURCE?
ResearchRabbit | Search literature | Recommends relevant research papers for a literature search | No
MineTheGap | Generate hypotheses | Identifies promising gaps in the scientific literature | No
Microsoft Copilot | Write software | Writes software collaboratively with a user | No
INDRA | Generate hypotheses, interpret results | Resource for representing and learning from scientific knowledge | Yes
Sabrina Chwalek
Technical Consultant, Global Biological Policy and Programs
Sabrina Chwalek is a technical consultant for Global Biological Policy and Programs at NTI. Chwalek is a
rising senior at Brown University, studying computer science with a focus on artificial intelligence and
machine learning. Previously, Chwalek worked as a research assistant for the Horizon Institute for Public
Service, where she contributed to their efforts to map the biosecurity landscape in U.S. policy and support
the next generation of biosecurity professionals. She also worked for a non-profit focused on promoting
the development of safe AI.
Endnotes
1. Bo Chen et al., "xTrimoPGLM: Unified 100B-Scale Pre-trained Transformer for Deciphering the Language of Protein" (preprint, July 14, 2023), bioRxiv, https://doi.org/10.1101/2023.07.05.547496.
2. Hanchen Wang et al., "Scientific Discovery in the Age of Artificial Intelligence," Nature 620, no. 7972 (2023): 47–60, https://doi.org/10.1038/s41586-023-06221-2.
3. For example, see https://biolm.ai/ui/home/.
4. NTI, "Biosecurity and Risk Reduction Initiative," https://www.nti.org/about/programs-projects/project/fostering-biosecurity-innovation-and-risk-reduction/.
5. NTI, "Common Mechanism to Prevent Illicit Gene Synthesis," March 22, 2019, https://www.nti.org/analysis/articles/common-mechanism-prevent-illicit-gene-synthesis/. Sarah R. Carter, Jaime M. Yassif, and Christopher R. Isaac, Benchtop DNA Synthesis Devices: Capabilities, Biosecurity Implications, and Governance (Washington, DC: NTI, 2023). NTI, "NTI and World Economic Forum Release New Report on DNA Synthesis Technologies," January 9, 2020, https://www.nti.org/news/nti-and-world-economic-forum-release-new-report-dna-synthesis-technologies/.
6. NTI, "International Biosecurity and Biosafety Initiative for Science (IBBIS)," October 5, 2022, https://www.nti.org/about/programs-projects/project/international-biosafety-and-biosecurity-initiative-for-science-ibbis/.
7. Arul Siromoney and Rani Siromoney, "A Machine Learning System for Identifying Transmembrane Domains from Amino Acid Sequences," Sadhana 21 (1996): 317–25, https://link.springer.com/article/10.1007/BF02745526. Richard Fox, "Directed Molecular Evolution by Machine Learning and the Influence of Nonlinear Interactions," Journal of Theoretical Biology 234, no. 2 (2005): 187–99, https://www.sciencedirect.com/science/article/pii/S0022519304005697. D. B. Kell, "Metabolomics, Machine Learning and Modelling: Towards an Understanding of the Language of Cells," Biochemical Society Transactions 33, no. 3 (June 2005): 520–24, https://portlandpress.com/biochemsoctrans/article/33/3/520/82871/Metabolomics-machine-learning-and-modelling.
8. OpenAI, "Introducing ChatGPT," November 30, 2022, https://openai.com/blog/chatgpt.
9. Eric Schmidt, "This Is How AI Will Transform the Way Science Gets Done," MIT Technology Review, July 5, 2023, https://www.technologyreview.com/2023/07/05/1075865/eric-schmidt-ai-will-transform-science/.
10. Michael Chui et al., "The Economic Potential of Generative AI: The Next Productivity Frontier" (McKinsey & Company, New York, NY, June 14, 2023), https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/the-economic-potential-of-generative-ai-the-next-productivity-frontier.
11. NVIDIA, "NVIDIA Unveils Next-Generation GH200 Grace Hopper Superchip Platform for Era of Accelerated Computing and Generative AI," press release, August 8, 2023, https://nvidianews.nvidia.com/news/gh200-grace-hopper-superchip-with-hbm3e-memory.
12. David McCandless, Tom Evans, and Paul Barton, "The Rise and Rise of A.I. Large Language Models (LLMs)," Information Is Beautiful, July 27, 2023, https://informationisbeautiful.net/visualizations/the-rise-of-generative-ai-large-language-models-llms-like-chatgpt/.
13. Sida Peng et al., "The Impact of AI on Developer Productivity: Evidence from GitHub Copilot," arXiv (2023), https://arxiv.org/pdf/2302.06590.pdf. Shakked Noy and Whitney Zhang, "Experimental Evidence on the Productivity Effects of Generative Artificial Intelligence," Science (2023), https://doi.org/adh2586. Erik Brynjolfsson, Danielle Li, and Lindsey Raymond, "Generative AI at Work," arXiv (2023), https://arxiv.org/abs/2304.11771.
14. For more information, see the scite website at https://scite.ai/. For more information, see the Elicit website at https://elicit.org/.
15. Yevgen Chebotar and Tianhe Yu, "RT-2: New Model Translates Vision and Language into Action," Google DeepMind, July 28, 2023, https://www.deepmind.com/blog/rt-2-new-model-translates-vision-and-language-into-action?utm_source=keywordblog&utm_medium=referral&utm_campaign=rt2. Julián N. Acosta, Guido J. Falcone, Pranav Rajpurkar, and Eric J. Topol, "Multimodal Biomedical AI," Nature Medicine 28, no. 9 (2022): 1773–84, https://www.nature.com/articles/s41591-022-01981-2.
16. Jason Wei et al., "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models" (updated manuscript, January 10, 2023), arXiv, https://doi.org/10.48550/arXiv.2201.11903.
17. Jason Wei et al., "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models."
18. John Jumper et al., "Highly Accurate Protein Structure Prediction with AlphaFold," Nature 596 (2021): 583–89, https://doi.org/10.1038/s41586-021-03819-2. Andriy Kryshtafovych et al., "Critical Assessment of Methods of Protein Structure Prediction (CASP)—Round XIV," Proteins: Structure, Function, and Bioinformatics 89, no. 12 (2021): 1607–17, https://onlinelibrary.wiley.com/doi/10.1002/prot.26237.
19. Joseph L. Watson et al., "De Novo Design of Protein Structure and Function with RFdiffusion," Nature 620 (2023): 1089–1100, https://doi.org/10.1038/s41586-023-06415-8.
20. J. Dauparas et al., "Robust Deep Learning–Based Protein Sequence Design Using ProteinMPNN," Science 378, no. 6615 (2022): 49–56, https://www.science.org/doi/10.1126/science.add2187.
21. Ewen Callaway, "Scientists Are Using AI to Dream Up Revolutionary New Proteins," news release, Nature, September 15, 2022, https://www.nature.com/articles/d41586-022-02947-7.

22. Christian Jäckel, Peter Kast, and Donald Hilvert, "Protein Design by Directed Evolution," Annual Review of Biophysics 37 (2008): 153–73, https://www.annualreviews.org/doi/abs/10.1146/annurev.biophys.37.032807.125832.

23. Bo Chen et al., "xTrimoPGLM: Unified 100B-Scale Pre-trained Transformer for Deciphering the Language of Protein" (preprint, July 14, 2023), bioRxiv, https://doi.org/10.1101/2023.07.05.547496.

24. Bo Chen et al., "xTrimoPGLM."

25. Engineering Biology Research Consortium (EBRC), "Review Progress in the Field: An Assessment of Short-Term Milestones in EBRC's Roadmap, Engineering Biology," https://roadmap.ebrc.org/.

26. For more information, see the Synthetic Biology Open Language portal at https://sbolstandard.org/.

27. Jan Zrimec et al., "Controlling Gene Expression with Deep Generative Design of Regulatory DNA," Nature Communications 13, no. 1 (2022): 1–17, https://doi.org/10.1038/s41467-022-32818-8.

28. Zihao Chen et al., "Artificial Intelligence in Aptamer–Target Binding Prediction," International Journal of Molecular Sciences 22, no. 7 (2021), https://doi.org/10.3390/ijms22073605.

29. Christopher E. Lawson et al., "Machine Learning for Metabolic Engineering: A Review," Metabolic Engineering 63 (January 2021): 34–60, https://doi.org/10.1016/j.ymben.2020.10.005.

30. Maren Wehrs et al., "Engineering Robust Production Microbes for Large-Scale Cultivation," Trends in Microbiology 27, no. 6 (2019): 524–37, https://doi.org/10.1016/j.tim.2019.01.006.

31. Christopher J. Hartline et al., "Dynamic Control in Metabolic Engineering: Theories, Tools, and Applications," Metabolic Engineering 63 (January 2021): 126, https://doi.org/10.1016/j.ymben.2020.08.015.

32. Linfeng Zhang et al., "Deep Potential Molecular Dynamics: A Scalable Model with the Accuracy of Quantum Mechanics," Physical Review Letters 120, no. 14 (2018): 143001, https://link.aps.org/doi/10.1103/PhysRevLett.120.143001.

33. Ross D. King et al., "The Automation of Science," Science 324, no. 5923 (2009): 85–89, https://doi.org/10.1126/science.1165620.

34. Kevin Williams et al., "Cheaper Faster Drug Development Validated by the Repositioning of Drugs against Neglected Tropical Diseases," Journal of the Royal Society Interface 12, no. 104 (2015), http://doi.org/10.1098/rsif.2014.1289.

35. Benjamin Burger et al., "A Mobile Robotic Chemist," Nature 583, no. 7815 (2020): 237–41, https://www.nature.com/articles/s41586-020-2442-2.

36. Daniil Boiko, Robert MacKnight, and Gabe Gomes, "Emergent Autonomous Scientific Research Capabilities of Large Language Models," arXiv, April 11, 2023, https://arxiv.org/pdf/2304.05332.pdf.

37. Gemma Conroy, "Scientists Used ChatGPT to Generate an Entire Paper from Scratch—But Is It Any Good?," news release, Nature, July 11, 2023, https://www.nature.com/articles/d41586-023-02218-z.

38. Vahe Tshitoyan et al., "Unsupervised Word Embeddings Capture Latent Knowledge from Materials Science Literature," Nature 571, no. 7763 (2019): 95–98, https://doi.org/10.1038/s41586-019-1335-8.

39. John P. A. Ioannidis, "Why Most Published Research Findings Are False," PLOS Medicine 2, no. 8 (2005): e124, https://doi.org/10.1371/journal.pmed.0020124.

40. Gemma Conroy, "Scientists Used ChatGPT to Generate an Entire Paper from Scratch—But Is It Any Good?," news release, Nature, July 11, 2023, https://www.nature.com/articles/d41586-023-02218-z.

41. Yuting Xu et al., "Deep Dive into Machine Learning Models for Protein Engineering," Journal of Chemical Information and Modeling 60, no. 6 (April 6, 2020): 2773–90, https://doi.org/10.1021/acs.jcim.0c00073.

42. Eirini Kalliamvakou, "Research: Quantifying GitHub Copilot's Impact on Developer Productivity and Happiness," GitHub, September 7, 2022, https://github.blog/2022-09-07-research-quantifying-github-copilots-impact-on-developer-productivity-and-happiness/.

43. Daniil Boiko, Robert MacKnight, and Gabe Gomes, "Emergent Autonomous Scientific Research Capabilities of Large Language Models," arXiv, April 11, 2023, https://arxiv.org/pdf/2304.05332.pdf.

44. For more information, see the INDRA website at http://www.indra.bio/.

45. Toby Shevlane et al., "Model Evaluation for Extreme Risks" (submitted manuscript, May 24, 2023), arXiv, https://arxiv.org/abs/2305.15324.

46. Emily H. Soice et al., "Can Large Language Models Democratize Access to Dual-Use Biotechnology?" (submitted manuscript, June 6, 2023), arXiv, https://arxiv.org/abs/2306.03809.

47. Sarah R. Carter, Jaime M. Yassif, and Christopher R. Isaac, Benchtop DNA Synthesis Devices: Capabilities, Biosecurity Implications, and Governance (Washington, DC: NTI, 2023), https://www.nti.org/analysis/articles/benchtop-dna-synthesis-devices-capabilities-biosecurity-implications-and-governance/.

48. Gavin R. Meehan et al., "Phenotyping the Virulence of SARS-CoV-2 Variants in Hamsters by Digital Pathology and Machine Learning" (preprint, August 1, 2023), bioRxiv, https://www.biorxiv.org/content/10.1101/2023.08.01.551417v1.

49. Jose Antonio Lanz, "Meet Chaos-GPT: An AI Tool That Seeks to Destroy Humanity," Decrypt, April 13, 2023, https://decrypt.co/126122/meet-chaos-gpt-ai-tool-destroy-humanity.

50. European Centre for Disease Prevention and Control, "New Tools for Public Health Experts: Outbreak Detection and Epidemiological Reports," news release, December 17, 2018, https://www.ecdc.europa.eu/en/news-events/new-tools-public-health-experts-outbreak-detection-and-epidemiological-reports.
51. Amber Barton and Caroline Colijn, "Genomic, Clinical and Immunity Data Join Forces for Public Health," Nature Reviews Microbiology 21 (2023): 639, https://doi.org/10.1038/s41579-023-00965-4.

52. Xiaodong Guo et al., "Aptamer-Based Biosensor for Detection of Mycotoxins," Frontiers in Chemistry 8 (2020), https://doi.org/10.3389/fchem.2020.00195.

53. Alfredo Quijano-Rubio et al., "De novo Design of Modular and Tunable Protein Biosensors," Nature 591 (2021), https://www.nature.com/articles/s41586-021-03258-z.

54. Ethan C. Alley et al., "A Machine Learning Toolkit for Genetic Engineering Attribution to Facilitate Biosecurity," Nature Communications 11, no. 1 (2020): 1–12, https://doi.org/10.1038/s41467-020-19612-0.

55. Francine J. Boonekamp et al., ACS Synthetic Biology 9, no. 6 (May 15, 2020): 1361–75, https://doi.org/10.1021/acssynbio.0c00045.

56. Dylan Matthews, "The $1 Billion Gamble to Ensure AI Doesn't Destroy Humanity," Vox, July 17, 2023, https://www.vox.com/future-perfect/23794855/anthropic-ai-openai-claude-2.

57. Yuntao Bai et al., "Constitutional AI: Harmlessness from AI Feedback" (submitted manuscript, December 15, 2022), arXiv, https://arxiv.org/abs/2212.08073.

58. John Kirchenbauer et al., "A Watermark for Large Language Models" (preprint, revised June 6, 2023), arXiv, https://arxiv.org/abs/2301.10226; A Watermark for Large Language Models, demo, https://huggingface.co/spaces/tomg-group-umd/lm-watermarking.

59. Toby Shevlane, "Sharing Powerful AI Models," research post, Centre for the Governance of AI, January 20, 2022, https://www.governance.ai/post/sharing-powerful-ai-models.

60. Toby Shevlane et al., "Model Evaluation for Extreme Risks" (submitted manuscript, May 24, 2023), arXiv, https://arxiv.org/abs/2305.15324.

61. AI Incident Database, https://incidentdatabase.ai/apps/incidents/.

62. OpenAI, "Announcing OpenAI's Bug Bounty Program," April 11, 2023, https://openai.com/blog/bug-bounty-program.

63. For example, see Anthropic, "Claude 2," July 11, 2023, https://www.anthropic.com/index/claude-2.

64. Erin Dumbacher, Page Stoutland, and Jason Arterburn, Signals in the Noise: Preventing Nuclear Proliferation with Machine Learning and Publicly Available Information (Washington, DC: NTI, 2021), https://www.nti.org/analysis/articles/signals-in-the-noise-preventing-nuclear-proliferation-with-machine-learning-publicly-available-information/.

65. Matt Wuebbling, "GeForce Is Made for Gaming, CMP Is Made to Mine," NVIDIA, February 18, 2021, https://blogs.nvidia.com/blog/2021/02/18/geforce-cmp/. Yonadav Shavit, "What Does It Take to Catch a Chinchilla? Verifying Rules on Large-Scale Neural Network Training via Compute Monitoring" (revised manuscript, May 30, 2023), arXiv, https://arxiv.org/abs/2303.11341.

66. Edward J. Hu et al., "LoRA: Low-Rank Adaptation of Large Language Models" (revised manuscript, October 16, 2021), arXiv, https://arxiv.org/abs/2106.09685.

67. Dami Choi, Yonadav Shavit, and David Duvenaud, "Tools for Verifying Neural Models' Training Data" (submitted manuscript, July 2, 2023), arXiv, https://arxiv.org/abs/2307.00682.

68. Bureau of Industry and Security, "Commerce Control List: Category 1—Materials, Chemicals, Microorganisms, and Toxins," Supplement No. 1 to Part 774, Export Administration Regulations, August 18, 2023, https://www.bis.doc.gov/index.php/documents/federal-register-notices-1/3315-ccl1-11/file. Export Control Order 2008, U.K. S.I. 2008/3231, December 17, 2008, accessed September 7, 2023, https://www.legislation.gov.uk/uksi/2008/3231/contents.

69. For example, International Gene Synthesis Consortium, https://genesynthesisconsortium.org/; Sarah R. Carter, Jaime M. Yassif, and Christopher R. Isaac, Benchtop DNA Synthesis Devices: Capabilities, Biosecurity Implications, and Governance (Washington, DC: NTI, 2023), https://www.nti.org/analysis/articles/benchtop-dna-synthesis-devices-capabilities-biosecurity-implications-and-governance/.

70. Administration for Strategic Preparedness and Response, "Screening Framework Guidance for Providers of Synthetic Double-Stranded DNA," U.S. Department of Health and Human Services, https://aspr.hhs.gov/legal/syndna/Pages/default.aspx.

71. The International Biosecurity and Biosafety Initiative for Science, https://ibbis.bio/.

72. For example, the Administration for Strategic Preparedness and Response, "Screening Framework Guidance for Providers of Synthetic Double-Stranded DNA," U.S. Department of Health and Human Services, https://aspr.hhs.gov/legal/syndna/Pages/default.aspx; International Gene Synthesis Consortium, "Harmonized Screening Protocol© v2.0," November 19, 2017, https://genesynthesisconsortium.org/wp-content/uploads/IGSCHarmonizedProtocol11-21-17.pdf.

73. Emily H. Soice et al., "Can Large Language Models Democratize Access to Dual-Use Biotechnology?" (submitted manuscript, June 6, 2023), arXiv, https://arxiv.org/abs/2306.03809.

74. Sarah Carter and Diane DiEuliis, "Mapping the Synthetic Biology Industry: Implications for Biosecurity," Health Security 17, no. 5 (2019): 403–6, https://doi.org/10.1089/hs.2019.0078.

75. Dow Jones, "Understanding the Steps of a 'Know Your Customer' Process," Risk & Compliance Glossary, https://www.dowjones.com/professional/risk/glossary/know-your-customer/.

76. Global Alliance for Genomics & Health, "Passports," https://www.ga4gh.org/product/ga4gh-passports/.

77. Ian Bremmer and Mustafa Suleyman, "The AI Power Paradox: Can States Learn to Govern Artificial Intelligence—Before It's Too Late?," Foreign Affairs, August 16, 2023, https://www.foreignaffairs.com/world/artificial-intelligence-power-paradox.
78. Google, "A New Partnership to Promote Responsible AI," July 26, 2023, https://blog.google/outreach-initiatives/public-policy/google-microsoft-openai-anthropic-frontier-model-forum. White House, "Fact Sheet: Biden-Harris Administration Secures Voluntary Commitments from Leading Artificial Intelligence Companies to Manage the Risks Posed by AI," press release, July 21, 2023, https://www.whitehouse.gov/briefing-room/statements-releases/2023/07/21/fact-sheet-biden-harris-administration-secures-voluntary-commitments-from-leading-artificial-intelligence-companies-to-manage-the-risks-posed-by-ai/.

79. Google, "A New Partnership to Promote Responsible AI," July 26, 2023, https://blog.google/outreach-initiatives/public-policy/google-microsoft-openai-anthropic-frontier-model-forum.
Read more about NTI | bio’s work to strengthen
biotechnology governance and biosecurity
www.nti.org/bio
1776 Eye Street, NW • Suite 600 • Washington, DC 20006 • @NTI_WMD • www.nti.org