Marissa Mock is senior director of research at Biologic Therapeutic Discovery, Amgen, Thousand Oaks, California, USA.
Suzanne Edavettal is associate vice-president for research at Biologic Therapeutic Discovery, Amgen, Thousand Oaks, California, USA.
Christopher Langmead is director of data sciences at Biologic Therapeutic Discovery, Amgen, Thousand Oaks, California, USA.
Alan Russell is vice-president for research at Biologic Therapeutic Discovery, Amgen, Thousand Oaks, California, USA.
Leaps in technology are supporting AI-guided drug design, such as this fully robotic workstation that can purify proteins and move liquids. Credit: Daniel Yoo
There is a troubling crunch point in the development of drugs made from proteins. Fewer than 10% of such drug candidates succeed in clinical trials¹. Failure at this late stage of development costs between US$30 million and $310 million per clinical trial², potentially costing billions of dollars per drug, and wastes years of research while patients wait for a treatment.
More protein drugs are needed. The large size and surface area of proteins mean that medicines made from them have more ways to interact with target molecules, including proteins in the body that are involved in disease, compared with drugs based on smaller molecules. Protein-based drugs therefore have broad potential as therapeutics.
For instance, protein drugs such as nivolumab and pembrolizumab can prevent harmful interactions between tumour proteins and receptor proteins on immune cells that would deactivate the immune system. Small-molecule drugs, by contrast, are not big enough to come between the two proteins and block the interaction. People with metastatic non-small-cell lung cancer who were treated with conventional therapies have only a 16% chance of surviving for five years or more³. But of those treated with pembrolizumab, 32% survive that long³.
Because proteins can have more than one binding domain, therapeutics can be designed that attach to more than one target: for instance, to both a cancer cell and an immune cell⁴. Bringing the two together ensures that the cancer cell is destroyed.
To unblock the drug-development bottleneck, computer models of how protein drugs might act in the body must be improved. Researchers need to be able to judge the dose that drugs will work at, how they will interact with the body’s own proteins, whether they might trigger an unwanted immune response, and more.
Making better predictions about future drug candidates requires gathering large amounts of data about why previous ones succeeded or failed during clinical trials. Data on many hundreds or thousands of proteins are needed to train effective machine-learning models. But even the most productive biopharmaceutical companies started clinical trials for just 3–12 protein therapeutics per year, on average, between 2011 and 2021 (see go.nature.com/3rclacp). Individual pharmaceutical companies, such as ours (Amgen in Thousand Oaks, California), cannot amass enough data alone.
Incorporation of artificial intelligence (AI) into drug-development pipelines can help. It offers an opportunity for competing companies to merge data while protecting their commercial interests. Doing so can improve developers’ predictive abilities, benefiting both the firms and the patients.
Drug development is labour-intensive and time-consuming. Until about five years ago, developing a candidate required several cycles of protein engineering to turn a natural protein into a working drug⁵. Proteins were selected for a desired property, such as an ability to bind to a particular target molecule. Investigators made thousands of proteins and rigorously tested them in vitro before selecting one lead candidate for clinical trials. Failure at any stage meant starting the process from scratch (see ‘Changing drug-discovery pipelines’).
Figure: ‘Changing drug-discovery pipelines’. Source: M. Mock et al.
Biopharmaceutical companies are now using AI to speed up drug development. Machine-learning models are trained using information about the amino-acid sequence or 3D structure of previous drug candidates, and about properties of interest. These characteristics can be related to efficacy (which molecules the protein binds to, for instance), safety (does it bind to unwanted molecules or elicit an immune response?) or ease of manufacture (how viscous is the drug at its working concentration?).
Once trained, the AI model recognizes patterns in the data. When given a protein’s amino-acid sequence, the model can predict the properties that the protein will have, or design an ‘improved’ version of the sequence that it estimates will confer a desired property. This saves time and money trying to engineer natural proteins to have properties, such as low viscosity and a long shelf life, that are essential for drugs. As predictions improve, it might one day become possible for such models to design working drugs from scratch.
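As a rough illustration of the 'sequence in, property out' idea described above, the following toy sketch predicts a property for a new sequence by averaging the measured values of its nearest neighbours in the training set. Everything here is an assumption made for illustration: the function names, the use of amino-acid composition as the only feature, and the nearest-neighbour rule. Production models typically rely on learned sequence or structure representations instead.

```python
from typing import List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def featurize(seq: str) -> List[float]:
    """Represent a sequence by the fraction of each amino acid it contains."""
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def predict_property(seq: str, training: List[Tuple[str, float]], k: int = 2) -> float:
    """Average the measured property of the k most similar training sequences.

    `training` holds (sequence, measured value) pairs; the values are
    whatever property is of interest (hypothetical units).
    """
    query = featurize(seq)

    def squared_distance(other: str) -> float:
        return sum((a - b) ** 2 for a, b in zip(query, featurize(other)))

    nearest = sorted(training, key=lambda pair: squared_distance(pair[0]))[:k]
    return sum(value for _, value in nearest) / len(nearest)
```

The sketch also hints at why data volume matters: a prediction is only as good as the nearest measured examples, so sequences far from anything in the training set receive poorly supported estimates.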
Technological advances are also helping laboratory experiments to keep pace with AI-guided drug design. Fully robotic workstations independently move liquids, grow cells and load analytical instruments. Miniaturized technologies can perform assays using tiny amounts of material. Together, these improvements allow more proteins to be tested simultaneously, so developers can generate extra data to train machine-learning algorithms and efficiently screen the candidates produced by the models.
In short, this fusion of cutting-edge life science, high-throughput automation and AI — known as generative biology — has drastically improved drug developers’ ability to predict a protein’s stability and behaviour in solution. Our company now spends 60% less time than it did five years ago on developing a candidate drug up to the clinical-trial stage.
But properties related to a drug’s behaviour in the body are still proving to be unpredictable, particularly for complex drugs with several targets. Companies lack the data to accurately model these behaviours because, unlike most in vitro tests, clinical trials provide limited information. Data on many hundreds or thousands of proteins are needed to train effective machine-learning models.
To amass enough data, biopharmaceutical companies need to share information on the physical properties of specific amino-acid sequences, the molecules that the proteins target and how the drugs act in the body. However, these data are also the commercial assets that enable a developer to bring a therapeutic to market at a competitive speed.
Two specialized approaches to machine learning could provide a way forward, enabling companies to pool their resources without revealing competitive data.
Once trained, machine-learning models can be updated as and when more data become available. With ‘federated learning’⁶, separate parties update a shared model using data sets without sharing the underlying data.
Here’s how federated learning could work for biopharmaceutical companies. A trusted party (perhaps a technology firm or a specialized consulting company) would maintain a ‘global’ model, which could initially be trained using publicly available data⁷,⁸. That party would send the global model to each participating biopharmaceutical company, which would update it using the firm’s own data to create a new ‘local’ model. The local models would be aggregated by the trusted party to produce an updated global model. This process could be repeated until the global model essentially stopped learning new patterns.
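The round trip described above can be sketched in a few lines. This is a minimal illustration that assumes a simple linear model and weighted averaging of parameters (often called federated averaging); the function names, learning rate and stopping tolerance are hypothetical, and the article does not specify which aggregation rule a consortium would use. The key point is that only model weights travel between parties, never the underlying measurements.

```python
from typing import List, Tuple

Vector = List[float]
Dataset = List[Tuple[Vector, float]]  # (features, measured property) pairs


def local_update(global_w: Vector, data: Dataset, lr: float = 0.1, epochs: int = 20) -> Vector:
    """One company: start from the global weights and run gradient descent
    on its private data (mean squared error of a linear model)."""
    w = list(global_w)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j, xj in enumerate(x):
                grad[j] += 2.0 * err * xj / len(data)
        w = [wi - lr * gj for wi, gj in zip(w, grad)]
    return w


def aggregate(local_ws: List[Vector], sizes: List[int]) -> Vector:
    """Trusted party: average the local weights, weighted by data-set size."""
    total = sum(sizes)
    dim = len(local_ws[0])
    return [sum(w[j] * n for w, n in zip(local_ws, sizes)) / total for j in range(dim)]


def federated_training(datasets: List[Dataset], dim: int, rounds: int = 50, tol: float = 1e-6) -> Vector:
    """Repeat send -> local update -> aggregate until the global model stops changing."""
    global_w = [0.0] * dim
    for _ in range(rounds):
        # In practice each local update runs on the company's own infrastructure.
        local_ws = [local_update(global_w, d) for d in datasets]
        new_w = aggregate(local_ws, [len(d) for d in datasets])
        change = max(abs(a - b) for a, b in zip(new_w, global_w))
        global_w = new_w
        if change < tol:  # the model has essentially stopped learning
            break
    return global_w
```

Weighting by data-set size is one common design choice; a consortium might prefer equal weights so that larger contributors do not dominate the shared model.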
Antibody therapies are an example of protein drugs that are used in the clinic. Credit: Garo/Phanie/Science Photo Library
MELLODDY, a federated-learning project for small-molecule drugs that we were part of, shows that this approach works (www.melloddy.eu). For this project, Amgen and nine other pharmaceutical companies trained shared federated-learning models for three years, using pharmacological and toxicological data for more than 21 million small-molecule drug candidates⁹. All ten partners could better predict the properties of small molecules using the shared model than they could using their own existing ones. The size of the improvement varied depending on the property being predicted, ranging from just under 1% to 20%, and the companies saw different levels of improvement for each property. Most companies improved their ability to predict how small molecules will be absorbed, distributed, metabolized and excreted by the human body by more than 10%. This is precisely the type of information that is most needed for protein therapeutics.
The reduced molecular complexity of small molecules meant that it made sense to pilot federated learning with these drugs. We expect the approach to deliver even bigger improvements for protein drugs. For MELLODDY, each company’s existing machine-learning models had already been trained on plentiful data — millions of small molecules — so there was perhaps little to be gained by adding more data through shared models. Biopharmaceutical companies have much less starting information about protein drugs, leaving more room for improvement.
Developers can get more bang for their buck by fine-tuning the data they must generate to improve their model.
This ‘active learning’ approach exploits the fact that a machine-learning model can detect an unusual input — an amino-acid sequence that is very different from those in its training data, say — and can alert the user that its predictions for that input are unreliable.
With active learning, an algorithm determines the training data that would be needed to make more-reliable predictions about this type of unusual amino-acid sequence. Rather than developers having to guess what extra data they need to generate to improve their model, they can build and analyse only proteins with the requested amino-acid sequences.
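The selection step can be illustrated with a small pool-based sketch. It assumes that 'unusual' is measured as distance from the training sequences in amino-acid-composition space, which is a simple stand-in for a real model's uncertainty estimate; the function names and the greedy selection rule are likewise assumptions, not a description of any company's pipeline.

```python
from math import sqrt
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def composition(seq: str) -> List[float]:
    """Fraction of each standard amino acid in the sequence."""
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def novelty(seq: str, known_seqs: List[str]) -> float:
    """Distance to the nearest known sequence; large values flag inputs
    for which predictions are likely to be unreliable."""
    f = composition(seq)
    return min(
        sqrt(sum((a - b) ** 2 for a, b in zip(f, composition(t))))
        for t in known_seqs
    )


def select_queries(pool: List[str], training_seqs: List[str], k: int) -> List[str]:
    """Greedily choose up to k pool sequences that are least covered by the data."""
    known = list(training_seqs)
    remaining = list(pool)
    chosen: List[str] = []
    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=lambda s: novelty(s, known))
        chosen.append(best)
        remaining.remove(best)
        # Treat the pick as if it were already measured, so the next
        # pick is not a near-duplicate of it.
        known.append(best)
    return chosen
```

The sequences returned by `select_queries` are the ones worth building and characterizing in the laboratory; once measured, they join the training set and the cycle repeats.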
Active learning is already being used by biopharmaceutical companies¹⁰. It should now be combined with federated learning to improve predictions, particularly for more-complex properties, such as how a protein’s sequence or structure determines its interactions with the immune system. Antibodies provide a good starting point for this endeavour, because they are the most common type of protein drug and therefore have the most data available. Federated learning could be used to pool the information on the antibodies that each company has developed or tested in clinical trials. Active learning would then reveal a tractable set of antibody sequences worth characterizing to improve the model’s predictive abilities. The sequences could be selected from the Observed Antibody Space database¹¹, a public repository in which the amino-acid sequences of more than one billion naturally occurring antibodies are listed. Using publicly available sequences eliminates the risk of revealing proprietary drug targets.
Protein-drug developers have yet to take the steps needed to make federated and active learning work. We encourage biopharmaceutical companies to form a consortium that shares access to a federated- and active-learning platform. From our experiences with MELLODDY, we think that the following considerations will be key to enabling collaborative competition.
Together, participants must choose a platform for their models. Technology companies have already built industry-agnostic infrastructure to enable federated learning (such as NVIDIA FLARE; go.nature.com/3pa8qwr). A technology or consulting firm should be jointly approved by all participants to be a trusted third party for the shared global model.
The cost of collaboration should be low. Investment is needed to format historical data sets for use by machine-learning models, acquire new data requested by active-learning algorithms, install and run software, and obtain legal advice. But this investment equates to a fraction of the cost of developing a drug using conventional methods, especially given that models produced by collaboration should make future drug-development efforts cheaper.
The biggest challenge will lie in deciding precisely which measurements and metrics the consortium should share. We propose that pharmacological and stability data from in vitro tests and data from clinical trials should be in scope for sharing, with a focus on predicting properties that will provide maximum benefit to people. Companies should commit to expanding their clinical measurements to include factors known to affect whether someone has an immune response to a drug.
These data are highly sensitive, so it is essential that contributors can protect their competitive interests. We suggest that each founding member of the consortium shares a minimum amount of data as a condition of accessing the platform. Once initial models have been trained, active learning would provide a mechanism to calculate the current value of the model, and new participants would join the consortium by contributing data sets that add a set value.
On the basis of our experience with MELLODDY, we expect that there will be differences in the improvements each participant sees. Some companies might see the biggest advance in their ability to predict viscosity, others in predicting drug metabolism, for instance. But all participants should ultimately find that they can develop medicines faster and at lower cost — we expect this to be enticement enough to draw companies in.
We are standing at a tipping point in drug development. Behind us are the slow and iterative methods by which a protein found in nature is gradually moulded into a drug. Ahead is the possibility of generative biology being harnessed for computational development of multi-specific protein drugs. We call on our peers to collaborate to accelerate the arrival of this exciting future.
Nature 621, 467-470 (2023)
doi: https://doi.org/10.1038/d41586-023-02896-9
1. Thomas, D. et al. Clinical Development Success Rates and Contributing Factors 2011–2020 (BIO, QLS & Informa, 2021).
2. Fernando, K. et al. Drug Discov. Today 27, 697–704 (2022).
3. Reck, M. et al. J. Clin. Oncol. 39, 2339–2349 (2021).
4. Deshaies, R. J. Nature 580, 329–338 (2020).
5. Mak, K.-K. & Pichika, M. R. Drug Discov. Today 24, 773–780 (2019).
6. Rieke, N. et al. NPJ Digit. Med. 3, 119 (2020).
7. Raybould, M. I. J. et al. Proc. Natl Acad. Sci. USA 116, 4025–4030 (2019).
8. Jain, T. et al. Proc. Natl Acad. Sci. USA 114, 944–949 (2017).
9. Heyndrickx, W. et al. Preprint at ChemRxiv https://doi.org/10.26434/chemrxiv-2022-ntd3r (2022).
10. Bailey, M. et al. Preprint at bioRxiv https://doi.org/10.1101/2023.07.26.550653 (2023).
11. Olsen, T. H., Boyles, F. & Deane, C. M. Protein Sci. 31, 141–146 (2022).
All co-authors are employees and stockholders of Amgen, Inc.
© 2023 Springer Nature Limited