How generative AI is building better antibodies

Thu 04 May 2023
nature
Ewen Callaway

Two different therapeutic monoclonal antibodies (Y-shaped) binding to sites on a SARS-CoV-2 spike protein (red), in an artist’s conception. Credit: Juan Gaertner/Science Photo Library

At the height of the pandemic, researchers raced to develop some of the first effective treatments against COVID-19: antibody molecules isolated from the blood of people who had recovered from the disease.

Now, scientists have shown that generative artificial intelligence (AI) can provide a shortcut through some of this laborious process, suggesting sequences that boost the potency of antibodies against viruses such as SARS-CoV-2 and ebolavirus. A study published last week in Nature Biotechnology¹ is part of growing efforts to apply ‘neural networks’ similar to those behind the ChatGPT AI platform to antibody design.

Antibody drugs for diseases including breast cancer and rheumatoid arthritis bring in more than US$100 billion in worldwide sales each year. Researchers hope that generative AI — neural networks that can create text, images and other content on the basis of learnt patterns — will speed up development and help to unlock antibody drugs for targets that have resisted conventional design approaches.

“There’s intense interest in discovering and engineering antibodies, and how one makes antibodies better,” says Peter Kim, a biochemist at Stanford University in California, who co-led the Nature Biotechnology paper.

Immune weapons

Antibodies are among the immune system’s key weapons against infection. The proteins have become a darling of the biotechnology industry, in part because they can be engineered to attach to almost any protein imaginable to manipulate its activity. But generating antibodies with useful properties and improving on these involves “a lot of brute-force screening”, says Brian Hie, a computational biologist at Stanford who also co-led the study.

To see whether generative AI tools could cut out some of the grunt work, Hie, Kim and their colleagues used neural networks called protein language models. These are similar to the ‘large language models’ that form the basis of tools such as ChatGPT. But instead of being fed vast volumes of text, protein language models are trained on tens of millions of protein sequences.

Other researchers have used such models to design completely new proteins and to help predict the structure of proteins with high accuracy. Hie’s team used a protein language model (developed by researchers at Meta AI, a part of tech giant Meta based in New York City) to suggest a small number of mutations for antibodies.

The model was trained on only a few thousand antibody sequences, out of the nearly 100 million protein sequences it learnt from. Despite this, a surprisingly high proportion of the model’s suggestions boosted the ability of antibodies against SARS-CoV-2, ebolavirus and influenza to bind to their targets.
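The article does not spell out how the model picks its suggestions. One simple way such a model could be used, sketched here as an assumption rather than as the study’s exact procedure, is to ask it how likely each of the 20 standard amino acids is at every position of an antibody sequence, then flag substitutions that it rates as more probable than the residue already there. The function name `suggest_substitutions` and the toy probability tables are hypothetical; in practice the tables would come from a trained protein language model, which is not included.

```python
from typing import Dict, List, Tuple

# The 20 standard amino acids, in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

Suggestion = Tuple[int, str, str, float]  # (position, wildtype, mutant, ratio)


def suggest_substitutions(
    sequence: str, position_probs: List[Dict[str, float]]
) -> List[Suggestion]:
    """Hypothetical helper: flag substitutions a language model prefers.

    position_probs[i] maps amino acids to the probability that a
    (hypothetical) protein language model assigns at position i. A
    substitution is suggested when the model rates the mutant as strictly
    more likely than the wildtype residue already present.
    """
    if len(sequence) != len(position_probs):
        raise ValueError("need one probability table per residue")
    suggestions: List[Suggestion] = []
    for pos, (wildtype, probs) in enumerate(zip(sequence, position_probs)):
        p_wt = probs.get(wildtype, 0.0)
        for mutant in AMINO_ACIDS:
            p_mut = probs.get(mutant, 0.0)
            if mutant != wildtype and p_mut > p_wt:
                # How many times more likely the mutant is than the wildtype.
                ratio = p_mut / p_wt if p_wt > 0 else float("inf")
                suggestions.append((pos, wildtype, mutant, ratio))
    # Strongest model preference first.
    suggestions.sort(key=lambda s: s[3], reverse=True)
    return suggestions


if __name__ == "__main__":
    # Made-up probabilities for a three-residue fragment "ACD".
    toy_probs = [
        {"A": 0.5, "G": 0.25, "S": 0.25},
        {"C": 0.125, "S": 0.5, "A": 0.25, "T": 0.125},
        {"D": 0.5, "E": 0.5},
    ]
    for pos, wt, mut, ratio in suggest_substitutions("ACD", toy_probs):
        print(f"{wt}{pos + 1}{mut}: {ratio:.1f}x the wildtype probability")
```

In this toy example, only the second position yields suggestions (C2S and C2A), because at the first and third positions no alternative is strictly more likely than the existing residue. Any real suggestion would still need to be tested in the lab.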

AI-guided alterations to an approved Ebola therapy and to a COVID-19 treatment improved these molecules’ ability to recognize and block the proteins that the viruses use to infect cells. (The COVID-19 antibody is not effective against Omicron and its subvariants, and the AI-guided changes are unlikely to restore effectiveness, Hie says.)

Many of the suggested changes to antibodies occur outside the regions of the protein that interact with its target, which are usually the focus of engineering efforts, says Kim. “The model is reaching to information which is completely, or largely, non-obvious to even the experts in antibody engineering,” he adds. “To me that’s the ‘holy cow, what’s going on here?’ moment.”

Completely new proteins

“This is a tool people will use to improve their antibodies,” says Charlotte Deane, an immuno-informatics researcher at the University of Oxford, UK. “I think it’s really cool.” But she adds that many researchers hope that, instead of simply improving existing antibodies, generative AI will be able to create entirely new ones that will bind to a target of choice.

This ability could help researchers to develop drugs for molecular targets that have resisted other antibody-design approaches, says Surge Biswas, co-founder of Nabla Bio, a company in Boston, Massachusetts, that is working on the challenge.

For example, AI could help to tackle G-protein-coupled receptors, a family of proteins sandwiched into cell membranes that are involved in neurological disorders, heart disease and myriad other conditions. Generative AI could also help in the design of antibody drugs that are able to latch onto multiple targets, such as a tumour protein and an immune cell that can kill that tumour, says Biswas.

Possu Huang, a bioengineer at Stanford, says that protein language models are powerful and very good at optimizing existing proteins, including antibodies. But models trained only on protein sequences might struggle to come up with truly new antibodies that recognize a specified protein.

Researchers say they are making progress. In March, scientists at Absci, a biotech firm in Vancouver, Washington, reported what they say is a first step towards making new antibodies with AI, in a preprint posted on the bioRxiv server². Using a model incorporating protein sequences, as well as experimental data, they generated new designs for several important regions of an antibody drug used to treat breast cancer.

A key challenge in designing completely new antibodies is that their ability to recognize a particular target depends on floppy loops in the antibody structure. These interactions have proved difficult to model with AI, say researchers.

Last year, Huang’s team developed³ a generative AI tool that can create proteins able to bind strongly to a specified target (in one case, snake venoms) using such loops. The same approach could help to make new antibodies, Huang says, but it might require more data than are currently available about how antibodies interact with their targets.

“I don’t think anybody’s really figured this out,” adds Biswas.

vocabulary:

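{'antibody': 'antibody: a protein that is one of the immune system’s key weapons against infection; it can be engineered to attach to almost any protein to manipulate its activity', 'neural networks': 'neural networks: computational models that learn patterns from data; generative AI uses them to create text, images and other content', 'generative AI': 'generative AI: neural networks that can create text, images and other content on the basis of learnt patterns', 'protein language models': 'protein language models: a type of neural network trained on tens of millions of protein sequences', 'large language models': 'large language models: a type of neural network trained on vast volumes of text', 'ChatGPT': 'ChatGPT: an AI platform built on large language models', 'G-protein-coupled receptors': 'G-protein-coupled receptors: a family of proteins embedded in cell membranes that are involved in neurological disorders, heart disease and many other conditions', 'bioRxiv': 'bioRxiv: a preprint server for biology', 'snake venoms': 'snake venoms: toxic secretions, made up largely of proteins, produced by venomous snakes', 'GPT': 'GPT: a family of large language models trained on vast volumes of text'} readguide: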

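{'reading_guide': 'This article describes how researchers are using generative artificial intelligence (AI) to speed up the development of antibody drugs against viruses such as SARS-CoV-2, the virus that causes COVID-19. It explains that AI can suggest changes to antibodies that improve their ability to bind to their viral targets, but that challenges remain, such as designing entirely new antibodies.'} long_sentences: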

{'sentence 1': 'Antibody drugs for diseases including breast cancer and rheumatoid arthritis bring in more than US$100 billion in worldwide sales each year.', 'sentence 2': 'Last year, Huang’s team developed3 a generative AI tool that can create proteins able to bind strongly to a specified target — in one case, snake venoms — using such loops.'}

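Sentence 1: This sentence states that antibody drugs for diseases such as breast cancer and rheumatoid arthritis generate more than US$100 billion in worldwide sales each year. Structurally, the core of the sentence is “Antibody drugs bring in more than US$100 billion in sales”: “Antibody drugs” is the subject and “bring in more than US$100 billion in sales” is the predicate. The prepositional phrase “for diseases including breast cancer and rheumatoid arthritis” modifies the subject, “worldwide” modifies “sales”, and “each year” is an adverbial of time. In meaning, the sentence highlights the commercial scale of antibody drugs.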

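Sentence 2: This sentence describes a generative AI tool that Huang’s team developed last year, which uses such loops to create proteins able to bind strongly to a specified target; snake venoms are one example of such a target. Structurally, the core of the sentence is “Huang’s team developed a generative AI tool”: “Huang’s team” is the subject, “developed” is the predicate and “a generative AI tool” is the object. “Last year” is an adverbial of time; the relative clause “that can create proteins able to bind strongly to a specified target” modifies “tool”; the parenthetical “in one case, snake venoms” gives an example of a target; and “using such loops” indicates the means, referring back to the floppy loops mentioned in the previous sentence. The superscript 3 after “developed” is a citation marker. In meaning, the sentence reports an earlier AI tool that designs strongly binding proteins by exploiting loop structures.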