
The Growing Momentum for AI Foundation Models in Biotech and 12 Notable Companies

by Andrii Buvailo, PhD (Contributor) • March 17, 2024

Disclaimer: All opinions expressed by Contributors are their own and do not represent those of their employers or of BiopharmaTrend.com.
Contributors are fully responsible for ensuring they own any required copyright for any content they submit to BiopharmaTrend.com. This website and its owners shall not be liable for information and content submitted for publication by Contributors, nor for its accuracy.


As artificial intelligence (AI) foundation models grow increasingly capable, they are becoming useful for applications across a wide range of economic sectors and industries, including biotech.

The most prominent examples of general-purpose foundation models are the GPT-3 and GPT-4 models, which form the basis of ChatGPT, and BERT (Bidirectional Encoder Representations from Transformers).

These are gigantic models trained on enormous volumes of data, often in a self-supervised or unsupervised manner (without the need for labeled data). 
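
To make the self-supervised idea concrete, here is a minimal toy sketch of the masked-prediction objective behind models like BERT: tokens are hidden at random, and the hidden originals become the training targets, so no human labeling is required. The whitespace tokenizer and masking rate here are simplifications for illustration.

```python
import random

def mask_tokens(tokens, rate=0.15):
    """Replace ~rate of tokens with [MASK]; the hidden originals are the labels."""
    masked, targets = list(tokens), {}
    for i, tok in enumerate(tokens):
        if random.random() < rate:
            targets[i] = tok
            masked[i] = "[MASK]"
    return masked, targets

sentence = "protein structure determines biological function".split()
inputs, labels = mask_tokens(sentence)
# The model is trained to predict `labels` from `inputs`: the raw text
# supplies its own supervision, so no human annotation is needed.
print(inputs, labels)
```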

Thanks to their design, including the transformer architecture and attention mechanisms, foundation models are inherently generalizable, allowing them to be adapted to a diverse array of downstream tasks, unlike traditional AI models that excel at a single task, such as predicting molecule-target interactions.

The "foundation" aspect comes from their generalizability: once pre-trained, they can be fine-tuned with smaller, domain-specific datasets to excel in specific tasks, reducing the need for training new models from scratch. This approach enables them to serve as a versatile base for a multitude of applications, from natural language processing to bioinformatics, by adapting to the nuances of particular challenges through additional training.


Foundation models in bio

A number of companies are racing to build domain-specific foundation models, aiming for greater accuracy and relevance than all-purpose models can offer.

For instance, in September 2023, Deep Genomics unveiled BigRNA, a pioneering AI foundation model for uncovering RNA biology and therapeutics. 

According to Deep Genomics, it is the first transformer neural network engineered for transcriptomics. BigRNA has nearly two billion trainable parameters and has been trained on thousands of datasets totaling over a trillion genomic signals.

A month earlier, Ginkgo Bioworks and Google Cloud had announced a five-year partnership under which Ginkgo would develop new, state-of-the-art large language models (LLMs).

Ginkgo's AI foundation model would focus on genomics, protein function, and synthetic biology and would run on Google Cloud's Vertex AI platform. The model is meant to help Ginkgo's customers accelerate innovation and discovery in fields as diverse as drug discovery, agriculture, industrial manufacturing, and biosecurity.

In February 2024, Ginkgo committed even further to building next-generation biological foundation models by acquiring key assets of Reverie Labs, a startup that builds AI/ML tools to accelerate drug discovery.

Ginkgo acquired Reverie's infrastructure and software for training large-scale AI foundation models, and four of Reverie's key AI team members joined Ginkgo.

In February 2024, Bioptimus, a new French entrant in the biotech sector, announced the close of a $35 million seed funding round to build an AI foundation model for biology, targeting advances across the entire biological spectrum, from the molecular to the organismal level.

The company is led by Professor Jean-Philippe Vert, who brings expertise from his roles at Owkin and Google DeepMind, and its team includes specialists and alumni from both organizations.

Key to Bioptimus's strategy is its partnership with Owkin, which provides access to extensive data generation capabilities and multimodal patient data from leading academic hospitals worldwide. For instance, Owkin's initiative, MOSAIC, represents one of the largest multi-omics atlases for cancer research, showcasing the potential of combining computational and experimental research methods.

This collaboration, supported by computing infrastructure from Amazon Web Services (AWS), is crucial for developing AI models capable of capturing the diversity of biological data.

In academic research there is, for example, HyenaDNA, a "genomic foundation model" created by Stanford University scientists. It is designed to learn the distribution of DNA sequences, how genes are encoded, and how gene expression is regulated by the non-coding sequences between protein-coding regions.

Read our case study for further information about AI foundation models and about other companies building foundation models in biotech, such as BioMap, Cyrus Biotechnology, Atomic AI, and Terray Therapeutics.

Next, let's review three innovative approaches to building AI foundation models for biotech on advanced infrastructure.

Recursion Unveils Phenom-Beta Model on NVIDIA BioNeMo Platform to Advance Cellular Imaging Analysis

Recursion, in collaboration with NVIDIA, launched the Phenom-Beta model, a foundation model for phenomics, hosted on the NVIDIA BioNeMo platform.

This model represents a next step in the use of deep learning for the analysis of cellular images and marks a milestone in a partnership between the two companies that began in July 2023.

Phenomics, as defined by Recursion, is the comprehensive study of how cells react to various chemical and genetic changes, such as gene knockouts or potential drug treatments.

The Phenom-Beta model is designed to transform cellular microscopy images into general-purpose embeddings, which can then be used to uncover new insights into biology and chemistry. This approach aims to accelerate the discovery of new drugs by enabling a more profound understanding of cellular responses.

The model is capable of processing images on a massive scale, from small-scale projects to analyses involving billions of images. It has been trained to identify subtle changes in cellular morphology—changes that are often imperceptible to the human eye. By creating digital representations of these changes, the model facilitates the exploration of genetic and chemical perturbations in a high-dimensional space, assisting researchers in identifying key mechanistic pathways and potential therapeutic targets.
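
Since Phenom-Beta itself is served through BioNeMo rather than distributed as open weights, the sketch below illustrates the general image-to-embedding workflow with a generic pre-trained vision encoder from torchvision as a stand-in; the choice of ResNet-50 and the file names are assumptions for illustration only.

```python
import torch
from PIL import Image
from torchvision import models, transforms

# Generic pre-trained encoder as a stand-in for a phenomics foundation model.
encoder = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
encoder.fc = torch.nn.Identity()  # drop the classifier head: emit a 2048-d embedding
encoder.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def embed(path):
    """Map one cell-microscopy image to a general-purpose embedding vector."""
    img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        return encoder(img).squeeze(0)

# Perturbations with similar morphological effects land close together in
# embedding space, so cosine similarity serves as a phenotypic readout.
# (The file names below are hypothetical.)
sim = torch.nn.functional.cosine_similarity(
    embed("gene_knockout_A.png"), embed("compound_X.png"), dim=0
)
print(f"phenotypic similarity: {sim.item():.3f}")
```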

Phenom-Beta was developed using the RxRx3 dataset, which contains approximately 2.2 million images of HUVEC cells affected by around 17,000 genetic knockouts and 1,674 known chemical entities. Despite being trained on a specific type of imaging assay, the model's application extends to various other assays, showcasing its versatility.

The development of Phenom-Beta underscores the importance of the scaling hypothesis in biological research, demonstrating that increasing the size of training data and model parameters enhances the model's ability to replicate known biological relationships. This was evidenced by a significant improvement in the model's performance, as detailed in a paper presented at a NeurIPS workshop.

Researchers can access the Phenom-Beta model through the NVIDIA BioNeMo platform, where it's available for non-commercial use under specific terms and conditions set by Recursion. The platform offers an easy-to-use cloud API, enabling users to leverage the model at supercomputing scales.
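
The snippet below is a purely hypothetical illustration of what calling a hosted embedding service over HTTP typically looks like; the endpoint URL, payload shape, and response field are invented for illustration and are not the actual BioNeMo interface, which is documented by NVIDIA.

```python
import requests

API_URL = "https://bionemo.example.com/v1/embeddings"  # hypothetical endpoint
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}      # hypothetical auth scheme

# Upload an image and receive its embedding; the request payload and the
# "embedding" response field are invented for illustration.
with open("cells.png", "rb") as f:
    resp = requests.post(API_URL, headers=HEADERS, files={"image": f})
resp.raise_for_status()
embedding = resp.json()["embedding"]
print(f"embedding dimension: {len(embedding)}")
```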

Recursion also hinted at a more advanced model, Phenom-1, which is currently being used by internal teams and select partners.

For those interested in employing the Phenom-Beta model for commercial purposes, Recursion invites direct inquiries to discuss potential uses.

Evo: DNA foundation modeling from molecular to genome scale

Researchers from the Arc Institute, Stanford, and TogetherAI have unveiled Evo, a biological foundation model that represents a significant advancement in the field of genetic research. 

Evo is designed to operate across the core biological languages (DNA, RNA, and proteins), facilitating both predictive tasks and the generative design of biological sequences ranging from molecules to entire genomes.

Evo distinguishes itself with a massive training scale, employing over 7 billion parameters and achieving a context length of 131k tokens. Its architecture, derived from the StripedHyena design, marks a departure from traditional Transformer models, focusing on improving efficiency and the quality of DNA sequence generation. 

This novel architecture enables the model to work at an unprecedented single-nucleotide resolution, making it adept at handling the extensive lengths of prokaryotic and phage genomic sequences.
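
To see why single-nucleotide resolution matters, here is a minimal sketch of character-level DNA tokenization: each base maps to one token, so a 131k-token context window covers roughly 131 kb of raw sequence, and a single point mutation changes exactly one token. The vocabulary and code are illustrative, not Evo's actual tokenizer.

```python
# Illustrative single-nucleotide vocabulary (not Evo's actual tokenizer).
VOCAB = {"A": 0, "C": 1, "G": 2, "T": 3, "N": 4}
INV_VOCAB = {v: k for k, v in VOCAB.items()}

def tokenize(dna):
    """Map a DNA string to integer token ids at single-base resolution."""
    return [VOCAB[base] for base in dna.upper()]

def detokenize(ids):
    return "".join(INV_VOCAB[i] for i in ids)

seq = "ATGGCGTACGTTAGC"
ids = tokenize(seq)
assert detokenize(ids) == seq

# A single point mutation changes exactly one token, which is what keeps a
# character-level model sensitive to nucleotide-scale changes even across
# contexts spanning hundreds of thousands of tokens.
mutated = tokenize("ATGGCGTACGTTCGC")  # one base changed
print(sum(a != b for a, b in zip(ids, mutated)))  # -> 1
```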

The development team has made Evo accessible through GitHub and the Together playground, inviting the scientific community to explore its capabilities firsthand. 

Alongside the model, they are releasing the OpenGenome dataset, a compilation of 2.7 million prokaryotic and phage genomes, heralded as the largest DNA pretraining dataset currently available to the public.

Evo's creation was driven by the quest to understand and model biological systems at a genomic level, a challenge that has remained elusive due to the sheer complexity and length of DNA sequences. The model's ability to integrate long genomic sequences and retain sensitivity to minute changes promises to deepen our understanding of how individual components of DNA, RNA, and proteins interact within complex systems.

In a groundbreaking approach to sequence modeling, Evo leverages the StripedHyena architecture to effectively learn from and generate biological sequences over long ranges, demonstrating superior performance in zero-shot gene essentiality testing, prediction across DNA, RNA, and protein modalities, and the generative design of novel CRISPR systems.

Evo's capabilities extend to the generation of entire genomes, showcasing the potential to produce sequences with thousands of protein-coding sequences, a feat enabled by its long context capabilities and efficient inference mode.

As the first model of its kind to predict and generate DNA sequences at such scale and resolution, Evo's developers emphasize the importance of safe and responsible advancement. They also outline future aspirations to extend Evo's training to human genomes, aiming to unlock further advancements in drug discovery, agriculture, sustainability, and the fight against complex diseases.

This innovation signals a promising shift in the modeling of biological sequences, suggesting that foundation models like Evo could become indispensable tools in scientific research, with far-reaching implications for our understanding of life itself and our ability to engineer it.

New Foundation AI Model Utilizing LLM Stacking and HYFT Technology

ImmunoPrecise Antibodies Ltd. (IPA), a Netherlands-based, Nasdaq-traded company, recently introduced a new AI foundation model through its subsidiary BioStrand. This model integrates Large Language Models (LLMs) with BioStrand's patented HYFT technology, designed to discern and utilize universal fingerprint patterns found across biological entities. These fingerprints serve as vital connectors, linking sequence data with structural, functional, and bibliographical information across a comprehensive and ever-expanding knowledge graph.

This graph currently maps 25 billion relationships across 660 million data objects, offering a detailed understanding of the intricate relationships between genes, proteins, and biological pathways. The integration of HYFT technology with LLMs paves the way for decoding the complex language of proteins, crucial for developing antibody drugs and precision medicine.
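
The following is a hedged sketch of the knowledge-graph idea using networkx: fingerprint nodes link a sequence to structural, functional, and bibliographic annotations, and a simple traversal collects everything connected to a given sequence. All node names and edges here are invented for illustration; the actual HYFT patterns and the real 25-billion-relationship graph are proprietary.

```python
import networkx as nx

# Invented node names; the real HYFT fingerprints and graph are proprietary.
g = nx.DiGraph()
g.add_edge("seq:EXAMPLE_1", "fingerprint:HYFT_X", relation="contains")
g.add_edge("fingerprint:HYFT_X", "structure:beta_barrel", relation="maps_to")
g.add_edge("fingerprint:HYFT_X", "function:DNA_binding", relation="annotated_as")
g.add_edge("fingerprint:HYFT_X", "paper:PMID_0000000", relation="described_in")

# Traversing from a sequence through its fingerprints collects every linked
# structural, functional, and bibliographic annotation.
for node in nx.descendants(g, "seq:EXAMPLE_1"):
    print(node)
```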

The concept of "word boundaries" in protein languages is introduced, offering a novel approach to understanding protein structures and functions. This methodology enables the precise identification and analysis of functional units within proteins, fostering advancements in drug discovery, protein-based therapeutics, and synthetic biology. The approach promises to not only accelerate the development of targeted treatments but also to revolutionize protein engineering and design.

Challenges and the Future

Just as with the widely discussed large language models (LLMs), there are concerns that the data used to train biological AI foundation models may be biased, depending on where it comes from.

This concern comes from Vaneet Aggarwal, a computer science professor at Purdue University, who cautions that we need to be careful about how training data is collected.

When it comes to creating new molecules with AI, we are just at the starting line. Kyunghyun Cho, a professor of computer science and data science at New York University and senior director of Frontier Research at Prescient Design, part of Genentech, points out that there is much more work to do after these molecules are generated: they have to pass many tests to determine which ones are actually good enough to try in real experiments.

What is really interesting, according to Cho, is that while language AI speeds up things we already do, in biology we are stepping into the unknown. That means everything has to be checked thoroughly, because we are exploring something completely new.
