BioPharmaTrend

  AI in Bio

19 Companies Pioneering AI Foundation Models in Pharma and Biotech

by Andrii Buvailo, PhD (Contributor) • June 19, 2024, updated on Sept. 11, 2024

Disclaimer: All opinions expressed by Contributors are their own and do not represent those of their employers or BiopharmaTrend.com.
Contributors are fully responsible for ensuring they own any required copyright for any content they submit to BiopharmaTrend.com. This website and its owners shall not be liable for the information and content submitted for publication by Contributors, or for its accuracy.


Foundation models represent a new paradigm in artificial intelligence (AI), revolutionizing how machine learning models are developed and deployed. As these models grow increasingly capable, they become useful for applications across a wide range of economic functions and industries, including biotech. Foundation models are a class of large-scale machine learning models, typically based on deep learning architectures such as transformers, that are trained on massive datasets encompassing diverse types of data. The most prominent examples of general-purpose foundation models are GPT-3 and GPT-4, which underpin ChatGPT, and BERT (Bidirectional Encoder Representations from Transformers). These are very large models trained on enormous volumes of data, often in a self-supervised or unsupervised manner, without the need for labeled data.

Their scalability in terms of both model size and data volume enables them to capture intricate patterns and dependencies within the data. The pre-training phase of foundation models imparts them with a broad knowledge base, making them highly efficient in few-shot or zero-shot learning scenarios where minimal labeled data is available for specific tasks.

This approach demonstrates their high versatility and transfer learning capabilities, adapting to the nuances of particular challenges through additional training.
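This pre-train-then-adapt pattern can be sketched in a few lines: a frozen "pretrained" encoder supplies general-purpose features, and only a tiny task head is trained on a handful of labeled examples. Everything below is an illustrative toy (the encoder is just a random projection standing in for a large transformer), not any particular model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen pretrained encoder: maps raw inputs (dim 32)
# to general-purpose embeddings (dim 8). In practice this would be a
# large transformer whose weights stay fixed during adaptation.
W_pretrained = rng.normal(size=(32, 8))

def encode(x):
    return np.tanh(x @ W_pretrained)  # frozen: never updated below

# Few-shot task: only 10 labeled examples, binary classification.
X = rng.normal(size=(10, 32))
y = rng.integers(0, 2, size=10).astype(float)

# Train only a tiny linear head on top of the frozen embeddings.
w = np.zeros(8)
for _ in range(500):
    p = 1 / (1 + np.exp(-(encode(X) @ w)))     # sigmoid predictions
    w -= 0.5 * encode(X).T @ (p - y) / len(y)  # logistic-loss gradient step

preds = (1 / (1 + np.exp(-(encode(X) @ w))) > 0.5).astype(float)
```

Because only the small head is updated, adaptation is cheap and works even when labeled data is scarce, which is exactly the few-shot setting described above.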

Below, we summarize companies building domain-specific foundation models for biology research and adjacent areas, such as chemistry.


Atomic AI

Atomic AI, a biotech company focused on AI-driven RNA drug discovery, aims for atomic precision in their work. Their proprietary platform, PARSE (Platform for AI-driven RNA Structure Exploration), is based on a machine learning model trained on a limited set of RNA molecules.

This model makes accurate predictions about the structure of various RNA molecules, enhancing RNA structure prediction. Atomic AI utilizes their foundational model internally for their drug discovery program, enabling them to pursue novel targets in RNA that were previously inaccessible. This approach aligns with the pharmaceutical industry's growing interest in novel biology, facilitating new avenues in drug discovery.


BioMap

BioMap focuses on unveiling nature's rules and generating diverse proteins with high accuracy. Their primary foundation model, xTrimo (Cross-Modal Transformer Representation of Interactome and Multi-Omics), is designed to understand and predict life's behavior at various complexity levels. xTrimo is trained on extensive datasets, including over 6 billion proteins and 100 billion protein-protein interactions, making it the largest life science AI foundation model with over 100 billion parameters.

This model's scale allows it to inform multiple downstream task models even with minimal data. BioMap’s strategic collaboration with Sanofi, announced in 2023, involves co-developing AI modules for biotherapeutic drug discovery, leveraging BioMap’s AI expertise and Sanofi’s proprietary data to create advanced AI models for biologics design and optimization.


Bioptimus

In February 2024, Bioptimus, a biotech startup based in France, announced the successful closure of a $35 million seed funding round to develop an AI foundation model targeting advancements across the biological spectrum, from molecular to organismal levels.

Led by Professor Jean-Philippe Vert, the company collaborates with Owkin to leverage extensive data generation capabilities and multimodal patient data from leading academic hospitals worldwide. Owkin's initiative, MOSAIC, represents one of the largest multi-omics atlases for cancer research, showcasing the potential of combining computational and experimental research methods.

This collaboration, supported by Amazon Web Services (AWS), is crucial for developing AI models capable of capturing the diversity of biological data.


Chai Discovery

Chai Discovery, a six-month-old AI biology startup based in San Francisco, just announced the release of its first open-source model, Chai-1. The model is designed to predict the structure of biochemical molecules, a key capability in drug discovery.

The company, founded by former OpenAI and Meta researchers, recently raised nearly $30 million in a seed funding round led by Thrive Capital and OpenAI, valuing the company at $150 million. Chai Discovery is focused on using AI foundation models to transform biology from a science into an engineering discipline, with a particular emphasis on predicting and reprogramming molecular interactions.

Chai-1 is an advanced AI model that predicts the structures of various biochemical entities, such as proteins, small molecules, DNA, RNA, and even complex chemical modifications. What sets Chai-1 apart from other tools, like Google DeepMind’s AlphaFold, is its ability to achieve higher accuracy in predicting these structures, with improvements of 10% to 20% in success rates on key tasks related to drug discovery.

Chai-1 benchmark test results

For example, Chai-1 has shown a 77% success rate on the PoseBusters benchmark, which is a test that measures how well the model can predict how proteins and other molecules fit together—a crucial step in designing new drugs. It also scored 0.849 on the CASP15 protein monomer structure prediction set, which means it’s very good at accurately predicting the shape of single proteins, outperforming other top models.

The key thing about Chai-1 is that it doesn’t rely on a method called multiple sequence alignments (MSAs), which most traditional models use to find patterns in sequences of proteins or other molecules. MSAs require lots of data and computational power, which can be a bottleneck. Instead, Chai-1 can work with just a single sequence of a molecule and still make highly accurate predictions. This makes it much more versatile and efficient, especially in situations where data is scarce or incomplete—common challenges in real-world drug discovery.
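The practical difference is easiest to see in the inputs themselves. The sketch below contrasts a single-sequence encoding (all an MSA-free model needs) with an MSA profile, which requires first collecting and aligning many homologous sequences. Both encodings are generic illustrations, not Chai-1's actual featurization:

```python
import numpy as np

AA = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard amino acids
IDX = {a: i for i, a in enumerate(AA)}

def one_hot(seq):
    """Single-sequence input: what an MSA-free model consumes."""
    x = np.zeros((len(seq), len(AA)))
    for i, a in enumerate(seq):
        x[i, IDX[a]] = 1.0
    return x

def msa_profile(aligned_seqs):
    """MSA-based input: per-position amino-acid frequencies computed
    from many evolutionarily related sequences, which are expensive
    to gather and may not exist for novel proteins."""
    L = len(aligned_seqs[0])
    counts = np.zeros((L, len(AA)))
    for seq in aligned_seqs:
        for i, a in enumerate(seq):
            counts[i, IDX[a]] += 1
    return counts / counts.sum(axis=1, keepdims=True)

seq = "MKTAYIAK"
single = one_hot(seq)                                        # available for any sequence
profile = msa_profile(["MKTAYIAK", "MKSAYIAR", "MRTAYLAK"])  # needs homologs
```

An MSA-free model only ever needs the first representation, which is why it remains usable when homologous sequences are scarce.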

In simpler terms, Chai-1 can take a simpler input and still deliver top-notch results, making it a powerful tool for researchers aiming to speed up the process of finding new medications.

The model’s practical applications are vast, as it enhances drug discovery processes by providing precise predictions of molecular structures and interactions. Chai-1 is particularly effective at predicting protein-ligand interactions and folding multimers, achieving higher accuracy than MSA-based models.

A web version of Chai-1 is also available online.

By making Chai-1 free and open-source, Chai Discovery encourages further research and development in the field, allowing other developers to build on their work for non-commercial purposes.


ChemCrow

ChemCrow is an AI-driven chemistry agent designed to tackle tasks in organic synthesis, drug discovery, and materials design. By integrating 13 expert-designed tools, ChemCrow enhances the performance of large language models (LLMs) in chemistry.

The platform operates by prompting an LLM, such as GPT-4, with specific instructions related to a given chemical task, guiding the model to follow a structured process of "Thought, Action, Action Input, Observation." This method allows ChemCrow to reason about the task's current state, consider its relevance to the final goal, and plan subsequent steps. ChemCrow assists expert chemists by providing advanced chemical knowledge and simplifies complex chemical problems for non-experts, broadening access to AI-driven chemistry solutions.
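This loop can be sketched as follows, with a scripted stand-in replacing the LLM and toy functions replacing ChemCrow's 13 expert tools (the tool names and values here are invented for illustration, not ChemCrow's actual implementation):

```python
# Toy tools: in ChemCrow these are expert-designed chemistry utilities.
TOOLS = {
    "MolWeight": lambda smiles: {"CCO": 46.07, "C": 16.04}.get(smiles, -1.0),
    "Name2SMILES": lambda name: {"ethanol": "CCO"}.get(name, "?"),
}

# Scripted reasoning trace standing in for what a real LLM would
# generate step by step.
SCRIPT = [
    ("I need the SMILES for ethanol first.", "Name2SMILES", "ethanol"),
    ("Now I can compute its molecular weight.", "MolWeight", "CCO"),
]

def run_agent(script):
    """Thought -> Action -> Action Input -> Observation loop."""
    observations = []
    for thought, action, action_input in script:
        obs = TOOLS[action](action_input)  # execute the chosen tool
        observations.append(obs)
        print(f"Thought: {thought}")
        print(f"Action: {action}[{action_input}] -> Observation: {obs}")
    return observations[-1]                # final answer

answer = run_agent(SCRIPT)  # molecular weight of ethanol via two tool calls
```

Each observation is fed back into the next reasoning step, which is what lets the agent chain tools toward a multi-step chemistry goal.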


Cyrus Biotechnology

Cyrus Biotechnology, based in Seattle, Washington, integrates advanced AI models into their protein biochemistry processes. Utilizing foundation models like AlphaFold2, RoseTTAFold, and OpenFold, Cyrus has significantly advanced protein structure prediction, outperforming previous models and traditional systems.

An integral part of their approach is the use of Rosetta, a model developed at the University of Washington, which employs statistics and atomic-scale physics to model proteins and other molecules. In collaboration with other entities, Cyrus has also initiated the OpenFold project, aiming to create a trainable foundation model for proteins. OpenFold surpasses AlphaFold2 in terms of speed and memory usage on commercially available hardware.


Deep Genomics

In September 2023, Deep Genomics unveiled BigRNA, a pioneering AI foundation model for uncovering RNA biology and therapeutics. It is the first transformer neural network engineered specifically for transcriptomics. BigRNA has nearly two billion trainable parameters and was trained on thousands of datasets totaling over a trillion genomic signals.

This model is designed to predict tissue-specific regulatory mechanisms of RNA expression, binding sites of proteins and microRNAs, and the effects of genetic variants and therapeutic candidates. By understanding these complex RNA interactions, BigRNA facilitates the discovery of new biological mechanisms and RNA therapeutic candidates that traditional approaches might miss, exemplifying its transformative potential in RNA-based drug discovery.


Enveda Biosciences

In May 2024, Enveda Biosciences unveiled PRISM (Pretrained Representations Informed by Spectral Masking), a foundation model trained on 1.2 billion small molecule mass spectra, aiming to enhance molecular structure identification. PRISM employs self-supervised learning on a large dataset of unannotated spectra, using a masked peak modeling approach similar to masked language modeling in NLP.
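The masked peak objective can be illustrated in a few lines: hide a random subset of (m/z, intensity) peaks and ask the model to reconstruct them from the visible ones, just as masked language models reconstruct hidden words. The spectrum below is synthetic and the masking scheme is a simplification of PRISM's:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy mass spectrum: (m/z, intensity) pairs for 20 peaks.
spectrum = np.column_stack([
    np.sort(rng.uniform(50, 500, size=20)),  # m/z values
    rng.uniform(0, 1, size=20),              # intensities
])

def mask_peaks(spec, mask_frac=0.3):
    """Hide a random subset of peaks; the training objective is to
    reconstruct them from the visible peaks (the analogue of masked
    language modeling, with peaks in place of word tokens)."""
    n = len(spec)
    masked_idx = rng.choice(n, size=int(n * mask_frac), replace=False)
    visible = np.delete(spec, masked_idx, axis=0)  # model input
    targets = spec[masked_idx]                     # reconstruction targets
    return visible, targets, masked_idx

visible, targets, masked_idx = mask_peaks(spectrum)
```

Because the targets come from the spectrum itself, no annotation is needed, which is what makes training on 1.2 billion unannotated spectra possible.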

The model improves the prediction of chemical properties and spectral matching tasks, aiding drug hunters in identifying new medicines from natural molecules. Enveda plans to expand the PRISM dataset to further enhance the model’s predictive capabilities, supporting the discovery of novel therapeutics.


Evo

Developed by researchers from the Arc Institute, Stanford, and TogetherAI, Evo is a biological foundation model operating across DNA, RNA, and proteins. With over 7 billion parameters, Evo facilitates predictive tasks and the generative design of biological sequences. Its architecture, derived from the StripedHyena design, enables the model to handle long genomic sequences, improving efficiency and the quality of DNA sequence generation.

This novel architecture enables the model to work at an unprecedented single-nucleotide resolution, making it adept at handling the extensive lengths of prokaryotic and phage genomic sequences.

The development team has made Evo accessible through GitHub and the Together playground, inviting the scientific community to explore its capabilities firsthand. 


Ginkgo Bioworks

In August 2023, Ginkgo Bioworks and Google Cloud announced a 5-year partnership aimed at developing state-of-the-art large language models (LLMs) focused on genomics, protein function, and synthetic biology. Ginkgo’s AI foundation model will run on Google Cloud's Vertex AI platform, aiming to accelerate innovation in drug discovery, agriculture, industrial manufacturing, and biosecurity.

Furthermore, in February 2024 Ginkgo committed to building next-generation biological foundation models by acquiring key assets of Reverie Labs, a startup specializing in AI/ML tools for drug discovery. This acquisition includes Reverie's infrastructure and software for training large-scale AI models, enhancing Ginkgo's capabilities in developing comprehensive biological models.


Helical

Helical, founded in 2023 and based in Luxembourg, raised €2.2 million in seed funding in June 2024 to build the first open-source platform dedicated to bio foundation models for DNA and RNA data. Led by co-founders Rick Schneider, Mathieu Klop, and Maxime Allard, the company aims to democratize access to advanced AI tools for pharmaceutical and biotech companies, helping them integrate these models into drug discovery processes without the need for specialized AI teams.

Helical focuses on creating a user-friendly interface for biologists and data scientists, allowing them to leverage complex genomic models through simple API calls. The platform includes a library of Bio AI Agents—pre-built applications tailored for tasks such as biomarker discovery and target prediction.

Unlike other platforms, Helical specifically integrates DNA and RNA foundation models, offering researchers the ability to work directly with nucleotide data rather than general AI inputs like text or images.

Helical’s model library and tools are open-source, facilitating collaboration and continuous improvement from the scientific community. This approach enables users to not only utilize existing models but also benchmark and refine their own, supporting a wide range of applications from basic research to early-stage drug development.

Helical’s partnerships with organizations such as LuxProvide, NVIDIA, and Microsoft further enhance its capabilities by providing access to high-performance computing resources, advanced genomic analysis platforms, and essential startup tools, ensuring robust support for large-scale data processing and model training.


HyenaDNA

Developed by Stanford University scientists, HyenaDNA is a genomic foundation model designed to learn the distribution of DNA sequences, the encoding of genes, and how the sequences located between protein-coding regions regulate gene expression, with the aim of deepening our understanding of gene regulation and its implications for health and disease.


ImmunoPrecise Antibodies Ltd. (BioStrand)

BioStrand, a subsidiary of ImmunoPrecise Antibodies Ltd. based in the Netherlands, introduced a new AI foundation model integrating Large Language Models (LLMs) with their patented HYFT Technology. This technology discerns and utilizes universal fingerprint patterns found across biological entities, linking sequence data with structural, functional, and bibliographical information across a comprehensive knowledge graph.

This integration facilitates the decoding of complex protein languages, advancing drug discovery, protein-based therapeutics, and synthetic biology. BioStrand’s approach promises to accelerate the development of targeted treatments and revolutionize protein engineering and design.


InSilico Medicine

InSilico Medicine, a pioneer in generative AI for drug discovery founded in 2014, recently launched Precious3GPT—a cutting-edge AI model aimed at aging research and drug discovery.

This model is designed to handle a wide range of biological data, integrating information from different species: rats, monkeys, and humans, and various data types such as transcriptomics, proteomics, and methylation. With that, the model allows researchers to run virtual experiments and predict how different compounds might affect aging across tissues and species.

Model architecture and training

Precious3GPT enables the simulation of aging and drug effects across different species using simple prompts, facilitating the identification of compounds that could be effective in both mice and humans. The model is accessible to the research community via platforms like Hugging Face, positioning it as a collaborative tool for advancing aging and disease research.

Technically, Precious3GPT is a transformer-based model that utilizes a novel tokenization logic, allowing it to process over 2 million data points from public omics datasets, biomedical text, and knowledge graphs. The model is trained to handle queries in natural language, making it highly adaptable for research purposes. For example, it can predict biological age in specific tissues, simulate the effects of drugs on different species, and even generate synthetic biomedical data for experimental simulations.

A standout feature of Precious3GPT is its use of structured multimodal masking during training, where most of the data is hidden, forcing the model to learn deep connections across different biological modalities. This method allows it to predict complex interactions, like how specific genes might influence protein levels in various tissues, as if asking “What if we change this gene?” and having the model predict the ripple effects throughout the body.
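A minimal sketch of structured multimodal masking, assuming made-up modality sizes (Precious3GPT's actual token layout is not public): entire modality blocks, rather than random tokens, are hidden and become reconstruction targets:

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy multimodal sample: token blocks from three modalities. The
# sizes and values here are invented for illustration.
modalities = {
    "transcriptomics": rng.normal(size=(5, 4)),
    "proteomics":      rng.normal(size=(3, 4)),
    "methylation":     rng.normal(size=(4, 4)),
}

def mask_modality(mods, hidden):
    """Structured masking: zero out whole modality blocks rather than
    random tokens, forcing the model to predict one modality from the
    others (e.g. protein levels from gene expression)."""
    inputs, targets = {}, {}
    for name, tokens in mods.items():
        if name in hidden:
            inputs[name] = np.zeros_like(tokens)  # hidden from the model
            targets[name] = tokens                # must be reconstructed
        else:
            inputs[name] = tokens
    return inputs, targets

inputs, targets = mask_modality(modalities, hidden={"proteomics"})
```

Hiding a whole modality is what forces cross-modal reasoning: the model cannot fall back on nearby tokens of the same type and must instead learn how the modalities relate.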

InSilico developed Precious3GPT in collaboration with researchers at Harvard and other institutions, aiming to create an open, community-driven resource. The model’s ability to handle complex, multi-species data sets it apart, offering a new approach to exploring aging and drug discovery that’s both scalable and precise.


Noetik

Noetik, founded in 2022 by Jacob Rinaldi and Ron Alfa in the San Francisco Bay Area, is using AI to tackle one of the toughest challenges in cancer treatment: finding the right targets and understanding how drugs will work on different patients. They just secured $40 million in Series A funding to expand their work, including growing one of the world’s largest cancer biology datasets and enhancing their in vivo CRISPR Perturb-Map platform, which helps them test and refine potential therapies.

At the heart of Noetik’s tech is OCTO, a powerful AI model that acts like a virtual lab for cancer research. While many AI models focus on predicting molecular structures, OCTO goes further—it predicts how different cancer treatments might play out in real patients. This model can be thought of as a simulator that can test “what if” scenarios, helping scientists see which treatments could work best for which patients, without the long trial-and-error of traditional methods.

OCTO is trained on a large mix of data from thousands of tumor samples, including gene expression, protein data, and images of the cancer cells. By learning from these varied inputs, OCTO can predict how tweaking a single gene could change protein levels across a tumor.

Noetik’s approach could cut down on the guesswork in cancer therapy, making it quicker and cheaper to find treatments that really work. They’re also forming partnerships with top research institutions and companies to push their innovations into the clinic, aiming to make a difference for patients.


Paige

Paige, a provider of digital pathology solutions, has launched a multi-modal AI model in pathology and oncology. Their foundation model, also called PRISM (A Multi-Modal Generative Foundation Model for Slide-Level Histopathology), enhances reporting and generative capabilities for cancer detection, biomarker identification, cellular subtyping, spatial biology, and therapy response prediction. PRISM is built on Virchow tile embeddings, aggregating these into a single slide embedding for image perception tasks.
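Tile-to-slide aggregation of this kind is commonly implemented as attention pooling: each tile embedding gets a learned score, and the slide embedding is the softmax-weighted average. The sketch below uses random placeholder embeddings and weights; PRISM's exact aggregator may differ:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy tile embeddings: 100 tiles per slide, 16-dim each. In PRISM the
# tiles are embedded by Virchow; here they are random placeholders.
tiles = rng.normal(size=(100, 16))

def attention_pool(tile_emb, w):
    """Aggregate tile embeddings into one slide embedding using
    learned attention weights: score each tile, softmax over tiles,
    then take the weighted average."""
    scores = tile_emb @ w                 # one scalar score per tile
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                  # softmax over tiles
    return alpha @ tile_emb               # weighted average embedding

w = rng.normal(size=16)  # stand-in for learned attention parameters
slide_embedding = attention_pool(tiles, w)
```

The appeal of this design is that a slide of any size collapses to one fixed-length vector, which downstream tasks such as biomarker prediction can consume directly.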

PRISM reduces the time and resources needed for developing advanced AI systems, contributing to advancements in precision oncology and improving cancer diagnosis and treatment.


Piramidal

Piramidal, founded in 2024 by Dimitris Sakellariou and Kris Pahuja, is developing a foundational AI model specifically for analyzing brainwave data from electroencephalography (EEG). EEGs are widely used in hospitals to monitor brain activity, but interpreting these signals can be complex and varies between different machines and setups. Piramidal aims to simplify and standardize this process with a model that can consistently detect critical brainwave patterns, regardless of the equipment used or the specific patient characteristics. This could improve monitoring for conditions like seizures or strokes, especially in high-stakes environments like neural ICUs, where continuous observation is crucial but often stretched thin by staff limitations.

Piramidal’s approach is to treat EEG signals like a language of the brain, similar to how large language models like ChatGPT handle human text. By training on a massive collection of EEG data from various sources, the model learns to recognize patterns in brain activity that might indicate medical concerns. This is akin to a highly trained assistant that can spot subtle signs of brain distress, which might otherwise go unnoticed by even experienced medical professionals, especially when the data comes from different EEG machines with varying configurations. This uniform approach not only saves time but also reduces the chance of human error in diagnosing serious conditions.

The company's model is designed to work out-of-the-box with any EEG setup, unlike traditional models that require retraining for each machine or scenario. It acts like a universal translator for brainwaves, simplifying implementation across different hospital settings without customization. By using diverse, harmonized data from various open-source sources, the model starts with a strong foundation, making it more effective from the outset compared to models that begin from scratch.
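One concrete piece of such out-of-the-box operation is harmonizing recordings made at different sampling rates onto a common grid. The sketch below does this with simple linear interpolation; Piramidal's actual preprocessing pipeline is not public, so this is only an illustration of the idea:

```python
import numpy as np

def harmonize(signal, fs_in, fs_out=256):
    """Resample one EEG channel recorded at fs_in Hz onto a common
    fs_out Hz grid via linear interpolation -- a simple stand-in for
    the harmonization needed before heterogeneous recordings can
    feed a single model."""
    seconds = len(signal) / fs_in
    t_in = np.arange(len(signal)) / fs_in
    t_out = np.arange(int(seconds * fs_out)) / fs_out
    return np.interp(t_out, t_in, signal)

# The same 2-second, 10 Hz oscillation recorded by two machines
# with different sampling rates.
a = np.sin(2 * np.pi * 10 * np.arange(2 * 512) / 512)  # 512 Hz machine
b = np.sin(2 * np.pi * 10 * np.arange(2 * 128) / 128)  # 128 Hz machine

ha, hb = harmonize(a, 512), harmonize(b, 128)          # both now 256 Hz
```

After resampling, both recordings live on the same 256 Hz grid and are nearly identical, so a single model can consume either without retraining.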

They are currently scaling the model to billions of parameters, making the model more detailed and capable of understanding even finer nuances in brainwave data. The first production version of this model is set to be tested in hospitals through four pilot programs starting in early 2025, focusing on ICU settings. These pilots will evaluate whether the model can reliably interpret EEGs across different real-world conditions, acting, once again, as an additional layer of monitoring to support medical staff.

Piramidal raised $6 million in seed funding from investors like Adverb Ventures, Lionheart Ventures, and Y Combinator to support computing costs and team expansion. 


Recursion

Recursion, a biopharma company known for integrating AI and massive datasets into drug discovery, recently released Phenom-Beta, the first in a series of foundational models available on NVIDIA’s BioNeMo platform. This model is part of a broader initiative to harness phenomics—the study of cell phenotypes in response to various chemical and genetic perturbations. Phenom-Beta is designed to process cellular microscopy images into general-purpose embeddings, making it a flexible tool for analyzing and understanding underlying biological systems.

Phenom-Beta uses a vision transformer (ViT) architecture, a type of neural network initially developed for image recognition tasks that can also handle various complex data inputs. It employs self-supervised learning through masked autoencoders (MAEs), which involves hiding about 75% of the pixels in an image and training the model to fill in the gaps. This approach allows the model to learn patterns and relationships without needing labeled data, making it more adaptable and robust compared to traditional supervised learning methods that rely on extensive labeling.
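The MAE recipe is simple to sketch: split the image into patches, hide 75% of them, and keep only the visible 25% as encoder input, with the hidden patches as reconstruction targets. The image below is random noise standing in for a microscopy image:

```python
import numpy as np

rng = np.random.default_rng(3)

image = rng.random((64, 64))  # toy single-channel microscopy image

def patchify(img, p=8):
    """Split an image into non-overlapping p x p patches, flattened."""
    h, w = img.shape
    patches = img.reshape(h // p, p, w // p, p).swapaxes(1, 2)
    return patches.reshape(-1, p * p)     # (num_patches, p*p)

def mae_mask(patches, mask_ratio=0.75):
    """MAE-style masking: the encoder sees only ~25% of patches; the
    decoder must reconstruct the hidden 75% -- no labels required."""
    n = len(patches)
    perm = rng.permutation(n)
    n_keep = int(n * (1 - mask_ratio))
    keep, hide = perm[:n_keep], perm[n_keep:]
    return patches[keep], patches[hide]   # encoder input, targets

patches = patchify(image)        # 64 patches of 64 pixels each
visible, hidden = mae_mask(patches)
```

Because the reconstruction targets are the image itself, millions of unlabeled microscopy images become usable training data.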

The model was trained using the RxRx3 dataset, a publicly available collection of approximately 2.2 million images of HUVEC cells—a type of human cell often used in research—featuring ~17,000 genetic knockouts (where specific genes are intentionally disrupted) and 1,674 chemical entities (various compounds tested for their biological effects). Despite being trained on Cell Painting—a specialized imaging assay that uses multiple fluorescent dyes to highlight different cellular components—Phenom-Beta can generalize to other types of microscopy, such as brightfield imaging, which uses simple light to visualize cells and is typically lower in detail.

Phenom-Beta, training and inference

One of Phenom-Beta’s key capabilities is its ability to extract biologically meaningful features from these images, capturing subtle changes in cell structure that might be missed by the human eye. By converting these images into high-dimensional embeddings, which are numerical representations of the data, the model can map out complex relationships between different genetic and chemical interventions. For example, it can help researchers see how knocking out a particular gene might affect cell function, or how different drugs might produce similar or distinct effects on cells.
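Once interventions live in a shared embedding space, relating them reduces to nearest-neighbor search: a compound whose phenotypic embedding sits close to a gene knockout's likely produces a similar cellular effect. The vectors and names below are random placeholders, not real Phenom-Beta embeddings:

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy embeddings: rows are L2-normalized vectors a phenomics model
# might produce for each genetic knockout or compound (placeholders).
names = ["KO:GENE_A", "KO:GENE_B", "drug_1", "drug_2"]
emb = rng.normal(size=(4, 32))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)

def most_similar(query_idx, emb, names):
    """Cosine similarity between embeddings: perturbations with similar
    phenotypic effects land close together, so a drug that mimics a
    knockout can be found by nearest-neighbor search."""
    sims = emb @ emb[query_idx]  # cosine similarity (rows are unit norm)
    sims[query_idx] = -np.inf    # exclude the query itself
    return names[int(np.argmax(sims))]

match = most_similar(0, emb, names)  # profile closest to KO:GENE_A
```

This kind of lookup is how embedding spaces turn raw images into hypotheses, e.g. surfacing compounds that phenocopy a genetic intervention.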

Phenom-Beta is particularly versatile, able to work with both brightfield and more complex fluorescent imaging techniques. It can also perform in-silico (computer-based) cellular organelle fluorescent staining on brightfield images, predicting high-contrast fluorescent images from simpler, less detailed brightfield inputs. This makes it a valuable tool for scaling imaging techniques in drug discovery, as brightfield microscopy is less costly and more accessible than specialized fluorescent imaging, yet Phenom-Beta can still extract a similar depth of biological information.

Researchers can access Phenom-Beta on the NVIDIA BioNeMo platform via a Cloud API, allowing them to leverage its capabilities at a supercomputing scale. This availability broadens the model's impact by providing a powerful tool for the scientific community to explore cellular phenotypes and drug effects without the need for extensive computational resources. Recursion continues to develop more advanced versions of Phenom, like Phenom-1, for its internal teams and select partners, but Phenom-Beta represents a significant step in making these tools available for broader research purposes.


Terray Therapeutics

Terray Therapeutics integrates large-scale experimentation with generative AI to improve small molecule drug discovery. Their platform combines ultra-high throughput experimentation, generative AI, biology, medicinal chemistry, automation, and nanotechnology.

In November 2023, Terray announced a collaboration with NVIDIA to harness NVIDIA's technologies for training foundation models for chemistry. Utilizing NVIDIA DGX Cloud, Terray aims to develop the world’s most comprehensive chemistry foundation models for small molecules. This collaboration enhances their ability to explore broad molecular spaces, solving complex problems in drug discovery and advancing their preclinical pipeline.
