The Emergence of Diffusion Generative Models in Accelerating AI Drug Discovery

Recently, diffusion generative models such as DALL-E 3, Midjourney, and Imagen have attracted considerable interest for their ability to create striking, inventive images from text prompts. Researchers at MIT CSAIL, the University of Washington's Institute for Protein Design, DZNE, and other institutions are now extending these models beyond image generation, with the aim of speeding up drug development and reducing adverse side effects.

What are diffusion generative models?

Diffusion models are a class of generative AI models best known for powering modern image synthesis tools. Rather than producing an image in a single pass, they take a gradual route: starting from random noise, they transform it into a realistic image through a sequence of small refinement steps. The intuition mirrors a physical process, like ink dispersing in water, in which structure slowly dissolves into a uniform mixture; generation amounts to learning to run that process in reverse.

Training a diffusion model relies on two paired processes. In the forward process, clean training images are progressively corrupted with Gaussian noise until they are nearly indistinguishable from pure noise. In the reverse process, a neural network learns to undo each of those small noising steps, effectively becoming a powerful denoiser. Once trained, the model generates new images by starting from noise and repeatedly applying its learned denoising step, optionally guided by conditions such as text prompts.

Diffusion models were first proposed in 2015 by Sohl-Dickstein et al. and made practical in 2020 by Ho et al.'s denoising diffusion probabilistic model (DDPM) framework. Many modern systems now run the diffusion process in a compressed latent space, which increases speed and efficiency while preserving quality. And while diffusion is most visible in text-to-image tools, the same framework extends beyond pixels: for instance, it can generate protein backbones by diffusing and denoising atomic coordinates.
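The two processes can be made concrete with a minimal sketch in plain Python, assuming the linear noise schedule from the DDPM paper (β rising from 1e-4 to 0.02 over 1,000 steps). Because no trained network is available here, a hypothetical `oracle_eps` function stands in for the denoiser by computing the exact noise from the known clean input; a real model has to predict that noise from the noisy input alone. The data is a toy list of atomic coordinates rather than pixels, echoing the protein-backbone use case.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances beta_t, increasing linearly from beta_start to beta_end."""
    return [beta_start + (beta_end - beta_start) * t / (num_steps - 1) for t in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s): the signal fraction left after step t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def q_sample(x0, t, alpha_bars, noise):
    """Forward process in closed form: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise."""
    a = alpha_bars[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, noise)]


def reverse_mean(x_t, t, betas, alpha_bars, eps_hat):
    """Mean of one DDPM reverse step, given a prediction eps_hat of the noise contained in x_t."""
    beta, a_bar = betas[t], alpha_bars[t]
    coef = beta / math.sqrt(1.0 - a_bar)
    return [(x - coef * e) / math.sqrt(1.0 - beta) for x, e in zip(x_t, eps_hat)]


def oracle_eps(x_t, t, x0, alpha_bars):
    """Hypothetical stand-in for a trained denoiser: it 'cheats' by knowing the clean x0."""
    a = alpha_bars[t]
    return [(x - math.sqrt(a) * c) / math.sqrt(1.0 - a) for x, c in zip(x_t, x0)]


def sample(x0, betas, alpha_bars, rng):
    """Ancestral sampling: start from pure noise and apply reverse steps from t = T-1 down to 0."""
    x = [rng.gauss(0.0, 1.0) for _ in x0]
    for t in reversed(range(len(betas))):
        mean = reverse_mean(x, t, betas, alpha_bars, oracle_eps(x, t, x0, alpha_bars))
        if t > 0:
            # Fresh noise with variance beta_t, one of the two choices discussed in the DDPM paper
            sigma = math.sqrt(betas[t])
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean  # the final step is deterministic
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    betas = linear_beta_schedule(1000)
    alpha_bars = cumulative_alphas(betas)
    # Toy "backbone": three atoms 3.8 angstroms apart along x, flattened to 9 numbers
    x0 = [0.0, 0.0, 0.0, 3.8, 0.0, 0.0, 7.6, 0.0, 0.0]
    x_T = q_sample(x0, 999, alpha_bars, [rng.gauss(0.0, 1.0) for _ in x0])
    print("signal fraction left at t=T:", math.sqrt(alpha_bars[-1]))
    print("reconstructed:", [round(v, 3) for v in sample(x0, betas, alpha_bars, rng)])
```

With the oracle in place, the sampler collapses back onto `x0`, which shows that the reverse update is consistent with the forward process; a trained network that only sees `x_t` and `t` would instead produce new samples from the distribution it has learned.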

Companies applying diffusion in biomedicine

Google DeepMind

Google DeepMind is a leading AI research lab. In the life sciences it is best known for the AlphaFold series, whose latest version, AlphaFold 3, introduced diffusion models into biomolecular structure prediction.

AlphaFold 3 generalizes structure prediction beyond proteins to full biomolecular complexes, including proteins, DNA, RNA, ligands, ions, glycans, and covalent modifications. It replaces AlphaFold 2's Evoformer and structure module with a simpler design: a "Pairformer" trunk, which works on pair and single representations rather than carrying a full MSA representation, and a diffusion-based module that predicts all-atom coordinates directly. The Pairformer encodes inter-residue and inter-entity relationships, while the diffusion module learns to denoise structures across noise scales, capturing both local chemistry and global geometry and generating multiple plausible conformations.
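Denoising "across noise scales" can be illustrated with a noise-level schedule. The sketch below uses the spacing proposed by Karras et al. (2022) for diffusion samplers, with their image-model defaults (`sigma_max=80`, `sigma_min=0.002`, `rho=7`) reused here in ångströms purely for illustration; these are assumptions, not AlphaFold 3's configuration. The `regime` helper is a crude, hypothetical heuristic: it compares each noise level with a typical covalent bond length to show why early, high-noise steps can only settle the global arrangement while late, low-noise steps refine local chemistry.

```python
BOND_LENGTH = 1.54  # approximate C-C single-bond length in angstroms


def karras_sigmas(num_steps, sigma_max=80.0, sigma_min=0.002, rho=7.0):
    """Noise levels from sigma_max down to sigma_min, spaced as in Karras et al. (2022).

    Larger rho places more of the steps at low noise, where fine detail is resolved.
    """
    hi, lo = sigma_max ** (1.0 / rho), sigma_min ** (1.0 / rho)
    return [(hi + i / (num_steps - 1) * (lo - hi)) ** rho for i in range(num_steps)]


def regime(sigma):
    """Rough reading of what a denoising step at this noise level can still change."""
    return "global geometry" if sigma > BOND_LENGTH else "local chemistry"


if __name__ == "__main__":
    sigmas = karras_sigmas(50)
    coarse = sum(1 for s in sigmas if regime(s) == "global geometry")
    print(f"{coarse} of {len(sigmas)} steps have noise larger than a bond length")
    for s in sigmas[::10]:
        print(f"sigma = {s:9.4f} A -> {regime(s)}")
```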

Generate:Biomedicines

Generate:Biomedicines is a Cambridge-based AI drug discovery company.

Its platform is built on Chroma, a diffusion-based generative model that combines a correlated diffusion process grounded in polymer physics with a graph neural network and a dedicated design network. Chroma jointly generates backbones, side chains, and sequences to produce all-atom structures, and uses a flexible conditioning framework to impose geometric and semantic constraints such as symmetry, partial scaffolds, predefined 3D shapes, fold classes, and even natural-language prompts. In December 2025, Generate reported that GB-0895, an AI-designed monoclonal antibody for asthma, had entered Phase 3 clinical trials. The company also has multi-target partnerships with Amgen and Novartis, the latter potentially worth more than $1B.
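A "correlated" diffusion process means the noise added to a structure is not independent from residue to residue. As a simplified stand-in (an assumption: Chroma's actual noise covariance is more elaborate), the sketch below generates ideal-chain noise as a Gaussian random walk along the sequence and compares it with standard independent noise by measuring how strongly neighboring residues' displacements are correlated. All function names are illustrative.

```python
import math
import random


def iid_noise(n_residues, scale, rng):
    """Standard diffusion noise: an independent isotropic Gaussian offset for every residue."""
    return [tuple(rng.gauss(0.0, scale) for _ in range(3)) for _ in range(n_residues)]


def chain_noise(n_residues, scale, rng):
    """Simplified polymer-style noise: a Gaussian random walk along the chain,
    so residues that are close in sequence receive similar offsets."""
    pos, out = [0.0, 0.0, 0.0], []
    for _ in range(n_residues):
        pos = [p + rng.gauss(0.0, scale) for p in pos]
        out.append(tuple(pos))
    return out


def pearson(a, b):
    """Sample Pearson correlation of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def neighbour_correlation(noise_fn, residue=10, n_residues=20, n_samples=2000, seed=0):
    """Correlation of the x-offsets of residues `residue` and `residue + 1` across samples."""
    rng = random.Random(seed)
    samples = [noise_fn(n_residues, 1.0, rng) for _ in range(n_samples)]
    return pearson([s[residue][0] for s in samples], [s[residue + 1][0] for s in samples])


if __name__ == "__main__":
    print(f"independent noise: {neighbour_correlation(iid_noise):+.2f}")   # close to 0
    print(f"chain noise:       {neighbour_correlation(chain_noise):+.2f}")  # close to 0.96
```

Under chain noise, a corrupted structure still looks like a floppy polymer rather than a cloud of unrelated points, which is the intuition behind grounding the noise process in polymer physics.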

Xaira Therapeutics

Xaira is a drug discovery startup co-founded by Nobel laureate David Baker, whose lab developed RFdiffusion. The company launched in 2024 with more than $1B in funding.

RFantibody is a framework based on a fine-tuned version of RFdiffusion; its training code is exclusively licensed to Xaira. It designs antibodies that bind user-defined epitopes: RFantibody generates new CDR loops (the antigen-binding regions) and docked binding poses entirely in silico, conditioning on a target structure, an antibody framework template, and an epitope "hotspot" map that steers CDR-mediated contacts to the desired site. CDR sequences are then optimized with ProteinMPNN, and candidates are filtered with a fine-tuned RoseTTAFold2 model for structural self-consistency and interface quality (ProteinMPNN and RoseTTAFold2 are both Baker Lab developments).
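The generate-then-filter loop can be sketched as plain data handling. Everything below is hypothetical: the `DesignSpec` and `Candidate` fields, the file names, the hotspot labels, and the thresholds (2.0 Å CDR RMSD, interface PAE of 10) are illustrative choices, not RFantibody's actual interface or cutoffs. In practice the metric values would come from the RoseTTAFold2 re-prediction step.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DesignSpec:
    """Hypothetical bundle of the three conditioning inputs described in the text."""
    target_structure: str        # path to the antigen structure
    framework_template: str      # path to the antibody framework
    hotspot_residues: List[str]  # epitope residues the CDRs should contact


@dataclass
class Candidate:
    name: str
    cdr_rmsd: float          # angstroms between designed and re-predicted CDR loops
    interface_pae: float     # predicted aligned error across the interface (lower is better)
    hotspots_contacted: int  # how many hotspot residues the CDRs touch


def passes(c, max_rmsd=2.0, max_pae=10.0, min_hotspots=1):
    """Keep designs that refold as intended (self-consistency) and bind where intended."""
    return (c.cdr_rmsd <= max_rmsd
            and c.interface_pae <= max_pae
            and c.hotspots_contacted >= min_hotspots)


def rank_designs(candidates, **thresholds):
    """Filter, then order survivors by interface confidence (lowest PAE first)."""
    return sorted((c for c in candidates if passes(c, **thresholds)),
                  key=lambda c: c.interface_pae)


if __name__ == "__main__":
    spec = DesignSpec("target.pdb", "framework.pdb", ["A45", "A47", "A52"])
    pool = [
        Candidate("design_01", 1.1, 6.2, 3),
        Candidate("design_02", 3.4, 5.0, 2),  # CDRs do not refold as designed
        Candidate("design_03", 0.9, 8.7, 0),  # binds away from the epitope
        Candidate("design_04", 1.6, 4.3, 2),
    ]
    print(spec.hotspot_residues, [c.name for c in rank_designs(pool)])
```

Here `design_02` fails self-consistency and `design_03` misses the epitope, so only `design_04` and `design_01` survive, ranked by interface PAE.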

Profluent

Profluent is a Berkeley-based protein design company best known for its protein language models, and it is also developing diffusion-based generative models.

One example is MMDiff, a diffusion model that jointly generates protein and nucleic-acid (DNA/RNA) sequences together with their 3D structures, including full complexes such as protein-DNA and protein-RNA assemblies. A key feature is that it respects the symmetries of 3D space (SE(3) equivariance): rotating or translating a molecule changes the model's output only by that same rotation or translation, which matters because biomolecules have no fixed orientation. By generating sequence and structure jointly, MMDiff can directly propose complete, physically coherent biomolecular complexes.
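The symmetry property can be tested numerically: a function f on coordinates is SE(3)-equivariant if rotating and translating the input and then applying f gives the same result as applying f first and then rotating and translating the output. The sketch below checks this for two toy functions. `pull_to_centroid` is a hypothetical stand-in for one denoising update, not MMDiff's network, and `shift_along_x` is a deliberately non-equivariant counterexample tied to a fixed lab-frame axis.

```python
import math


def rotation_matrix(axis, angle):
    """Rotation by `angle` radians about `axis` (Rodrigues' formula)."""
    n = math.sqrt(sum(a * a for a in axis))
    kx, ky, kz = (a / n for a in axis)
    c, s = math.cos(angle), math.sin(angle)
    C = 1.0 - c
    return [
        [c + kx * kx * C, kx * ky * C - kz * s, kx * kz * C + ky * s],
        [ky * kx * C + kz * s, c + ky * ky * C, ky * kz * C - kx * s],
        [kz * kx * C - ky * s, kz * ky * C + kx * s, c + kz * kz * C],
    ]


def rigid_transform(coords, R, t):
    """Apply x -> R x + t to every 3D point."""
    return [tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)) for p in coords]


def pull_to_centroid(coords, step=0.1):
    """Toy update: move every atom a fraction of the way toward the centroid."""
    n = len(coords)
    centroid = [sum(p[i] for p in coords) / n for i in range(3)]
    return [tuple(p[i] + step * (centroid[i] - p[i]) for i in range(3)) for p in coords]


def shift_along_x(coords):
    """Counterexample: a move defined along a fixed lab-frame axis."""
    return [(x + 0.5, y, z) for x, y, z in coords]


def is_equivariant(f, coords, R, t, tol=1e-9):
    """Compare f(R x + t) with R f(x) + t, coordinate by coordinate."""
    lhs = f(rigid_transform(coords, R, t))
    rhs = rigid_transform(f(coords), R, t)
    return all(abs(a - b) < tol for p, q in zip(lhs, rhs) for a, b in zip(p, q))


if __name__ == "__main__":
    atoms = [(0.0, 0.0, 0.0), (1.5, 0.2, -0.3), (2.9, 1.1, 0.4), (4.0, 0.7, 1.6)]
    R, t = rotation_matrix((1.0, 2.0, 0.5), 0.8), (3.0, -1.0, 2.0)
    print("pull_to_centroid:", is_equivariant(pull_to_centroid, atoms, R, t))  # True
    print("shift_along_x:", is_equivariant(shift_along_x, atoms, R, t))        # False
```

A single random rotation and translation is a spot check, not a proof; testing several random transforms gives more confidence that a model has no hidden preferred orientation.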

Diffusion portals

While some companies develop proprietary diffusion models to design proteins and other molecules, others take a shortcut: they host existing open-source tools behind web interfaces or APIs, serving as online diffusion portals. The most prominent example of such a tool is RFdiffusion from the Baker Lab, whose most recent version, RFdiffusion3, was released in early December 2025.

RFdiffusion3 (RFD3) is an all-atom diffusion model for de novo protein design that generates full backbone-and-sidechain structures while modeling interactions with ligands and nucleic acids. It supports atomic-level constraint specification (e.g., hydrogen bonding, solvent exposure, symmetry), enabling precise design of functional systems such as enzymes, DNA-binding proteins, and protein–ligand complexes.
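To make "atomic-level constraint" concrete, the sketch below represents a hydrogen-bond requirement as a donor-acceptor distance bound and checks it against a set of atom coordinates, as one might when validating a finished design. The atom labels, the `HBondConstraint` class, and the 3.5 Å cutoff (a common distance-only heuristic that ignores bond angles) are assumptions for illustration; this is not RFdiffusion3's input format.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class HBondConstraint:
    """Hypothetical constraint: donor and acceptor heavy atoms must lie within max_distance."""
    donor: str                 # atom label, e.g. "A:SER12:OG" (illustrative naming)
    acceptor: str              # e.g. "LIG:O3"
    max_distance: float = 3.5  # angstroms; a common distance-only heuristic


def unmet_constraints(constraints, coords):
    """Return (constraint, distance) pairs whose donor-acceptor distance exceeds the cutoff."""
    failed = []
    for con in constraints:
        d = math.dist(coords[con.donor], coords[con.acceptor])
        if d > con.max_distance:
            failed.append((con, round(d, 2)))
    return failed


if __name__ == "__main__":
    coords = {
        "A:SER12:OG": (10.0, 4.0, 2.0),
        "LIG:O3": (12.0, 5.5, 3.0),
        "A:HIS57:NE2": (8.0, 9.0, 1.0),
        "LIG:O1": (11.0, 11.0, 4.0),
    }
    wanted = [HBondConstraint("A:SER12:OG", "LIG:O3"), HBondConstraint("A:HIS57:NE2", "LIG:O1")]
    for con, d in unmet_constraints(wanted, coords):
        print(f"unmet: {con.donor} -> {con.acceptor} at {d} A")
```

A constraint-aware generator such as RFdiffusion3 aims to satisfy requirements like these during generation; a post hoc check of this kind is still a cheap sanity test on its outputs.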

Tamarind Bio

Tamarind is a no-code bioinformatics platform built to make computational tools accessible to life scientists. Since many state-of-the-art machine learning models are difficult to deploy and run, Tamarind provides an intuitive web interface that abstracts away the complexity of HPC infrastructure, software dependencies, and command-line workflows. In addition to models like DiffDock and ProteinMPNN, Tamarind now supports RFdiffusion3.

Dassault Systèmes

Dassault Systèmes is a French software company that develops solutions across a wide range of industries, from aerospace to transportation. In biotechnology, it offers BIOVIA Discovery Studio, a computational modeling platform that integrates molecular simulations, structure- and ligand-based drug design, biotherapeutics engineering, macromolecular analysis, and predictive ADMET tools. The platform supports the entire drug discovery pipeline—from target identification to lead optimization—within a unified visualization and modeling environment. Recently, Dassault Systèmes integrated RFdiffusion into BIOVIA to enable advanced protein design workflows.

Levitate Bio

Levitate Bio is a spin-out of Cyrus Biotechnology that provides Rosetta- and AI-driven protein design tools for biotech workflows, integrating methods such as RFdiffusion. Through its Bench (web) and Engine (API/CLI) platforms, Levitate enables scientists to run advanced protocols like Binder Design—an RFdiffusion-based workflow for generating de novo protein binders against a specified target structure.

ProteinIQ

ProteinIQ is a cloud-based platform that offers access to AI models for protein design, structure prediction, molecular docking, and bioinformatics. Its workflow follows a simple upload–configure–run process, running jobs on H100 GPUs and returning results ready for downstream analysis. ProteinIQ supports key models including RFdiffusion (de novo backbone generation and binder design), ProteinMPNN (sequence optimization), Boltz-2 (fast structure prediction), and DiffDock-L (small-molecule docking).

For more information on diffusion models, check out our deep dive on Where Tech Meets Bio
