From Gene Editing to Pathway Design: How AI is Transforming Synthetic Biology

Synthetic biology refers to the purposeful design and construction of new biological components, devices, and systems, as well as the re-engineering of existing natural biological systems to serve practical functions. It extends the principles of genetic engineering, adopting a design-driven approach that incorporates concepts such as standardization, modularity, and abstraction. Synthetic biology (synbio) builds on years of research into how cells and biological systems work, as well as technological advances in areas such as mathematical modeling, DNA sequencing, and DNA synthesis.

Since emerging in the 1960s, synthetic biology has steadily expanded its market footprint. According to SynBioBeta, venture investment in the field grew in 2024, reaching approximately $12.2B year-to-date, up from $10.7B in 2023. Looking ahead, Kathryn Garner of Northumbria University argued in a 2021 Essays in Biochemistry article that by 2030 most people on Earth will probably have encountered synbio‑derived products; however, mass‑market adoption still depends on scale‑up economics, regulation, and consumer acceptance.

Given that modern biological research is inherently “data-rich,” artificial intelligence (AI) has become indispensable in advancing synthetic biology. By analyzing vast datasets, AI accelerates the design, construction, and optimization of biological systems far beyond what traditional methods allow. It helps uncover patterns in experiments, predict new biomolecules, and streamline bio-manufacturing processes, enabling faster progress and greater precision.

CRISPR-based gene editing

CRISPR-based tools (nucleases, base editors, and prime editors) modify DNA in slightly different ways: nucleases cut, base editors swap single nucleotides, and prime editors rewrite small genomic sections. However, all of them share a common challenge: unpredictable efficiency and off-target effects.

Machine learning and deep learning models can now predict which guide RNAs (the molecules that direct the Cas enzyme to its target) will be most effective, flag off-target risks, and even forecast what kinds of edits each system will produce. For example, models like DeepSpCas9 predict nuclease activity, BE-Hive and BE-DICT help optimize base editing, and DeepPE guides pegRNA choices for prime editing.
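To make the inputs to such models concrete, the sketch below enumerates candidate SpCas9 guide sites (a 20-nucleotide protospacer followed by an NGG PAM) on the forward strand of a sequence and counts mismatches between a guide and another site, a crude stand-in for off-target flagging. It is not how DeepSpCas9 or any of the models above work; the function names and the example sequence are hypothetical, chosen only for illustration.

```python
def find_spcas9_guides(seq, guide_len=20):
    """Return (start, protospacer, pam) for every forward-strand site
    whose protospacer is immediately followed by an NGG PAM."""
    seq = seq.upper()
    guides = []
    # The last valid start leaves room for the protospacer plus a 3-nt PAM.
    for start in range(len(seq) - guide_len - 2):
        pam = seq[start + guide_len:start + guide_len + 3]
        if pam[1:] == "GG":
            guides.append((start, seq[start:start + guide_len], pam))
    return guides


def count_mismatches(guide, site):
    """Number of positions at which a guide and a genomic site differ."""
    if len(guide) != len(site):
        raise ValueError("guide and site must have the same length")
    return sum(a != b for a, b in zip(guide, site))


# Hypothetical sequence containing exactly one NGG-adjacent site.
example = "TTGACGTTACCGGATCAAGCTTAGGTTT"
print(find_spcas9_guides(example))
# [(2, 'GACGTTACCGGATCAAGCTT', 'AGG')]
```

In practice, a trained model would score each enumerated candidate for on-target activity, and off-target search would cover both strands of a whole genome rather than a single short string.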

AI also plays a growing role in discovering and designing new CRISPR systems. Structure-prediction and generative AI tools, such as protein language models, can now engineer new Cas variants that are smaller, more precise, or tailored for specific applications. In a recent example, Profluent Bio reported OpenCRISPR-1, the first AI-designed CRISPR gene editor to successfully edit its intended targets in human cells. The OpenCRISPR-1 protein matched standard Cas9’s efficiency but produced 95% fewer off-target edits and showed signs of lower immunogenicity.

In April 2024, Regeneron teamed up with Mammoth Biosciences, the CRISPR startup co-founded by Nobel laureate Jennifer Doudna, to develop in vivo gene-editing therapies. The partnership combines Regeneron’s AAV delivery platforms with Mammoth’s ultracompact nucleases, in a deal worth $100M upfront (mostly equity) and up to $370M per target in milestones. It is Regeneron’s second major CRISPR alliance, following its long-running collaboration with Intellia Therapeutics on in vivo Cas9-based treatments. Doudna’s involvement reaches further: AstraZeneca has struck a deal with Algen Biotechnologies, a spin-out from Doudna’s lab, that gives it exclusive rights to develop and commercialize therapies against targets uncovered via Algen’s AI-CRISPR “AlgenBrain” platform. The October agreement is valued at up to $555M in milestones.

Eli Lilly joined the CRISPR race in June, announcing the acquisition of Verve Therapeutics, a CRISPR-based biotech developing gene-editing treatments for cardiovascular diseases, in a deal worth $1B upfront plus $300M in milestones. Verve’s lead program uses in vivo base editing to target the PCSK9 gene and has shown promising phase 1 results.

As for academic pursuits, in July 2025, Nature Biomedical Engineering published a paper introducing CRISPR-GPT, a large language model–based multi-agent system designed to guide researchers in planning and carrying out CRISPR experiments. Developed by teams from Stanford, Princeton, Google DeepMind, and UC Berkeley, the system promises to lower the expertise barrier for CRISPR-based gene editing.

Gene sequence optimization

In synthetic biology, gene codon optimization is a critical step for maximizing protein yield when expressing genes in non-native hosts. Traditionally, this means swapping rare codons for ones more frequently used by the host organism to boost translation efficiency. But this frequency-based approach can sometimes disrupt translation timing and protein folding, leading to aggregation or loss of function.
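The traditional frequency-based approach described above can be sketched in a few lines: for each amino acid, pick the codon the host uses most often. The usage table below is a toy with illustrative frequencies, not measured values for any real organism, and the function names are hypothetical.

```python
# Toy codon usage table (fraction of each synonymous codon in the host).
# The numbers are illustrative placeholders, not real measurements.
TOY_USAGE = {
    "M": {"ATG": 1.00},
    "K": {"AAA": 0.74, "AAG": 0.26},
    "L": {"CTG": 0.47, "TTA": 0.14, "TTG": 0.13,
          "CTT": 0.12, "CTC": 0.10, "CTA": 0.04},
    "*": {"TAA": 0.61, "TGA": 0.30, "TAG": 0.09},
}


def most_frequent_codon(amino_acid, usage=TOY_USAGE):
    """Return the synonymous codon with the highest usage frequency."""
    codons = usage[amino_acid]
    return max(codons, key=codons.get)


def naive_optimize(protein, usage=TOY_USAGE):
    """Back-translate a protein using only each residue's top codon."""
    return "".join(most_frequent_codon(aa, usage) for aa in protein)


print(naive_optimize("MKL*"))  # ATGAAACTGTAA
```

Because every residue always receives the same codon, this strategy ignores sequence context entirely, which is one reason it can disturb translation timing and folding as noted above.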

Several quantitative metrics guide optimization. One widely used example is the Codon Adaptation Index (CAI), which measures how closely a gene’s codon usage matches that of highly expressed genes in the host. A higher CAI generally indicates better potential for expression. Other metrics (such as %MinMax and the relative usage frequency of synonymous codons) further refine codon choice by considering tRNA availability and mRNA stability. Still, most of these heuristic methods overlook the complex nature of codon usage, suggesting that AI-driven models could better capture the subtleties that determine expression and stability.
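As a rough illustration of the CAI, the sketch below follows the standard definition: each codon gets a relative adaptiveness weight (its count divided by the count of the most used synonymous codon in a reference set of highly expressed genes), and the CAI of a gene is the geometric mean of those weights. The reference counts are invented for the example and cover only three amino acids; the function names are hypothetical.

```python
import math

# Toy codon counts from a hypothetical set of highly expressed host genes.
REFERENCE_COUNTS = {
    "K": {"AAA": 80, "AAG": 20},
    "E": {"GAA": 60, "GAG": 30},
    "F": {"TTT": 25, "TTC": 50},
}


def relative_adaptiveness(reference_counts):
    """Weight of each codon = its count / count of the best synonymous codon."""
    weights = {}
    for codons in reference_counts.values():
        best = max(codons.values())
        for codon, count in codons.items():
            weights[codon] = count / best
    return weights


def cai(dna, weights):
    """Geometric mean of codon weights; codons absent from the table are skipped."""
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    codons = [dna[i:i + 3] for i in range(0, usable, 3)]
    scored = [weights[c] for c in codons if c in weights]
    if not scored:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(math.log(w) for w in scored) / len(scored))


weights = relative_adaptiveness(REFERENCE_COUNTS)
print(round(cai("AAAGAATTC", weights), 3))  # 1.0   (all preferred codons)
print(round(cai("AAGGAGTTT", weights), 3))  # 0.397 (all disfavored codons)
```

Both sequences encode the same peptide (Lys-Glu-Phe), so the difference in score reflects codon choice alone.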

One such recent model is CodonTransformer, an AI tool released in April 2025 by researchers from the Vector Institute for Artificial Intelligence, the University of Toronto, and Sorbonne University. The model uses the Transformer neural network architecture (the same family that underlies ChatGPT) to optimize DNA sequences for protein expression across species. Trained on over a million gene-protein pairs from 164 organisms, it learns how different species prefer certain codons, then generates DNA that matches those preferences while avoiding regulatory “trouble spots.” CodonTransformer is context-aware in gene design and is open-source.

Applications of codon optimization span a wide range of biotechnologies, including the production of recombinant protein drugs as well as nucleic acid–based therapies such as gene therapy, mRNA therapeutics, and DNA/RNA vaccines.

Ginkgo Bioworks’ mDD-0 model, released in February 2025, is designed to generate complete mRNA sequences rather than modifying existing ones. It uses a discrete diffusion approach to produce transcripts that include the 5′ and 3′ UTRs as well as the coding region, conditioned on both the amino acid sequence and the host species. The model captures species-specific codon preferences and features of UTRs such as GC content and Kozak motifs. According to Ginkgo’s white paper, mDD-0 outperforms genetic algorithms on internal benchmarks for in-silico stability and translation metrics, though external validation has not yet been reported.

Protein design

Applying AI in protein design means using machine learning and deep learning models to predict, generate, and optimize protein sequences and structures by learning complex relationships between amino acid sequences, 3D conformations, and functions. Modern approaches use protein language models, structure predictors like AlphaFold3, and generative algorithms such as Variational AutoEncoders or diffusion models (e.g. RFdiffusion) to create novel proteins with desired stability, binding, or catalytic properties. These computational predictions are then validated and refined through experimental assays, enabling faster, more targeted discovery compared to traditional trial-and-error or physics-based methods.

Latent Labs, a DeepMind spinout, emerged from stealth in February 2025 with $50M in funding. The company is integrating generative AI for de novo protein design with in-house wet-lab validation of modelled compounds. The same month, Capgemini announced a new generative AI protein engineering method. It uses a protein large language model (pLLM) to predict the best protein variants while reportedly reducing data requirements by over 99%, making R&D faster and cheaper even with limited experimental data. Developed by Cambridge Consultants, Capgemini’s biotech lab, the approach has already boosted a plastic-degrading enzyme’s efficiency by 60% and created a sevenfold brighter version of Green Fluorescent Protein using only 43 experiments instead of thousands.

Another protein design company, Nabla Bio (a spinout from George Church’s lab), is advancing generative protein design by integrating AI foundation models with high-throughput wet-lab validation. The company recently expanded its collaboration with Takeda into a multiyear partnership worth over $1B in potential milestones. Nabla applies its Joint Atomic Model (JAM) platform to design highly selective antibodies and protein therapeutics against some of the hardest-to-drug multipass membrane proteins, such as GPCRs, ion channels, and transporters.

Established pharma companies have pursued their own protein design initiatives in 2025. In June, researchers from AstraZeneca, the University of Sheffield, and the University of Southampton presented MapDiff, an AI framework that could make protein design for new medicines faster and more accurate. Published in Nature Machine Intelligence, the system tackles the challenge of inverse protein folding, that is, figuring out which amino acid sequences will fold into a desired 3D structure, a key step in creating novel therapeutic proteins. Novartis partnered with Generate:Biomedicines in a collaboration worth over $1B (including $65M upfront) to co-develop protein therapeutics using Generate’s generative AI platform. The technology designs novel proteins, including de novo binders for hard-to-drug targets, and has reportedly shown superior hit rates across multiple validated targets.

Additionally, in September 2025 the US National Science Foundation (NSF) selected Arzeda to lead an NSF consortium (~$6.3M) to advance cell‑free biomanufacturing using AI‑designed enzymes. The project, part of the Use Inspired Acceleration of Protein Design (USPRD) program, builds on Arzeda’s successful application of its proprietary combination of physics-based protein design and AI algorithms in the production of the ViaLeaf Reb M natural sweetener.

Metabolic pathway engineering

AI is transforming metabolic pathway engineering from a slow, trial-and-error process into a programmable, precision discipline. Companies like TeselaGen Biotechnology are leading this shift by offering AI-powered platforms that design and optimize enzymes, pathways, and microbial strains for industrial bioproduction.

In Asia, BioGeometry partnered with Harworld Biology in May 2025 to build AI-driven cell factories, explicitly applying machine learning to every stage of the metabolic engineering stack—from enzyme mining to pathway optimization and host strain design. Meanwhile, companies like Cradle Bio and Evogene are deploying generative AI and multi-omics platforms to improve enzyme activity and host performance, two foundational steps in engineering viable metabolic pathways.

Academic research is keeping pace. A 2025 Nature Communications paper demonstrated an autonomous AI–biofoundry platform that engineered enzymes with up to 90‑fold improvements in substrate preference, showing how machine learning accelerates work on the core building blocks of metabolic pathway optimization. Separately, a review in Energy Advances highlighted AI‑driven strain optimization as a key enabler for next‑generation biofuels and sustainable biomanufacturing.

Automated experiment design

In synthetic biology, AI-driven experimental design involves using machine learning and optimization techniques to systematically plan and adjust experiments, aiming to speed up the design-build-test-learn cycle. By analyzing multi-omics and prior experimental data, AI models can identify knowledge gaps, propose informative conditions, and optimize variables such as gene circuits, metabolic pathways, or host strains. Approaches like active learning and Bayesian optimization help prioritize experiments that maximize insight while minimizing cost and time, enabling more efficient and predictive engineering of biological systems.
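The idea of prioritizing informative experiments can be illustrated with Thompson sampling, a simple Bayesian bandit strategy and a much lighter relative of the Bayesian optimization mentioned above (which typically fits a surrogate model such as a Gaussian process over continuous variables). In this sketch, each candidate condition has an unknown success rate; the loop keeps a Beta posterior per condition and runs whichever condition looks best in a random draw from those posteriors. The success rates, the simulated "experiment", and the function name are all hypothetical.

```python
import random


def thompson_sampling(true_success_rates, rounds, seed=0):
    """Allocate `rounds` simulated experiments across conditions.

    Returns (trials, successes) per condition. The true rates are only
    used to simulate outcomes; the selection logic never sees them.
    """
    rng = random.Random(seed)
    n = len(true_success_rates)
    successes = [0] * n
    failures = [0] * n
    for _ in range(rounds):
        # Sample one plausible success rate per condition from its
        # Beta(successes + 1, failures + 1) posterior (uniform prior).
        draws = [rng.betavariate(successes[i] + 1, failures[i] + 1)
                 for i in range(n)]
        chosen = max(range(n), key=draws.__getitem__)
        # Simulated wet-lab outcome for the chosen condition.
        if rng.random() < true_success_rates[chosen]:
            successes[chosen] += 1
        else:
            failures[chosen] += 1
    trials = [s + f for s, f in zip(successes, failures)]
    return trials, successes


# Three hypothetical strain designs with very different success rates.
trials, successes = thompson_sampling([0.05, 0.2, 0.9], rounds=300)
print(trials)  # most of the 300 runs go to the third design
```

Uncertain conditions still get sampled occasionally, but effort shifts quickly toward the most promising one, which is the cost-saving behavior the paragraph above describes.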

Benchling and Moderna continued their collaboration in May 2025 to create an AI-ready R&D platform that unifies Moderna’s research and technical development groups under a single digital ecosystem. The partnership is designed to streamline experimental design, sample tracking, and results analysis within a single system, replacing fragmented spreadsheets, electronic lab notebooks, and specialized lab tools.

Lila Sciences has raised $115M to build what it calls “AI Science Factories”: automated labs run by AI models that design and execute experiments around the clock. The funding round lifted the company’s valuation to $1.3B. The company recently leased a 235,500-square-foot facility in Cambridge, MA, one of the largest new lab spaces in the area this year. Lila also plans to offer commercial access to its AI models and automated labs through enterprise software. Reportedly, firms in energy, semiconductors, and drug development have already shown interest in the platform, though no specific companies were named.

To wrap things up, AI has the power to bring about significant changes in synthetic biology, helping us tackle some of today's most urgent problems. However, it's important to remember that this collaboration between disciplines comes with its own set of challenges. The complicated nature of biological systems, the limitations of the data we have, the struggle to validate models, and the need for cooperation between various fields all present hurdles to overcome.
