Basecamp Research Introduces ZymCTRL: An Open-Source AI Tool for Enzyme Design

by Roman Kasianov



Basecamp Research, in collaboration with the Ferruz Laboratory at the Institute of Molecular Biology of Barcelona, has introduced ZymCTRL, an open-source tool utilizing generative AI to create enzyme sequences based on simple text inputs.

AI-Driven Enzyme Design

ZymCTRL is the first open-source, text-based enzyme generation model, with applications in sectors including therapeutics and sustainability. Unlike traditional protein language models, which require a known protein sequence as a starting point, ZymCTRL needs no seed sequence. This allows users to generate enzyme sequences with only 30% similarity to existing sequences, widening the scope for novel enzyme design.

Functionality and Industry Application

The tool can generate functional enzyme sequences for diverse industrial applications. Dr. Noelia Ferruz of the Ferruz Laboratory likened using ZymCTRL to interacting with a chatbot. The Ferruz lab, known for its work on ProtGPT2, considers ZymCTRL a significant advance in AI-driven protein design.

Dr. Philipp Lorenz, CTO of Basecamp Research, highlighted the broad applications of ZymCTRL, from disease treatment to sustainable industrial processes:

"Even before the release of ChatGPT, we began working on large language models with Noelia because we think these models represent the future of biological research and protein design... We’re deeply excited by these results and ZymCTRL’s ability to create functional enzymes that can solve some of today’s biggest challenges, from finding new ways to treat devastating diseases to building greener and more sustainable catalytic processes in bioindustry."

The model has been reviewed by experts at institutions such as Graz University of Technology in Austria, who noted that it runs efficiently and is easy to use on consumer GPUs.

Technical Highlights

ZymCTRL was initially trained on the BRENDA enzyme database, which includes 37 million enzyme sequences. Without any additional fine-tuning, the model generated carbonic anhydrases and lactate dehydrogenases that showed functional enzyme activity despite sharing less than 40% sequence similarity with known proteins. Further fine-tuning on Basecamp Research’s proprietary BaseGraph dataset improved the model’s performance, particularly in generating high-quality lactate dehydrogenase sequences.
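To make the "less than 40% sequence similarity" figure concrete, here is a minimal sketch of one common way to score two proteins: percent identity over a pair of pre-aligned sequences. It is an illustration under stated assumptions, not the method used in the preprint. The article does not say which alignment tool, denominator, or similarity definition (strict identity versus substitution-matrix similarity) was applied. The function names `percent_identity` and `below_identity_threshold` are hypothetical, and the 40% cutoff is simply the figure quoted above.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned protein sequences.

    Assumption: identity = identical residues / columns where neither
    sequence has a gap. Other tools divide by alignment length or by
    the shorter sequence length instead, which gives different numbers.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x.upper() == y.upper():
            identical += 1
    if compared == 0:
        return 0.0  # no gap-free columns to compare
    return 100.0 * identical / compared


def below_identity_threshold(aligned_generated: str, aligned_known: str,
                             threshold: float = 40.0) -> bool:
    """Hypothetical novelty check: True if identity is under the threshold."""
    return percent_identity(aligned_generated, aligned_known) < threshold


if __name__ == "__main__":
    print(percent_identity("MKT-LLA", "MKSALLG"))
    print(below_identity_threshold("ACDE", "WYKE"))
```

For example, `percent_identity("MKT-LLA", "MKSALLG")` returns about 66.7, since 4 of the 6 gap-free columns match, while `"ACDE"` against `"WYKE"` scores 25% and so falls below the 40% cutoff. In practice the alignment itself would come from a dedicated tool; this sketch only covers the scoring step.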

Basecamp Research field scientists collect samples during an expedition to Costa Rica on July 29, 2023. Basecamp Research is building the largest and most diverse foundational database, purpose-built for artificial intelligence models. Lab tests confirm Basecamp Research's more diverse database supercharges AI models like ZymCTRL to generate sequences that are richer and more robust for industrial use. (Photo courtesy of Basecamp Research)

These enzymes remained stable and active across a wide pH range and at high temperatures, indicating their versatility for various industrial applications. Some of the artificial enzymes also retained their function after freeze-drying and under complex reaction conditions, further underscoring their industrial potential.

Future Prospects

Dr. Glen Gowers, co-founder of Basecamp Research, noted the broader implications of ZymCTRL and the potential of generative AI in biotech:

"Beyond the obvious excitement of being able to generate truly de novo proteins, the results are a further testament to the ability of Basecamp Research’s dataset to produce better results compared to publicly available datasets, which barely scratch the surface of the Earth’s immense biodiversity... Earlier we were able to show that our BaseFold model, also powered by our dataset, outperformed AlphaFold2 in predicting protein structures. Generative AI is going to have a huge impact across biotech, and we’re dedicated to collecting the data and tools needed to make its potential a reality."

Basecamp Research plans to continue enhancing ZymCTRL and exploring its applications in fields such as biofuel production, sustainable agriculture, and disease diagnostics.

  • For more detailed information, the full preprint is available at bioRxiv.
  • Basecamp Research invites the research community to try ZymCTRL, now available for public use on Hugging Face: