Basecamp Research Introduces BaseFold, Enhancing Accuracy in Protein Structure Prediction

by Andrii Buvailo, PhD

Topics: Emerging Technologies   

Basecamp Research has announced BaseFold, a new deep learning model designed to predict the 3D structures of large, complex proteins with what the company describes as unprecedented accuracy.

According to the company, the model is a significant advance over existing AI-powered tools, including the widely used AlphaFold2. BaseFold is expected to accelerate AI-based drug discovery by providing more reliable structure predictions for larger and more complex proteins.

BaseFold's improved predictive capabilities stem from BaseGraph, a foundational dataset that Basecamp Research has assembled through partnerships with more than 25 biodiversity-rich countries. BaseGraph is intended to capture a far wider range of genetic information than current public protein databases offer. Those databases are often criticized for their limited size and scope and are believed to represent only a tiny fraction of life on Earth, which limits how well AI tools can predict the structures of proteins that are poorly represented in them.

Comparison of structure predictions from AlphaFold2 (orange) and BaseFold (cyan) for targets from the CASP15 and CAMEO benchmarks. For protein targets T1113 (bacteriophage T7 polymerase inhibitor, left) and 8SSD (methionine synthase, right), BaseFold’s predictions are much closer to the laboratory-validated structures (beige). White arrows highlight regions where AlphaFold2’s predictions are significantly inaccurate.

By drawing on more than 6 billion relationships in BaseGraph, BaseFold can extract substantially more evolutionary information, which the company says enables it to predict protein structures and small-molecule interactions with much greater accuracy.

The company reports that BaseFold achieved up to a sixfold improvement in accuracy over AlphaFold2 for certain proteins and up to threefold better accuracy in modeling how small molecules interact with protein targets. Gains of this kind matter for designing more advanced therapeutic molecules with AI.

The limitations of current AI models, including AlphaFold2, stem partly from their reliance on public databases such as MGnify, which suffers from problems like incomplete sequences. These problems can degrade the quality of predicted structures, particularly for larger proteins. BaseFold aims to overcome them by reaching accuracy comparable to slow, traditional experimental methods such as X-ray crystallography, especially for proteins that are underrepresented in existing databases.

Basecamp Research's collaboration with NVIDIA to optimize BaseFold for the NVIDIA BioNeMo platform underscores the ongoing efforts to make this tool more accessible and effective for drug discovery. The continuous improvement of BaseFold, driven by the expansion of Basecamp Research’s global network of biodiversity partnerships, highlights the potential for AI to transform our understanding of complex biological systems and accelerate the development of new treatments.

Dr. Phil Lorenz, CTO of Basecamp Research, emphasized the importance of diverse, representative genomic data for advancing AI in biotechnology. The team's work to collect and precisely annotate biodiversity data is a significant step toward datasets purpose-built for the AI era. Dr. Glen Gowers, co-founder of Basecamp Research, pointed to the limitations of current AI tools in predicting the structures of large, complex proteins and stressed the critical role of high-quality data in producing accurate AI results.
