Ginkgo Bioworks Launches Open-Source Pharmacology Framework for AI Drug Discovery
Ginkgo Bioworks has introduced the Virtual Cell Pharmacology Initiative (VCPI) through its Ginkgo Datapoints division, creating what it describes as the first open-source pharmacological framework for virtual cell modeling. The effort aims to establish a standardized foundation for AI-based drug discovery by generating high-quality pharmacology data free of charge for researchers worldwide.
VCPI’s objective is to test at least 100,000 compounds and produce more than 12 billion pharmacological data points, forming a publicly accessible dataset optimized for virtual cell research. The project intends to fill a major gap in the field: the lack of standardized, reproducible data that can be used to train predictive AI models of cellular behavior.
At its core, VCPI introduces two key elements:
- V-Ref293, an engineered reference cell line designed as a consistent biological standard for virtual cell research, with master cell bank vials to be distributed globally in 2026.
- DRUG-seq, a bulk RNA sequencing workflow optimized for high-throughput pharmacology screening, reportedly capable of processing more than one hundred 384-well plates per week.
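To put the stated targets in perspective, the short script below works through the arithmetic implied by the figures above. It is a back-of-envelope sketch, not Ginkgo's planning model: the function names are hypothetical, the inputs are the minimum figures quoted in the announcement ("at least", "more than", "over"), and the wells-per-compound values are illustrative assumptions, since the announcement does not say how many doses, replicates, or controls each compound receives.

```python
# Back-of-envelope scale check using the VCPI figures quoted in the article.
# Assumption (hypothetical): wells_per_compound values are illustrative only;
# the announcement does not state doses, replicates, or controls per compound.

TARGET_COMPOUNDS = 100_000           # "at least 100,000 compounds"
TARGET_DATA_POINTS = 12_000_000_000  # "more than 12 billion data points"
PLATES_PER_WEEK = 100                # "over 100 x 384-well plates per week"
WELLS_PER_PLATE = 384


def data_points_per_compound(points: int, compounds: int) -> float:
    """Average number of data points each compound must contribute."""
    return points / compounds


def wells_per_week(plates: int, wells_per_plate: int) -> int:
    """Total wells processed weekly at the stated plate throughput."""
    return plates * wells_per_plate


def weeks_to_screen(compounds: int, wells_per_compound: int, weekly_wells: int) -> float:
    """Weeks of plate capacity needed if each compound uses wells_per_compound wells."""
    return compounds * wells_per_compound / weekly_wells


if __name__ == "__main__":
    per_compound = data_points_per_compound(TARGET_DATA_POINTS, TARGET_COMPOUNDS)
    weekly = wells_per_week(PLATES_PER_WEEK, WELLS_PER_PLATE)
    print(f"Data points per compound: {per_compound:,.0f}")  # 120,000
    print(f"Wells per week: {weekly:,}")                      # 38,400
    for wells in (1, 4, 8):  # hypothetical wells per compound
        weeks = weeks_to_screen(TARGET_COMPOUNDS, wells, weekly)
        print(f"{wells} well(s) per compound -> {weeks:.1f} weeks of plate capacity")
```

On these numbers, each compound would need to yield about 120,000 data points on average, and the stated plate throughput works out to roughly 38,400 wells per week. The announcement does not define how a "data point" is counted, so the per-compound figure should be read as an average, not a description of the assay design.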
VCPI could signal a change in how AI-driven biology approaches data generation, moving away from the prevailing “more is better” mindset toward structured, interpretable, and reproducible pharmacology data. Ginkgo directly challenges the reliance on pooled single-cell sequencing for drug discovery, arguing that such designs obscure compound-specific effects and generate noise unsuited to model training. Instead, it promotes DRUG-seq as a scalable bulk RNA method that keeps each compound's effects distinguishable at high throughput.
Beyond its technical design, VCPI introduces an unusual governance approach for an industrial data initiative. Participants can influence compound selection and data prioritization from the outset, shaping an open dataset rather than consuming a finished one. Through open participation, any research group or company can submit compounds for testing free of charge, with results released under a Creative Commons license. Participants can choose among immediate open data release, a limited embargo for private use, or indefinite retention.
The platform also includes a community governance layer allowing contributors to vote on compound prioritization, share models, and participate in future benchmarking challenges. Active contributors will receive early data access and engagement privileges.
VCPI expands Ginkgo’s efforts to remove data bottlenecks in machine learning for biology. In 2024, Ginkgo Bioworks introduced its model API developed with Google Cloud to provide affordable, programmatic access to biological AI models trained on proprietary protein and DNA data.
The company reports investing more than $1 billion in laboratory automation infrastructure that now underpins its Datapoints division. Initial public data releases are planned for early 2026.
Alongside large public bio-AI efforts such as those of CZI Biohub and the Arc Institute, VCPI aims to occupy a distinct role by providing standardized, pharmacology-focused training data for virtual cell models.