If you are a biologist or a drug hunter and you haven’t read the blog of Dr. Vijay Pande, a general partner at Andreessen Horowitz (a16z) and founding investor of a16z’s Bio Fund, you should.
Apart from being one of the most influential venture capitalists in the biotech space, with companies like Insitro and BioAge in the fund’s portfolio, Dr. Pande regularly shares his vision of how technologies are changing drug discovery and biotech research. According to him, we are in the middle of an industrial revolution in drug discovery and biotech, driven by artificial intelligence (AI), machine learning, and automation.
Hard to disagree with such a vision.
Indeed, there is a growing wave of companies building new-generation drug design platforms -- Recursion Pharmaceuticals (NASDAQ: RXRX), Insitro, Exscientia (NASDAQ: EXAI), Insilico Medicine, Deep Genomics, Valo Health, Relay Therapeutics (NASDAQ: RLAY), you name it -- companies that create highly integrated and automated AI-driven, data-centric drug design processes, from biology modeling and target discovery all the way to lead generation and optimization (sometimes referred to as “end-to-end” platforms). These “digital biotechs” are trying to transform traditional drug discovery, a notoriously bespoke, artisanal process, into a more streamlined, repeatable, data-driven one, more like an industrial conveyor line for drug candidates. Announcements by Exscientia, Deep Genomics, Insilico Medicine, and other companies point to a situation where the time for an entire preclinical program, from building a disease hypothesis to the official nomination of a preclinical drug candidate, has shrunk to as little as 11-18 months, at a fraction of the cost of a comparable project conducted “traditionally”. Rapid timelines are also achieved in drug repurposing programs with previously known drugs or drug candidates, for example by using AI-generated knowledge graphs, as BenevolentAI (AMS: BAI) did in its Baricitinib program, or by using advanced multiomics analysis and network biology to derive precision biomarkers for better patient stratification and for matching novel indications, as Lantern Pharma (NASDAQ: LTRN) does to rapidly expand its clinical pipeline.
However, many of those AI-driven “digital biotechs” still rely on community-generated data to train machine learning models, and this may become a limiting factor. While some of the leading players in the new wave, such as Recursion Pharmaceuticals and Insitro, are investing heavily in their own high-throughput lab facilities to generate unique biology data at scale, other companies appear to be more focused on algorithms and on building AI systems with data from elsewhere, and have only limited in-house capabilities to run experiments.
A common practice is to use community-generated, publicly available data. But it comes with a caveat: a large share of published data may be biased or even poorly reproducible. It also lacks standardization: experimental conditions may differ, leading to substantial variation in data obtained by different research labs or companies. A lot has been written about this, and a decent summary of the topic was published in Nature: “The reproducibility crisis in the age of digital medicine”. For instance, one company reported that its in-house target validation effort stumbled over an inability to reproduce published data in several research fields. According to the company’s report, in-house results were consistent with published results for only 20-25% of the 67 target validation projects analyzed. There are numerous other reports citing poor reproducibility of experimental biomedical data.
This brings us to a known bottleneck of “industrializing drug discovery”: the need for large amounts of high-quality, highly contextualized, properly annotated biological data that is representative of the underlying biological processes and of the properties of cells and tissues.
For wide-scale industrialization of drug discovery to occur, the crucial thing is the emergence of widely adopted global industrial standards for data generation and validation -- and of an ecosystem of organizations “producing” vast amounts of novel data that follow such standards. Then, large drug makers and smaller companies would be able to adopt AI technologies to a much deeper extent. Take the automotive industry as an example: a component of, say, an engine developed in one part of the world will often fit into a production line in another part of the world. Highly integrated processes can therefore be built across geographies and companies in a “plug-and-play” paradigm.
The same approach is required in preclinical research for drug discovery: every lab experiment, every data generation process, and every dataset generated must be “compatible” with all other research processes, machine learning pipelines, etc. -- across the pharmaceutical and biotech communities globally. When this tectonic shift occurs, we will witness a truly exponential change in the performance of the pharmaceutical industry, something I would call the “commoditization” of preclinical research.
Luckily, a growing number of companies are starting to bring about the required change in how preclinical research is done. These companies build standardized, highly automated, scalable, and increasingly compatible laboratory facilities, guided by AI-based experiment control systems and supplemented by AI-driven data mining and analytics capabilities. Such “next gen” lab facilities are often available remotely, making preclinical experimentation accessible to more players across a wider range of geographies.