How Data-sharing Technologies Bolster AI Progress in Medical Research

by Andrii Buvailo, PhD          Biopharma insight

Disclaimer: All opinions expressed by Contributors are their own and do not represent those of their employers or of this website. Contributors are fully responsible for ensuring they own any required copyright for any content they submit to this website. This website and its owners shall not be liable for information and content submitted for publication by Contributors, nor for its accuracy.

Topics: Emerging Technologies   

While artificial intelligence (AI) has already proved groundbreaking in many industries (robotics, finance, surveillance, cybersecurity, and self-driving cars, to name just a few), the pharmaceutical industry has yet to enjoy a full-scale AI-driven transformation. Some companies have managed to demonstrate the power of AI for drug discovery and basic biology research, including Moderna (accelerated discovery of mRNA vaccines), Insilico Medicine (accelerated small molecule discovery, 8 drug candidates in 2 years, including novel targets), Recursion Pharmaceuticals (a diverse preclinical/clinical pipeline of drug candidates enabled by AI and robotic labs), and DeepMind (major advances in solving protein folding and predicting 3D structures of large protein complexes using AI). Pretty much every company in pharma and biotech, big or small, is “experimenting” with AI technologies, but the fact today is that the industry as a whole is still far from being what we might call “AI-centric” or “AI-first”, unlike, for example, the internet technology and software industry. A major reason is the lack of quality data needed to train large-scale deep learning models properly and achieve sufficient generalizability.

Image credit: Olemedia iStock


This might seem surprising, since pharmaceutical research generates enormous amounts of data daily. But consider the degree of secrecy and protectionism that competing pharmaceutical giants apply to their research, and the ever-growing push by governments and regulatory bodies towards protecting personal and especially medical data, and it becomes clear that most of this data is not actually available to AI practitioners. Valuable data is dispersed across thousands of organizations, both research and medical, hidden behind their firewalls: decades of screening, testing, and validation research, decades of clinical trials, and enormous amounts of patient medical data held in local hospitals, private EHRs, and so on. Access to such data for AI training could not only improve the ability to model biology and understand disease mechanisms, but also yield more robust biomarkers, better match patients with relevant treatment options in clinical trials, and deliver more robust, better-validated high-throughput diagnostic tools (e.g., AI analysis of radiology images). Because of the data shortage, medical AI models are often trained on poorly diversified data (e.g., patients from a single geography), which leads to biased models and poor real-world performance.

Data generation is important, but just as important are technologies that provide access to such data in a way that is, on the one hand, practical for building machine learning pipelines and, on the other, compliant with the regulatory and commercial-secrecy requirements that typically apply to handling sensitive data. This becomes even more challenging with real-world, real-time data access requirements, where models must respond and adapt to real-life events “on the fly” and output relevant predictions quickly.

Indeed, according to insights from the World Economic Forum 2021, 76% of executives across industries believe that new ways of collaborating with ecosystem partners, third-party organizations, and even competitors are essential to innovation in the “era of big data”. Experts also predict that secure data-sharing capabilities could better position organizations to monetize their own data.

In a recent report, Deloitte predicts that by the end of 2023 a significant number of healthcare organizations will be exploring ways to accomplish business goals using AI-driven analysis of data provided by other organizations through specialized secure data-sharing mechanisms that preserve data privacy and commercial secrets.


Sharing is caring

