ConstruQt – The Beginnings of the Chemical Data Revolution

Chemical Data Has Problems

The state of data access, quality and dissemination in Chemistry is extremely poor. It is so poor that it is blocking advances in machine learning (ML) and artificial intelligence (AI), and it is also impeding research and development with traditional methods. The recent surge in AI skepticism is a direct consequence of years of over-hype and promises built on precarious data. Over-the-top expectations were set without enough consideration for the data quality and volume required to train fancy algorithms. The old adage “^&$% in, ^&$% out” holds true (we can say ‘crap’, right?). This opinion is in line with recent statements by, for example, the CEO of Novartis, the second largest pharmaceutical company in the world, who lamented the difficulty of accessing quality datasets to make AI effective.

ChemAlive is building a platform to solve this problem by offering advanced computational tools in exchange for raw and live chemical data from academic researchers. ConstruQt, our first freemium web application module, shows how this exchange works. It provides quantum mechanical analysis of molecular energetics and structure, designed for library-scale (big data) research, and simply asks that researchers deposit their chemical structures upon submission.

Chemical Data is Damn Valuable

Let’s set the context. “Data is the new gold” is a statement of increasing significance. Scientific data as an industry is worth at least 20 billion USD per year, based on the size of the peer-reviewed scientific publication market. About 5 billion of that is related to Chemistry. In fact, chemical data is the most valuable scientific data, judging by the frequency of journal article piracy, which is highest in Chemistry. In other words, it is data (recipes) worth stealing. Chemistry and Electronic Engineering (EE) together account for over 50% of all patents filed, and the monetary value of patents in Chemistry, based on patent sale price, far exceeds that of patents in EE. Companies distributing published chemical data, like SciFinder and Reaxys, are only aggregators of the hard-earned, high-value data of mostly academic researchers, but they also act as gatekeepers to organized chemical information, with quite an upside for themselves. Public initiatives like PubChem and ChemSpider do not have the same pizazz as their business counterparts, but are widely used to retrieve basic chemical information. Government initiatives like NIST can be important tools as well. However, there is a problem. None of these platforms (commercial or public) is built for big data analytics. Their business models or data architectures do not support large, curated, freely searchable datasets in which molecular properties are densely (non-sparsely) linked directly to molecular structure.

A Walk through the Chemical Data Freak Show


Topics: AI & Digital
