Over the last five or so years, the drug discovery industry has started adopting artificial intelligence (AI) at unprecedented scale. Pretty much every pharma company, big or small, is running pilots or more substantial projects with some AI component, from machine learning algorithms and deep learning networks to natural language processing models. The technology has had such a fundamental impact on drug discovery work that we now see a wave of young companies, sometimes referred to as "digital biotech", whose whole business model revolves around a platform-based process of innovation. Some companies have "end-to-end" drug design platforms that automate not only concept creation and target discovery, but also hit discovery, part of the lead optimization work, and even the prediction of clinical trial outcomes and the identification of clinically relevant biomarkers.
Receptor.AI is one of the companies at the forefront of the "digital biology" movement, having built a modular AI-based discovery platform aimed at fast and efficient target and lead discovery. While the research has been going on for quite some time, the company was launched last year and has already raised a seed round.
I have started this column to share insights about how our company is re-imagining the field of computational drug design. In this series of posts, I will discuss some of the solutions we have developed and case studies that demonstrate how our AI system is superior to legacy approaches and how it competes with other players on the market.
Today, let's talk about molecular descriptors.
Exploration of the chemical space of small molecules, which is vital for modern drug discovery projects, requires a uniform, efficient and machine-friendly representation of molecular structures. There are dozens of molecular descriptors and fingerprints that encode the molecular graph, composed of atoms and bonds, into a vector of numbers, which is then processed by cheminformatics algorithms and AI models. Different descriptors prioritize different molecular properties, so multiple descriptors have to be combined in order to keep as much information as possible. This is especially important for ML algorithms, which are “hungry” for data and perform better when more information is provided. However, lumping together many molecular descriptors increases data dimensionality and makes the learning process extremely slow and computationally demanding.
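The dimensionality trade-off can be illustrated with a small sketch. The two descriptors below are toy stand-ins invented for this example (crude element counts and a hashed character n-gram fingerprint computed from a SMILES string), not real cheminformatics descriptors, and all names and sizes are assumptions. The point is only that every descriptor added to the mix makes the feature vector longer.

```python
import zlib

ELEMENTS = ["C", "N", "O", "S", "F"]  # toy element vocabulary (assumption)


def element_counts(smiles) -> list:
    """Toy descriptor: crude per-character counts (e.g. 'Cl' is miscounted as carbon)."""
    upper = smiles.upper()  # aromatic atoms are written in lowercase in SMILES
    return [float(upper.count(e)) for e in ELEMENTS]


def hashed_ngrams(smiles, n=3, size=1024) -> list:
    """Toy fingerprint: hash every character n-gram into a fixed-size bit vector."""
    bits = [0.0] * size
    for i in range(len(smiles) - n + 1):
        bits[zlib.crc32(smiles[i:i + n].encode()) % size] = 1.0
    return bits


def combine(smiles) -> list:
    """Concatenate descriptors: more information, but a longer vector."""
    return element_counts(smiles) + hashed_ngrams(smiles)


aspirin = "CC(=O)Oc1ccccc1C(=O)O"
# Lengths of each descriptor and of the concatenation
print(len(element_counts(aspirin)), len(hashed_ngrams(aspirin)), len(combine(aspirin)))
```

Here the combined vector has 5 + 1024 = 1029 entries, and stacking a few more fingerprints of realistic size quickly pushes this into the thousands of features per molecule.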
The team at Receptor.AI recently came up with an elegant solution to this problem by introducing very compact and extremely informative molecular descriptors extracted from the so-called latent space of a Variational Autoencoder neural network.
The Variational Autoencoder (VAE) architecture is well known and widely used in the machine learning community for classification problems, feature detection and generative tasks. Generative tasks exploit the encoder-decoder mode: the training data are encoded into an internal representation (called the latent space), mixed with random noise and then decoded to generate new data that are statistically similar to the training dataset.
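To make the encode, add noise, decode cycle more concrete, here is a minimal sketch in plain Python. It is an untrained toy model with random linear layers; the layer sizes, function names and the linear encoder and decoder are assumptions made for illustration, not the architecture of any particular VAE. The noise step follows the standard reparameterization z = mu + sigma * eps.

```python
import math
import random

INPUT_DIM, LATENT_DIM = 8, 2  # toy sizes (assumption)
rng = random.Random(0)


def random_matrix(rows, cols):
    """Untrained weights; a real VAE would learn these from data."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


W_mu = random_matrix(LATENT_DIM, INPUT_DIM)      # encoder head for the mean
W_logvar = random_matrix(LATENT_DIM, INPUT_DIM)  # encoder head for the log-variance
W_dec = random_matrix(INPUT_DIM, LATENT_DIM)     # decoder


def encode(x):
    """Map an input to the parameters of a Gaussian in latent space."""
    return matvec(W_mu, x), matvec(W_logvar, x)


def sample(mu, logvar, noise_scale=1.0):
    """Reparameterization: z = mu + sigma * eps, with eps drawn from N(0, 1)."""
    return [m + noise_scale * math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]


def decode(z):
    """Map a latent vector back to input space."""
    return matvec(W_dec, z)


x = [1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0]
mu, logvar = encode(x)
z = sample(mu, logvar)   # noisy point in the latent space
x_new = decode(z)        # new data point in the input space
print(len(mu), len(z), len(x_new))
```

Setting `noise_scale=0.0` switches the noise off, so `sample` simply returns the latent mean; with noise switched on, repeated calls decode to slightly different outputs, which is what makes the model generative.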
The latent space of a VAE is usually treated as a mere intermediate representation, but researchers from Receptor.AI have shown that it can be used to obtain extremely powerful and efficient molecular descriptors. They trained a VAE network on the chemical structures of 10M molecules from the public ZINC12 database and let the network build an encoded latent space. Compact vectors containing only 128 numbers were then extracted from the innermost layer of the neural network and used as stand-alone molecular descriptors.
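The sketch below shows the general idea of turning molecules of different sizes into descriptors of one fixed length. It is not our production code: the weights are random rather than trained, the one-hot SMILES input, the character set and the maximum length are assumptions, and taking the noise-free latent mean as the descriptor is one plausible choice made for this illustration. Only the length of 128 comes from the text above.

```python
import random

LATENT_DIM = 128  # descriptor length stated in the text
MAX_LEN = 40      # toy maximum SMILES length (assumption)
ALPHABET = "CNOSFclnos()=#123456789[]+-@H"  # toy character set (assumption)

rng = random.Random(42)
# Untrained stand-in for the encoder's mean head; a trained VAE would supply these weights.
W_MU = [[rng.gauss(0.0, 0.05) for _ in range(MAX_LEN * len(ALPHABET))]
        for _ in range(LATENT_DIM)]


def one_hot(smiles):
    """Flattened, zero-padded one-hot encoding of a SMILES string."""
    if len(smiles) > MAX_LEN:
        raise ValueError("SMILES longer than MAX_LEN")
    x = [0.0] * (MAX_LEN * len(ALPHABET))
    for pos, ch in enumerate(smiles):
        # ALPHABET.index raises ValueError for characters outside the toy set
        x[pos * len(ALPHABET) + ALPHABET.index(ch)] = 1.0
    return x


def latent_descriptor(smiles):
    """Use the latent mean (no sampling noise) as a fixed-length descriptor."""
    x = one_hot(smiles)
    return [sum(w * xi for w, xi in zip(row, x)) for row in W_MU]


for smi in ["CCO", "c1ccccc1", "CC(=O)Oc1ccccc1C(=O)O"]:
    print(smi, len(latent_descriptor(smi)))
```

Whatever the size of the molecule, the descriptor always has 128 entries, and because no noise is added, the same molecule always maps to the same vector.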
It appears that such lightweight descriptors encode as much information about the molecules as a combination of several much heavier molecular representations and fingerprints. Using them dramatically speeds up the ML tasks built on this representation, without any detectable loss of quality.
The company tested its findings by generating new molecules from such descriptors using the VAE in decoder mode. The resulting molecules had excellent quality metrics in terms of chemical correctness and novelty.
Currently, deep VAE descriptors are used as a core technology at Receptor.AI. They are computed for all molecules in the explored chemical space and then processed by the virtual screening, drug-target interaction and ADME-Tox modules of the company’s AI drug discovery platform.
I would be happy to discuss this topic in greater detail with potential end-users. Please drop me a line on LinkedIn.
Check our website for more details: https://receptor.ai/