Applying AI and HPC to Drug Design: a Survey of R&D Work

The motivation for this report

At Supercomputing 2018 in Dallas and at ISC 2019, I had three key encounters that have profoundly influenced my work path ever since:

FIRST. At Nvidia, I saw a presentation about ANI1, a deep learning model that approximates the energy of molecules. Accurate energies are a precondition for calculating dynamic and chemical parameters. The result is a dramatic reduction in time to solution, with accuracy equal to or better than that of the reference numerical method. On a 54-atom molecule, the time to solution was five (yes, 5) orders of magnitude shorter than with Density Functional Theory (DFT). The neural network is trained to approximate DFT output data. Because molecules vary in size while the input to the network must be of constant size, the authors extended the Behler-Parrinello symmetry functions to build fixed-length vectors that describe each atom's environment and serve as the input to the network.
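To make the fixed-size input idea concrete, here is a minimal sketch of a radial symmetry function in the Behler-Parrinello style. It is not the ANI1 authors' code: the function names, the parameter values (eta, r_c, n_shifts) and the rough molecular coordinates are my own assumptions for illustration, and the sketch ignores atom types and angular terms.

```python
import math

def cutoff(r, r_c):
    """Cosine cutoff: 1 at r = 0, smoothly reaching 0 at r = r_c."""
    return 0.5 * (math.cos(math.pi * r / r_c) + 1.0) if r < r_c else 0.0

def radial_descriptor(positions, centre, eta=4.0, r_c=5.0, n_shifts=8):
    """Fixed-length radial descriptor of the environment of atom `centre`.

    Returns n_shifts numbers, no matter how many atoms `positions` holds.
    All parameter values are illustrative assumptions.
    """
    shifts = [r_c * k / n_shifts for k in range(n_shifts)]
    descriptor = [0.0] * n_shifts
    for j, pos in enumerate(positions):
        if j == centre:
            continue
        r = math.dist(positions[centre], pos)   # distance to neighbour j
        fc = cutoff(r, r_c)
        # Each neighbour adds one Gaussian bump per radial shift.
        for k, r_s in enumerate(shifts):
            descriptor[k] += math.exp(-eta * (r - r_s) ** 2) * fc
    return descriptor

# Rough, illustrative geometries in angstroms (not reference data).
water = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
methane = [(0.0, 0.0, 0.0), (0.63, 0.63, 0.63), (-0.63, -0.63, 0.63),
           (-0.63, 0.63, -0.63), (0.63, -0.63, -0.63)]

print(len(radial_descriptor(water, 0)), len(radial_descriptor(methane, 0)))
```

Both molecules yield a vector of the same length, and because the contributions of the neighbours are summed, the result does not depend on the order in which the atoms are listed. These are the two properties a network input needs here.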

SECOND. At the Department of Energy booth, I saw some results of a joint project between the DOE and the National Cancer Institute on using deep learning to enable RAS-RAF protein calculations. This protein is responsible for a significant number of cancer types, and it interacts with the cell membrane. One way to explain the mechanism, or signaling pathway, would be an atomic-level HPC calculation of the entire cell system. This is, however, not feasible even on exascale computers. Because the space and time scales of cell and atom differ so widely, deep learning autoencoders were used to isolate patches of interest, to which an atomic-scale calculation was then applied.

THIRD. At ISC 2019, I saw an application of variational autoencoders, "Generative modeling of protein folding transitions with recurrent auto-encoder". The authors analyze 40-million-atom data in a lower-dimensional space and then extend it in the time domain without changing the simulation software. The output data from the molecular dynamics (MD) simulation is the input to the Convolutional Variational Auto-encoder. The network learns the encoding by minimizing the pixel-by-pixel distance in the simulated contact matrix, and it captures the ensemble state of the simulation by minimizing the Kullback-Leibler divergence. Next, the simulation is extended in the time domain by extrapolating the state update and by evolving the system in the feature space using a regressor function. Positions and velocities are calculated in the decoder, which also calculates the error relative to the MD simulation. The AI model infers the solution several orders of magnitude faster than Molecular Dynamics calculations.
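The two terms being minimized can be sketched in a few lines. This is a generic variational autoencoder loss, not the authors' implementation: I assume a squared-error pixel distance, a diagonal Gaussian encoder described by mu and log_var, and a hypothetical weight beta; the names are mine.

```python
import math

def reconstruction_loss(target, output):
    """Mean squared pixel-by-pixel distance between two contact matrices."""
    n = sum(len(row) for row in target)
    return sum((t - o) ** 2
               for t_row, o_row in zip(target, output)
               for t, o in zip(t_row, o_row)) / n

def kl_divergence(mu, log_var):
    """KL( N(mu, exp(log_var)) || N(0, 1) ), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))

def vae_loss(target, output, mu, log_var, beta=1.0):
    """Reconstruction term plus beta-weighted KL term (beta is an assumption)."""
    return reconstruction_loss(target, output) + beta * kl_divergence(mu, log_var)

# Toy 2x2 contact matrix (1 = residues in contact) and a decoder output.
contacts = [[1.0, 0.0], [0.0, 1.0]]
decoded = [[0.9, 0.1], [0.2, 0.8]]
print(vae_loss(contacts, decoded, mu=[0.3, -0.1], log_var=[0.0, -0.2]))
```

The first term pulls the decoded contact matrix toward the simulated one, while the second keeps the latent codes close to a standard normal distribution, which is what makes the low-dimensional space usable for sampling and extrapolation.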

In all three cases, deep learning is essential to enable computations that are otherwise infeasible. So I asked myself what is next (and what may be in it for me):

Can AI also be used for drug design?
How does AI work with quantum computing?
Can AI become a valid approximation of Density Functional Theory, or of ab-initio algorithms?
Who are the key players, including startups?
What is the analysts’ viewpoint?

The R&D efforts to deal with the pandemic are one answer to the first question above. In 2020, there was a joint effort by the DOE, US national labs, cloud providers and some universities to tackle the virus challenge from several perspectives. What this has in common with cancer research is the need for better, faster drug design.

But before I start: why would AI be any good at drug design? It turns out that this is a complex task that human experts do not do well. It is not a task, such as computer vision, that humans can do but machines can do better. The answer may lie in how the complex task is split up into smaller, more manageable ones. Some are examined in the next paragraphs, and a hybrid approach including AI, HPC, and potentially quantum computing (QC) promises to be the right Ansatz.

I will summarize R&D work that has been presented since 2020 at the GPU Technology Conference and at the Supercomputing and International Supercomputing conferences.

Non-goals

The following topics will not be covered: omics, protein engineering, medical imaging, laboratory automation & robotics, virtual screening, virus containment and mitigation measures. Omics, protein engineering and automation may be discussed in the next document of this series because they are related to the subject matter of this survey.

Bridging the time-space scale

Doing atomic-level accurate calculations over an entire cell, or on the coronavirus spike protein, is not feasible even on the largest supercomputers. Fortunately, AI comes to the rescue: only the trajectories or regions of interest are sampled for detailed numerical analysis.
