The Emergence of Diffusion Generative Models in Accelerating AI Drug Discovery

Recently, diffusion generative models, such as DALL-E 2 and Midjourney, have attracted considerable interest for their ability to create captivating and inventive images based on textual prompts. Researchers at MIT's Abdul Latif Jameel Clinic for Machine Learning in Health (Jameel Clinic) are now exploring the potential of these models to extend beyond generating impressive visuals, with the aim of speeding up drug development and reducing adverse side effects.

A forthcoming paper presenting a novel molecular docking model, named DiffDock, will be showcased at the 11th International Conference on Learning Representations. This cutting-edge approach to computational drug design diverges from the prevalent state-of-the-art tools used by most pharmaceutical companies, providing an opportunity to transform the conventional drug development pipeline.

What are diffusion generative models?

Diffusion generative models are a class of machine learning models that create new data samples by gradually removing noise over a series of steps. During training, noise is added to real data and a neural network learns to reverse that noise addition, which allows the model to generate new samples that resemble the original training data.

In the context of image generation, diffusion models iteratively apply random noise to a 2D image, effectively destroying the image data until it becomes indistinguishable noise. The neural network is then trained to recover the original image by reversing this noising process. Once trained, diffusion generative models can produce new data samples by starting from a random configuration and progressively removing the noise, resulting in the generation of novel images, 3D coordinates, or other data types, depending on the application.
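The forward (noising) half of this process can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code behind any particular model: the step count, the noise level `beta`, and the function names are all chosen for the example, a short list of numbers stands in for an image, and the learned reverse network is omitted entirely.

```python
import math
import random


def forward_noise(x0, num_steps=1000, beta=0.02, seed=0):
    """Repeatedly mix a little Gaussian noise into a list of floats.

    Hypothetical helper: each step slightly shrinks the current values
    and adds a small amount of fresh noise, so the original signal is
    gradually destroyed.
    """
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(num_steps):
        x = [
            math.sqrt(1.0 - beta) * v + math.sqrt(beta) * rng.gauss(0.0, 1.0)
            for v in x
        ]
    return x


def signal_fraction(num_steps, beta=0.02):
    """Fraction of the original amplitude that survives num_steps of noising."""
    return math.sqrt((1.0 - beta) ** num_steps)


if __name__ == "__main__":
    pixels = [0.1, 0.5, 0.9]  # stand-in for image data
    print(forward_noise(pixels))
    print(signal_fraction(1000))
```

With these assumed settings, well under a thousandth of the original signal remains after 1,000 steps, so the output is effectively pure noise. Generation runs in the opposite direction: a trained network would start from such noise and remove it step by step.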

Improving virtual screening with diffusion generative models

Drugs generally function by interacting with proteins found in our bodies or in bacteria and viruses. Molecular docking is a technique developed to understand these interactions by predicting the 3D atomic coordinates at which a ligand (i.e., drug molecule) and a protein can bind. While molecular docking has led to the successful identification of drugs for treating HIV and cancer, the average development time for a drug spans a decade, with 90% of candidates failing costly clinical trials. As a result, researchers are pursuing more rapid and efficient ways to analyze potential drug molecules.

Existing in-silico drug design tools that utilize molecular docking typically adopt a "sampling and scoring" method, seeking the optimal ligand "pose" to fit the protein pocket. This labor-intensive process evaluates a multitude of poses and scores them based on the quality of ligand-protein binding.
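A toy version of "sampling and scoring" makes the cost of this approach easier to see. The sketch below is an assumption-laden illustration, not how any real docking tool works: a "pose" is reduced to a single 3D point, the scoring function `toy_score` and the pocket location are invented for the example, and poses are drawn uniformly at random.

```python
import math
import random


def toy_score(pose, pocket=(1.0, -2.0, 0.5)):
    """Hypothetical score: higher (closer to zero) means nearer the pocket."""
    return -math.dist(pose, pocket)


def sample_and_score(num_samples=5000, box=5.0, seed=0):
    """Draw random candidate poses and keep the best-scoring one."""
    rng = random.Random(seed)
    best_pose, best_score = None, -math.inf
    for _ in range(num_samples):
        # Sample a candidate position inside a cube of side 2 * box.
        pose = tuple(rng.uniform(-box, box) for _ in range(3))
        score = toy_score(pose)
        if score > best_score:
            best_pose, best_score = pose, score
    return best_pose, best_score


if __name__ == "__main__":
    print(sample_and_score(num_samples=100))
    print(sample_and_score(num_samples=5000))
```

Even in this three-dimensional toy, thousands of candidates are needed to land close to the pocket. A real ligand pose also involves orientation and internal flexibility, so the search space is far larger.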

Previous deep-learning solutions have approached molecular docking as a regression problem, optimizing for a single target with one correct solution, says Gabriele Corso, co-author and second-year MIT PhD student in electrical engineering and computer science affiliated with MIT CSAIL. In contrast, generative modeling assumes a distribution of possible answers, which is essential when faced with uncertainty.

Hannes Stärk, co-author and first-year MIT PhD student in electrical engineering and computer science affiliated with MIT CSAIL, emphasizes that the model allows for the prediction of multiple poses, each with differing probabilities. Consequently, the model avoids committing to a single conclusion, which can often result in failure.
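The difference between the two framings can be shown with a deliberately simplified example. The sketch below assumes a one-dimensional "pose" with two equally valid answers at -1 and +1. Every name and number in it is hypothetical, and the "generative model" is idealized as drawing directly from the observed examples rather than being a trained network.

```python
import random
import statistics

# Two equally valid poses (an assumption made for this toy example).
valid_poses = [-1.0, 1.0]

rng = random.Random(0)
# Observed examples: each is one of the valid poses plus small noise.
training_targets = [
    rng.choice(valid_poses) + rng.gauss(0.0, 0.05) for _ in range(1000)
]

# Regression: the single prediction that minimizes squared error is the mean.
regression_prediction = statistics.fmean(training_targets)

# Generative view, idealized: draw several candidates from the distribution.
generative_samples = [rng.choice(training_targets) for _ in range(5)]


def distance_to_nearest_valid(x):
    """Distance from x to the closest valid pose."""
    return min(abs(x - p) for p in valid_poses)


if __name__ == "__main__":
    print(regression_prediction, distance_to_nearest_valid(regression_prediction))
    print([round(s, 2) for s in generative_samples])
```

The regression answer falls near zero, halfway between the two valid poses and close to neither, while each generative sample sits next to one of them. This is the kind of failure that committing to a single conclusion can produce.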

Recall how image-generating diffusion models work: random noise is added to a 2D image step by step until only noise remains, a neural network is trained to reverse that process, and new data is then generated by starting from a random configuration and progressively removing the noise.

DiffDock works the same way. After being trained on a variety of ligand and protein poses, it can identify multiple binding sites on proteins it has never seen before. Instead of generating new image data, it generates new 3D coordinates that help the ligand find angles at which it could fit into the protein pocket.

This "blind docking" approach paves the way for harnessing AlphaFold 2 (2020), DeepMind's acclaimed protein folding AI model. Since the debut of AlphaFold 1 in 2018, the research community has been buzzing about the potential of AlphaFold's computationally folded protein structures to help identify new drug mechanisms of action. However, state-of-the-art molecular docking tools have yet to show that they can bind ligands to computationally predicted structures any better than random chance.

DiffDock surpasses previous methods in traditional docking benchmarks due to its enhanced accuracy and ability to implicitly model some protein flexibility. Importantly, DiffDock maintains high performance even as other docking models begin to falter. When working with computationally generated
