ArtiDock from Receptor.AI: Next-generation AI Docking That Beats DiffDock and AlphaFold-latest

Receptor.AI has announced ArtiDock, a best-in-class model for “AI docking,” which predicts the binding poses of small-molecule ligands in protein binding pockets with unprecedented speed and accuracy.

The company performed a comprehensive comparison of ArtiDock with the best modern AI docking techniques and with the most widely used conventional docking programs, Vina and Gold.

ArtiDock not only beats the previous best performer, DiffDock, by a large margin but also performs on par with classical docking programs and rivals AlphaFold-latest, the recently announced next generation of AlphaFold that is capable of predicting protein-ligand complexes.

How does it work?

The key to success is a proprietary data augmentation technique combined with a fast and lightweight model architecture. ArtiDock is trained on a mixture of artificial and real complexes and is therefore aware of a much larger set of combinations of intermolecular interactions than is present in the resolved structures. This gives it a dramatic boost in accuracy and predictive power.

Instead of relying only on the more than 20,000 resolved protein-ligand complexes, the approach generates millions of artificial binding pockets that closely follow the experimentally observed statistical properties of the real ones. The algorithmic approach to this generation is currently being published.

Benchmark

The benchmark included the commonly used Astex docking dataset and the novel PoseBusters dataset, which is designed to challenge the quality of AI docking algorithms. The usual RMSD metric was used, together with the additional PoseBusters-Valid (PB-Valid) set of checks. The latter also assesses steric clashes and the conformer quality of the ligand.
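To make the RMSD metric concrete, here is a minimal sketch of how a pose RMSD and a benchmark success rate are typically computed. It is not Receptor.AI's evaluation code. The function names are hypothetical, the 2 Å success threshold is the usual convention in docking benchmarks rather than a value stated in this article, and the sketch assumes both poses list the same atoms in the same order, in the receptor frame, with no symmetry correction.

```python
import math

def ligand_rmsd(pred, ref):
    """RMSD (in angstroms) between a predicted and a reference ligand pose.

    Assumption: both poses contain the same atoms in the same order and are
    expressed in the receptor's coordinate frame, so no superposition or
    symmetry correction is applied.
    """
    if len(pred) != len(ref) or not pred:
        raise ValueError("poses must be non-empty and have the same atom count")
    squared = sum(
        (p - r) ** 2
        for atom_pred, atom_ref in zip(pred, ref)
        for p, r in zip(atom_pred, atom_ref)
    )
    return math.sqrt(squared / len(pred))

def success_rate(rmsds, threshold=2.0):
    """Fraction of poses whose RMSD is at or below the threshold.

    Assumption: 2.0 angstroms is the conventional cut-off, not a value
    taken from the article.
    """
    if not rmsds:
        raise ValueError("need at least one RMSD value")
    return sum(1 for r in rmsds if r <= threshold) / len(rmsds)

# Toy example: a three-atom "ligand" shifted rigidly by 1 angstrom along x.
ref = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
pred = [(x + 1.0, y, z) for x, y, z in ref]

print(ligand_rmsd(pred, ref))               # 1.0
print(success_rate([0.8, 1.9, 3.4, 2.0]))   # 0.75
```

A low RMSD alone does not guarantee a physically sensible pose, which is why the PB-Valid checks for clashes and conformer quality are reported alongside it.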

ArtiDock outperforms all its rivals (both AI-based and conventional docking) by a large margin on the Astex dataset using either the RMSD or the PB-Valid metrics, as shown in Figure 1.

Comparative performance of the docking methods on the Astex Diverse set — Figure 1.

For the PoseBusters dataset, ArtiDock beats all other AI techniques, including DiffDock, and performs on par with or slightly better than the conventional docking techniques, as shown in Figure 2. While its precision approaches that of the AlphaFold-latest model, ArtiDock stands out for its throughput, being 600 times faster. This is backed by a blend of data augmentation and a fast, lightweight model architecture. Trained on a mix of artificial and real complexes, it is aware of a broader range of protein-protein interactions (PPIs) and shallow pockets, which makes it efficient at docking even for binding interfaces that lack data on ligands and their binding modes.

Comparative performance of the docking methods on the PoseBusters set — Figure 2.

Figure 3 compares the individual PoseBusters structural metrics for ArtiDock, DiffDock, and AlphaFold-latest. ArtiDock significantly outperforms DiffDock on every metric that is not already predicted with 100% accuracy. The one exception is the distance to inorganic cofactors, where the results of DiffDock are marginally better. ArtiDock even outperforms AlphaFold-latest on tetrahedral chirality.

Percentage of predictions passing the PoseBusters quality checks — Figure 3.

It is worth emphasizing that although AlphaFold-latest provides the best overall prediction quality, this precision comes at the cost of extremely slow inference.

Figure 4 shows the inference speeds of all the studied techniques. ArtiDock is somewhat slower than some of the "quick and dirty" AI techniques, but it is still two orders of magnitude faster than DiffDock, Vina, and Gold while being superior to them in quality. AlphaFold-latest is, as expected, the slowest method here: it is three orders of magnitude slower than ArtiDock, which makes it unusable for virtual screening, the area where Receptor.AI's technology shines.

Approximate Runtime Per Sample for Docking Methods — Figure 4.
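The article quotes the gap to AlphaFold-latest both as "600 times" and as "three orders of magnitude". The short sketch below shows that the two statements roughly agree and illustrates what such a ratio means for a screening campaign. Only the factor of 600 comes from the article. The library size and the per-ligand runtime are hypothetical placeholders chosen for illustration, not measured values, and the function names are likewise illustrative.

```python
import math

def orders_of_magnitude(speedup):
    """Express a speed-up factor as a power of ten."""
    return math.log10(speedup)

def screening_hours(n_ligands, seconds_per_ligand):
    """Wall-clock hours to dock a library sequentially on one device."""
    return n_ligands * seconds_per_ligand / 3600.0

SPEEDUP = 600             # factor quoted in the article
N_LIGANDS = 1_000_000     # hypothetical library size
FAST_SECONDS = 0.1        # hypothetical per-ligand runtime, NOT a measured value

print(round(orders_of_magnitude(SPEEDUP), 2))  # 2.78, i.e. close to three orders

fast = screening_hours(N_LIGANDS, FAST_SECONDS)
slow = screening_hours(N_LIGANDS, FAST_SECONDS * SPEEDUP)
print(f"fast method: {fast:.1f} h, slow method: {slow:.1f} h")
```

Whatever the real per-ligand runtimes are, the ratio between the two totals stays at the speed-up factor, which is why a gap of this size decides whether screening a large library is practical.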

Conclusion

Despite its already impressive results, the ArtiDock model still has considerable room for improvement. The company is actively working on further increases in quality and inference speed.

At present, the Receptor.AI ligand pose prediction model therefore appears to be the most capable on the market in terms of the balance between prediction quality and throughput.

Receptor.AI was featured and profiled in The State of AI in the Biopharma Industry, a landmark technology-scouting solution for the life sciences industry.