Valence Labs uses OCI to help build largest GNN in drug discovery

May 2, 2024 | 4 minute read
Dan Spellman
Global AI Cloud Director, Healthcare & Life Sciences, Oracle
Dominique Beaini
Research Unit Lead, Valence Labs, powered by Recursion



Valence Labs is a research engine, powered by Recursion, committed to advancing the frontier of AI in drug discovery. Valence has been using Oracle Cloud Infrastructure (OCI) as an AI platform to help develop powerful new foundation models for drug discovery. This blog post explores one of them, the first work to scale molecular graph neural networks (GNNs) to the billion-parameter regime, with consistent performance gains on downstream tasks as model scale increases.

In digital chemistry, advancements are accelerating and reshaping drug discovery, materials science, and beyond. One notable innovation in this domain is MolGPS, a foundational GNN built specifically for molecular property prediction. Using GNNs and the computational capabilities of a substantial GPU cluster on OCI, MolGPS can use a chemical compound’s structure to predict how that compound will be absorbed, distributed, metabolized, and excreted by the human body.

Unveiling MolGPS

MolGPS represents a significant leap in molecular property prediction methodologies. Typically, molecular property prediction models are designed to perform a single task, such as predicting liver toxicity. In contrast, MolGPS has learned to recognize the overall patterns in how molecules behave and interact. As a result, MolGPS has achieved state-of-the-art performance on 12 out of 22 ADMET (absorption, distribution, metabolism, excretion, and toxicity) tasks from the Therapeutics Data Commons.

Built on the principles of GNNs, MolGPS captures intricate relationships and structural features within molecular graphs, enabling more accurate and efficient property predictions. Its architecture combines a transformer, the type of model behind ChatGPT, with a message-passing component to handle the complexity of molecular data. It gives researchers and scientists a potent tool for accelerating drug discovery pipelines, material design processes, and other critical endeavors. The following figure shows how larger MolGPS models perform better when they’re finetuned on small downstream datasets. It also shows that MolGPS reaches state-of-the-art performance on the Therapeutics Data Commons benchmark tests, as indicated by the orange line.

Comparing the normalized performance on TDC ADMET benchmark over number of parameters.
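
For readers new to GNNs, the message-passing idea mentioned above can be illustrated with a toy example. The sketch below is not the MolGPS implementation: it assumes a plain sum aggregation with no learned weights, and the function name, the one-hot atom features, and the ethanol graph are all hypothetical choices made for illustration. In a trained model, each round would also apply learned transformations.

```python
from typing import Dict, List, Tuple


def message_passing_step(
    node_features: Dict[int, List[float]],
    edges: List[Tuple[int, int]],
) -> Dict[int, List[float]]:
    """One round of sum-aggregation message passing on an undirected graph."""
    # Build an adjacency list; chemical bonds are undirected, so add both directions.
    neighbors: Dict[int, List[int]] = {node: [] for node in node_features}
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)

    updated: Dict[int, List[float]] = {}
    for node, feats in node_features.items():
        # Aggregate: element-wise sum of the neighbors' feature vectors.
        aggregated = [0.0] * len(feats)
        for nb in neighbors[node]:
            for i, value in enumerate(node_features[nb]):
                aggregated[i] += value
        # Update: add the aggregated message to the node's own features.
        updated[node] = [own + msg for own, msg in zip(feats, aggregated)]
    return updated


# Toy graph for the heavy atoms of ethanol: C(0) - C(1) - O(2).
# Features are a hypothetical one-hot encoding [is_carbon, is_oxygen].
ethanol = {0: [1.0, 0.0], 1: [1.0, 0.0], 2: [0.0, 1.0]}
bonds = [(0, 1), (1, 2)]
after_one_round = message_passing_step(ethanol, bonds)
```

After one round, each atom’s vector also reflects its bonded neighbors, so the middle carbon "knows" it sits next to an oxygen. Stacking several rounds lets information travel further across the molecule.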

The role of AI infrastructure

MolGPS wouldn’t have been possible without OCI’s fast, responsive, and reliable compute offering. Roughly 300,000 jobs were submitted to the GPU cluster, covering scaling along six different axes and finetuning against 34 datasets to validate Valence’s scaling hypotheses.

The scaled axes of width, depth, labels, datasets, transformer, and molecules as metrics for Valence’s hypotheses.
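
To give a feel for how a scaling study turns into a large number of cluster jobs, here is a minimal sketch of enumerating a sweep. Only the axis names and the count of 34 finetuning datasets come from this post; the specific width and depth values, the seeds, the full-grid design, and the `enumerate_jobs` helper are assumptions for illustration and do not describe Valence’s actual experiment setup.

```python
from itertools import product

# Hypothetical sweep settings; none of these values come from Valence.
AXES = {
    "width": [64, 128, 256],
    "depth": [4, 8, 16],
}
NUM_DATASETS = 34  # number of finetuning datasets stated in the post
SEEDS = [0, 1, 2]  # hypothetical random seeds for repeated runs


def enumerate_jobs(axes, num_datasets, seeds):
    """Return one job description per (configuration, dataset, seed)."""
    names = list(axes)
    jobs = []
    # Cartesian product over every value of every axis.
    for values in product(*(axes[name] for name in names)):
        config = dict(zip(names, values))
        for dataset_id in range(num_datasets):
            for seed in seeds:
                jobs.append({"config": config, "dataset": dataset_id, "seed": seed})
    return jobs


jobs = enumerate_jobs(AXES, NUM_DATASETS, SEEDS)
print(len(jobs))
```

Even this small two-axis grid yields several hundred jobs, which suggests why a sweep over six axes needs a cluster that can schedule work quickly and reliably.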

Compute clusters with parallel processing capabilities are tailor-made for accelerating deep learning tasks like those encountered in molecular property prediction. By harnessing this computational power, MolGPS achieves remarkable speedups in model training, expediting the exploration of vast chemical spaces and empowering researchers to tackle complex challenges with agility and precision.

Going beyond hardware with OCI

OCI emerged as a strategic partner in this journey, offering a robust and scalable platform optimized with bare metal compute and industry-leading internode bandwidth for GPU-accelerated workloads. Valence’s experience has gone beyond the AI platform itself, however. Principal Software Developer Julien St-Laurent recently commented in the OCI and Valence support channel, “It’s been very stable, and the support here has been top-notch…we're very happy.” The combination of people and technology from OCI was fundamental to the success of not just building one model, but also understanding how to build the ideal model.

Conclusion

In the pursuit of unraveling the mysteries of molecular properties, MolGPS emerges as a beacon of innovation, guided by the principles of GNNs and propelled by the computational might of GPUs. With Oracle Cloud Infrastructure as a steadfast partner, Valence Labs’ journey toward transformative discovery became not just a possibility, but a tangible reality. As researchers continue to push the boundaries of scientific exploration, the fusion of scaled data generation, cutting-edge technologies, and scalable infrastructure paves the way for a future where molecular insights drive profound advancements across diverse domains.


Dan Spellman

Global AI Cloud Director, Healthcare & Life Sciences, Oracle

Dan leads AI Cloud engagements for healthcare and life sciences (HCLS) at Oracle, as well as Oracle Cloud collaborations with NVIDIA for HCLS. The role aligns a lifelong love of cutting-edge technology with a passion for applying it to help transform healthcare and save lives. To discuss using AI/ML and accelerated computing (GPU compute) for your healthcare or life science organization, please email dan.spellman@oracle.com.
