OCI and New York Genome Center unveil unprecedented single-cell genome collaboration

February 5, 2024 | 3 minute read
Ruzhu Chen
Master Principal Cloud Architect, Healthcare & Life Sciences
Dan Spellman
Global AI Cloud Director, Healthcare & Life Sciences
John Zinno
Genomics Data Scientist / AI, New York Genome Center

We want to thank Dan Landau, Tamara Prieto, Dennis Yuan, Ivan Raimondi, Catherine Potenski, Tong Zhu, Jason Fenwick, George Vacek and Sanjay Basu for their support of this collaboration.

The Landau Lab at the New York Genome Center (NYGC) and Oracle Cloud Infrastructure (OCI) have just collaborated on delivering the world’s largest human somatic phylogenies built from single-cell whole genomes, with unprecedented speed and scale.


Unlocking the mosaic: Advancements in somatic variant analysis

Somatic DNA mutations accumulate in our cells throughout our lifetime, with clonal selection leading to significant mosaicism across tissues in physiological aging and disease. Understanding the range and impact of somatic variation is one of the most exciting frontiers in human genetics. However, somatic variant analysis remains technically challenging because somatic variants occur at relatively low frequency and so require sensitive genome sequencing methods to detect. Because each person comprises a mosaic of genomes, a full characterization of somatic variation would in theory call for whole-genome sequencing of every cell in the body. In this context, accurate and cost-efficient single-cell whole-genome sequencing methods that scale to thousands of cells can help detect and uncover the patterns of somatic variation in humans.

To address these challenges, the Landau Lab develops computational and experimental tools to study somatic evolution at single-cell and multiomic levels. Importantly, these tools can be applied directly to human samples, allowing the analysis of human somatic evolution over true human lifespans, which is poorly modeled by cellular or animal systems.

Single-cell phylogenetics to understand somatic evolution

The single-cell phylogenetic perspective can answer key questions in somatic evolution. To enable high-resolution phylogenetic reconstruction from thousands of somatic cells, the Landau Lab automated primary template-directed amplification (PTA), a novel method that significantly reduces error rates in single-cell whole-genome sequencing (scWGS). The automated method generates thousands of scWGS and scRNA libraries per donor. The team also adapted scWGS to Ultima Genomics sequencing ($1 USD/Gb) to achieve high-depth sequencing of scWGS libraries.

By applying single-cell miniaturized automated reverse transcription and primary template-directed amplification (SMART-PTA) to primary human samples, the Landau Lab generated the largest number of single-cell genomes in the world. Record cell numbers provide unprecedented power for phylodynamics and evolutionary analyses. However, the analysis of such a large dataset requires a new solution to process hundreds of terabytes of sequencing data per donor, perform joint variant calling of thousands of samples, estimate phylogenies with thousands of taxa and hundreds of thousands of variable sites, and generate enormous simulations for evolutionary hypothesis testing.

Breaking records in single-cell whole-genome variant calling and phylogenetic reconstruction

To meet this analytical challenge, the Landau Lab drew on the deep engineering engagement and industry expertise that OCI includes with its services to build and optimize a first-of-its-kind custom pipeline. This workflow ran on OCI’s bare metal A10 Quad nodes, which cost-effectively delivered an approximately 50-fold speed-up per cell over standard 8-core CPU instances. Beyond processing the sequencing data, the same configuration delivered an approximately 130-fold speed-up when building phylogenetic trees.

The team expects to further accelerate this workflow using other bare metal GPU offerings from OCI, but it selected the BM.GPU.A10.4 for its lower cost per hour and robust specifications. Each of these nodes includes 96 GB of GPU memory on a system powered by Intel Xeon Platinum 8358 processors, with a total of 64 OCPUs (equivalent to 128 vCPUs), 1 TB of CPU memory, 7.68 TB of local high-performance NVMe storage, and 2 x 50 Gbps of network bandwidth.
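To make the figures above concrete, here is a minimal Python sketch of the arithmetic they imply. It is an illustration only, not part of the team's pipeline. The `NodeSpec` class and the `accelerated_hours` helper are hypothetical names, the GPU count of four is inferred from the "A10 Quad" and "BM.GPU.A10.4" naming, and the 100-hour baseline in the usage lines is an invented number, not a measurement from this work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    """Hypothetical container for the node figures quoted in the post."""
    gpus: int            # assumed to be 4, from the "A10 Quad" / ".A10.4" naming
    gpu_memory_gb: int   # total GPU memory across the node
    ocpus: int
    nic_count: int
    nic_gbps: int

    @property
    def vcpus(self) -> int:
        # The post equates 64 OCPUs with 128 vCPUs, i.e. 2 vCPUs per OCPU.
        return self.ocpus * 2

    @property
    def memory_per_gpu_gb(self) -> float:
        return self.gpu_memory_gb / self.gpus

    @property
    def total_network_gbps(self) -> int:
        return self.nic_count * self.nic_gbps


BM_GPU_A10_4 = NodeSpec(gpus=4, gpu_memory_gb=96, ocpus=64, nic_count=2, nic_gbps=50)


def accelerated_hours(baseline_hours: float, speedup: float) -> float:
    """Wall-clock hours after applying a speed-up factor to a baseline runtime."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return baseline_hours / speedup


if __name__ == "__main__":
    print(BM_GPU_A10_4.vcpus, BM_GPU_A10_4.memory_per_gpu_gb)
    # Invented 100-hour CPU baseline, used only to illustrate the two factors.
    print(accelerated_hours(100.0, 50))    # per-cell processing, ~50-fold
    print(accelerated_hours(100.0, 130))   # tree building, ~130-fold
```

Under these assumptions, a step that took 100 hours on the CPU baseline would take about 2 hours at a 50-fold speed-up and under 1 hour at a 130-fold speed-up.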

Want to learn more?

To hear more about this innovative and collaborative approach to deciphering somatic evolution at the highest resolution, attend the AGBT annual general meeting on Thursday, February 8, in Orlando, Florida.
 

Special thanks to Dr. Sanjay Basu for his continued support.

Ruzhu Chen

Master Principal Cloud Architect, Healthcare & Life Sciences

Ruzhu is a master principal cloud architect in OCI's AIML cloud engineering team with strong hands-on expertise in large AI/ML platform and application optimization. He has 20+ years’ experience in life science application development, enablement, and user support, and he previously served as an IBM Lead Scientist and SME in its global life science team. He holds a PhD in microbiology (molecular biology focus) and a master's degree in computer science.

Dan Spellman

Global AI Cloud Director, Healthcare & Life Sciences

Dan leads AI Cloud engagements for healthcare and life sciences at Oracle, as well as Oracle Cloud’s partnership with NVIDIA for HCLS. The role combines a lifelong love of cutting-edge technology with a passion for applying it to help transform healthcare and save lives. To discuss using AI/ML and accelerated computing (GPU compute) for your healthcare or life science organization, please email dan.spellman@oracle.com.

John Zinno

Genomics Data Scientist / AI, New York Genome Center

John is a Genomics Data Scientist at New York Genome Center and Weill Cornell Medicine, developing tools and methods for single-cell multiomics at scale to advance the understanding of somatic mosaicism in humans. John obtained an M.S. in Biochemistry and Cell Biology from Stony Brook University and an M.S. in Bioinformatics from NYU.

