AI algorithm classifies cancer types better than experts
Gurnit Atwal and Wei Jiao

Pan-Cancer Project researchers develop deep learning system that can determine where a cancer originates with better accuracy than human experts

If doctors know where a patient’s cancer started, they can better treat the disease. Unfortunately, a cancer’s origin cannot always be determined, but AI could help solve that problem.

In a study published today in Nature Communications, a Toronto-based research group developed a deep learning system that can accurately classify cancers and identify where they originated based on patterns in their DNA. The system could potentially help clinicians differentiate difficult-to-classify tumours and recommend the most appropriate treatment option for their patients.

“We reasoned that there was something within the cancer’s DNA that could help us classify these tumours,” says Dr. Quaid Morris, OICR Senior Investigator and co-lead author of the study¹. “But I didn’t expect our system to work as well as it does – in some cases, far better than pathologists.”

The team

The initiative began with the dataset: 2,600 whole genomes across 38 tumour types from the Pan-Cancer Analysis of Whole Genomes Project, also known as the Pan-Cancer Project or PCAWG.

Dr. Lincoln Stein, Head, Adaptive Oncology at OICR and member of the Pan-Cancer Project Steering Committee, and his team began to work with these data to identify patterns in a cancer’s genetic material that could help classify these tumours. To them, this was a perfect problem for AI.

When we started to collaborate, we realized we had something amazing.
– Wei Jiao

“Deep learning models excel when they’re trained on large amounts of data,” says Wei Jiao, Research Associate in the Stein Lab and co-first author of the study. “We had an incredibly large dataset to work with, the most comprehensive dataset of whole cancer genomes to date, but we also needed the machine learning expertise.”

The Stein Lab posted its progress on bioRxiv, an open-access repository for biology preprints that have not yet been peer-reviewed. The posting sparked a collaboration between Stein’s team and the Morris Lab, a group with deep machine learning expertise.

The system

Developing the deep learning system was not simple. The team mined terabytes of data looking for patterns in the types of mutations, the sources of mutations and where mutations occurred in the genome, among other factors.

To their surprise, they found that patterns in driver mutations – the changes in DNA that are thought to ‘drive’ the development of cancer – were not useful in determining where the tumour originated. Instead, patterns in how mutations were distributed across the genome, and in the types of mutations within a patient’s sample, could better classify the patient’s disease.
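For readers curious how “the type of mutation” and “the distribution of mutations” can be turned into numbers a model can learn from, here is a minimal Python sketch. It is not the study’s code: the function names (`substitution_class`, `extract_features`), the simple six-class substitution scheme and the 1-megabase bin size are illustrative assumptions. Each tumour is reduced to two feature vectors: the share of mutations of each type, and the share falling in each fixed-size stretch of the genome.

```python
from collections import Counter

# Hypothetical illustration, not the study's pipeline.
# Six single-base substitution classes, written from the pyrimidine (C or T) base.
SBS_CLASSES = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_class(ref, alt):
    """Collapse a substitution onto the pyrimidine strand, e.g. G>T -> C>A."""
    if ref in ("G", "A"):
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def extract_features(mutations, chrom_lengths, bin_size=1_000_000):
    """Return (type_fractions, regional_fractions) for one tumour.

    mutations: list of (chrom, position, ref, alt); positions are 1-based.
    chrom_lengths: dict mapping chromosome name -> length in bases.
    bin_size: assumed width of each genomic window (1 Mb here).
    """
    total = len(mutations) or 1  # avoid dividing by zero for an empty sample

    # Feature 1: fraction of mutations in each substitution class.
    type_counts = Counter(substitution_class(r, a) for _, _, r, a in mutations)
    type_fractions = [type_counts[c] / total for c in SBS_CLASSES]

    # Feature 2: fraction of mutations landing in each fixed-size genomic bin.
    bins = []
    for chrom, length in chrom_lengths.items():
        n_bins = -(-length // bin_size)  # ceiling division
        bins.extend((chrom, i) for i in range(n_bins))
    index = {b: i for i, b in enumerate(bins)}
    regional = [0] * len(bins)
    for chrom, pos, _, _ in mutations:
        regional[index[(chrom, (pos - 1) // bin_size)]] += 1
    regional_fractions = [n / total for n in regional]

    return type_fractions, regional_fractions


# Toy input, made up for illustration only.
muts = [("chr1", 150, "C", "T"), ("chr1", 1_500_000, "G", "A"), ("chr2", 10, "T", "G")]
lengths = {"chr1": 2_000_000, "chr2": 500_000}
types, regions = extract_features(muts, lengths)
# types: C>T = 2/3 (G>A is folded into C>T), T>G = 1/3; regions: 1/3 in each of 3 bins
```

Using fractions rather than raw counts keeps tumours with very different total mutation burdens comparable. The regional vector is the kind of feature that could capture the chromatin “shadow” the researchers describe, since mutation density varies with how tightly each stretch of DNA is packaged.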

“We knew that we could distinguish between two different types of healthy cells by looking at how the DNA within each cell type is packaged,” says Stein, who is a co-lead author of the study. “We were surprised and gratified that we could do the same using cancer cells.”

“We saw that the tightly packaged sections – also known as the closed chromatin – would have many more mutations than the loosely wound sections,” says Gurnit Atwal, PhD Candidate in the Morris Lab and co-first author of the study. “It was like the normal cell was casting a shadow on the cancer cell, and we just had to read the shadows.”

To achieve the highest accuracy, the research group built a system based on deep neural networks, a type of model loosely inspired by the human brain and commonly used to recognize patterns in images, audio and text. Their system achieved an accuracy of 91 per cent – roughly double the accuracy that trained pathologists can achieve using traditional methods when presented with a primary tumour and no clinical information.
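To make “deep neural network” a little more concrete, the sketch below runs a toy forward pass in plain Python. It is a hypothetical illustration, not the published model: the layer sizes, the three example tumour labels and the random weights are placeholders, and a real classifier would learn its weights from thousands of labelled genomes. It shows only the basic shape of such a system: a feature vector goes in, passes through stacked weighted sums with a nonlinearity, and a softmax turns the final scores into one probability per tumour type.

```python
import math
import random


def relu(xs):
    """Rectified linear unit: keep positive values, zero out negatives."""
    return [max(0.0, x) for x in xs]


def dense(xs, weights, biases):
    """Fully connected layer; weights[j][i] links input i to output unit j."""
    return [sum(w * x for w, x in zip(row, xs)) + b for row, b in zip(weights, biases)]


def softmax(zs):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(zs)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    s = sum(exps)
    return [e / s for e in exps]


def random_layer(n_in, n_out, rng):
    """Placeholder weights; a trained model would learn these from data."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def predict(features, layers):
    """Hidden layers use ReLU; the last layer feeds a softmax over classes."""
    h = features
    for weights, biases in layers[:-1]:
        h = relu(dense(h, weights, biases))
    out_w, out_b = layers[-1]
    return softmax(dense(h, out_w, out_b))


# Hypothetical setup: 9 input features, one hidden layer of 16 units, 3 classes.
rng = random.Random(0)
tumour_types = ["Breast", "Lung", "Liver"]  # illustrative subset of labels
layers = [random_layer(9, 16, rng), random_layer(16, len(tumour_types), rng)]
features = [0.0, 0.0, 0.67, 0.0, 0.0, 0.33, 0.33, 0.33, 0.33]
probs = predict(features, layers)
best = tumour_types[probs.index(max(probs))]
```

With untrained random weights the output is meaningless; the point is the structure. Training would adjust the weights so that genomes from each tumour type push probability toward the correct label, and reported accuracy is the share of held-out tumours whose highest-probability label matches the true one.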

The team then tested the model on an additional 2,000 tumours from patients in the Netherlands who had donated their cancer genomic data to the Hartwig Medical Foundation, and the system still performed with a remarkably high level of accuracy.

“As more cancer genomes are sequenced, we will gain the ability to classify rarer cancers,” says Atwal. “Where we are now is great, but there is more work to be done.”

The potential

This study presents a deep learning system that could potentially improve how cancers are classified, enhancing the accuracy of current diagnostic tests and the treatment decisions they inform.

For some patients, this system could tell them where their cancer began, giving them valuable information about which course of treatment to choose. The system also could serve as a tool to help doctors identify whether a tumour in a patient who has been treated for cancer in the past is an entirely new tumour or a recurring tumour that has spread.

“A treatment plan for a cancer that originated in the throat may be very different from one for a cancer that originated in the breast, and the treatment for a cancer that has returned is different from the treatment for one that has metastasized,” says Atwal. “One day, our tool could help give doctors the power to distinguish these classes of tumours, giving patients valuable information that wouldn’t have been available otherwise.”

The study’s authors suggest that their system could begin helping patients soon. They plan to refine it further for patients with rare cancers before moving toward clinical studies.

“The potential impact of the system we’ve developed is encouraging,” says Morris. “We look forward to turning this system into a tool that can help clinicians and future cancer patients tackle this disease.”


¹ Morris is also a Canada CIFAR AI Chair, Faculty Member at the Vector Institute, and Professor at the University of Toronto’s Donnelly Centre for Cellular and Biomolecular Research.

