February 5, 2020
Pan-Cancer Project researchers develop deep learning system that can determine where a cancer originates with better accuracy than human experts
If doctors know where a patient’s cancer started, they can better treat the disease. Unfortunately, the origin cannot always be determined, but AI could help solve that problem.
In a study published today in Nature Communications, a Toronto-based research group developed a deep learning system that can accurately classify cancers and identify where they originated based on patterns in their DNA. The system could potentially help clinicians distinguish difficult-to-classify tumours and recommend the most appropriate treatment option for their patients.
“We reasoned that there was something within the cancer’s DNA that could help us classify these tumours,” says Dr. Quaid Morris, OICR Senior Investigator and co-lead author of the study¹. “But I didn’t expect our system to work as well as it does – in some cases, far better than pathologists.”
The initiative began with the dataset: 2,600 whole genomes across 38 tumour types from the Pan-Cancer Analysis of Whole Genomes Project, also known as the Pan-Cancer Project or PCAWG.
Dr. Lincoln Stein, Head, Adaptive Oncology at OICR and member of the Pan-Cancer Project Steering Committee, and his team began to work with these data to identify patterns in a cancer’s genetic material that could help classify these tumours. To them, this was a perfect problem for AI.
When we started to collaborate, we realized we had something amazing.
– Wei Jiao
“Deep learning models excel when they’re trained on large amounts of data,” says Wei Jiao, Research Associate in the Stein Lab and co-first author of the study. “We had an incredibly large dataset to work with, the most comprehensive dataset of whole cancer genomes to date, but we also needed the machine learning expertise.”
The Stein Lab posted its progress on bioRxiv, an open-access repository for biology preprints that have not yet been peer-reviewed. That posting sparked a collaboration between Stein’s team and the Morris Lab, a group with deep machine learning expertise.
Developing the deep learning system was not simple. The team mined terabytes of data looking for patterns in the type of mutations, the source of mutations and where mutations occurred in the genome, among other factors.
To their surprise, they found that patterns in driver mutations – the changes in DNA that are thought to ‘drive’ the development of cancer – were not useful in determining where the tumour originated. Instead, they found that patterns in the distribution of mutations and the type of mutation within a patient’s sample could better classify the patient’s disease.
“We knew that we could distinguish between two different types of healthy cells by looking at how the DNA within each cell type is packaged,” says Stein, who is a co-lead author of the study. “We were surprised and gratified that we could do the same using cancer cells.”
“We saw that the tightly packaged sections – also known as the closed chromatin – would have many more mutations than the loosely wound sections,” says Gurnit Atwal, PhD Candidate in the Morris Lab and co-first author of the study. “It was like the normal cell was casting a shadow on the cancer cell, and we just had to read the shadows.”
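For readers curious how “patterns in the distribution of mutations and the type of mutation” might be turned into input for a classifier, here is a minimal Python sketch that builds a toy feature vector for one tumour. It computes the share of mutations falling in each fixed-size genomic window, where closed chromatin would show up as denser windows, and the share of each single-base substitution class. The 1 Mb window, the function name `mutation_features` and the `(position, ref, alt)` input format are assumptions for illustration. This is not the study’s published pipeline.

```python
from collections import Counter

# Illustrative 1 Mb window width; an assumption for this sketch, not the study's setting.
BIN_SIZE = 1_000_000

# The six single-base substitution classes, written with a pyrimidine (C or T) reference base.
SBS_CLASSES = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def mutation_features(mutations, n_bins):
    """Hypothetical feature builder for one tumour.

    `mutations` is a list of (position, ref, alt) single-base substitutions on
    one chromosome. Returns the fraction of mutations falling in each window,
    followed by the fraction of each substitution class.
    """
    bin_counts = [0] * n_bins
    class_counts = Counter()
    for pos, ref, alt in mutations:
        bin_index = pos // BIN_SIZE
        if 0 <= bin_index < n_bins:
            bin_counts[bin_index] += 1
        # Report purine-reference mutations on the opposite strand, so G>A counts as C>T.
        if ref in ("A", "G"):
            ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
        class_counts[f"{ref}>{alt}"] += 1
    total = max(len(mutations), 1)  # avoid dividing by zero for an empty sample
    distribution = [count / total for count in bin_counts]
    spectrum = [class_counts[name] / total for name in SBS_CLASSES]
    return distribution + spectrum


if __name__ == "__main__":
    sample = [(100, "C", "T"), (1_500_000, "G", "A"), (2_100_000, "T", "G")]
    print(mutation_features(sample, n_bins=3))
```

In a real analysis, vectors like this for many tumours, each labelled with its tumour type, would form the training data for a classifier such as a neural network. The choice of window size and mutation categories here is purely illustrative.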
To achieve the highest accuracy, the research group developed a system based on deep neural networks, a type of model loosely inspired by the human brain and commonly used to recognize patterns in images, audio and text. Their system achieved an accuracy of 91 per cent, roughly double the accuracy that trained pathologists can achieve using traditional methods when presented with a primary tumour and no clinical information.
The team then tested the model on an additional 2,000 tumours from patients in the Netherlands who had donated their cancer genomic data to the Hartwig Medical Foundation, and the system still performed with a remarkably high level of accuracy.
“As more cancer genomes are sequenced, we can gain the ability to classify rarer cancers,” says Atwal. “Where we are now is great, but there is more work to be done.”
This study presents a deep learning system that could potentially improve how cancers are classified, enhancing the accuracy of current diagnostic tests and the treatment decisions they inform.
For some patients, this system could tell them where their cancer began, giving them valuable information about which course of treatment to choose. The system could also help doctors determine whether a tumour in a patient previously treated for cancer is an entirely new tumour or a recurring tumour that has spread.
“A treatment plan for a cancer that originated in the throat may be very different from one for a cancer that originated in the breast, and the treatment for a cancer that has returned is different from the treatment for one that has metastasized,” says Atwal. “One day, our tool could help give doctors the power to distinguish these classes of tumours, giving patients valuable information that wouldn’t have been available otherwise.”
The authors of the study suggest that their system could start helping patients soon. They plan to further refine their system for patients with rare cancers before moving towards clinical studies.
“The potential impact of the system we’ve developed is encouraging,” says Morris. “We look forward to turning this system into a tool that can help clinicians and future cancer patients tackle this disease.”
¹ Morris is also a Canada CIFAR AI Chair, Faculty Member at the Vector Institute, and Professor at the University of Toronto’s Donnelly Centre for Cellular and Biomolecular Research.
- Unprecedented exploration generates most comprehensive map of cancer genomes charted to date
- New clues to cancer in the genome’s other 99 per cent
- AI algorithm classifies cancer types better than experts
- Discovering cancer’s vulnerabilities: The whole may be greater than the sum of its parts
- Finding the roots of cancer, ‘It’s a needle in a haystack’
- Dr. Lincoln Stein talks about the Pan-Cancer Project
- Unraveling the story behind the cancers we can’t explain
- TrackSig: Unlocking the history of cancer
- New tumour-driving mutations discovered in the under-explored regions of the cancer genome
September 3, 2019
OICR is proud to welcome Dr. Parisa Shooshtari as an OICR Investigator.
Shooshtari specializes in developing computational, statistical and machine learning methods to understand the biological mechanisms underlying complex diseases, such as cancer and autoimmune conditions. She is interested in uncovering how genes are dysregulated in complex diseases by integrating multiple data types and applying machine learning methods to analyze single-cell sequencing data.
Among her many achievements, Shooshtari developed a computational pipeline to uniformly process more than 800 epigenomic data samples from different international consortia. She then built and led a team that developed a web interface and an interactive genome browser to make the resulting database publicly available to download and explore.
Shooshtari joins the OICR community with research experience from Yale University and the Broad Institute of MIT and Harvard. She also served as a Research Associate with the Centre for Computational Medicine at the Hospital for Sick Children (SickKids).
Shooshtari recently became an Assistant Professor in the Schulich School of Medicine and Dentistry at Western University, where she officially began her career as an independent researcher. Here, Shooshtari discusses her commitment to collaboration and her transition to professorship.
Your work spans multiple disease areas, from autoimmune diseases to cancer. What do these diseases have in common? Is there a specific disease that you’re more interested in?
My work focuses on complex diseases, where instead of one gene causing the disease, there are sometimes tens or hundreds of genes working together to give rise to an ailment.
When it comes to complex diseases, we also know that there are multiple factors that we need to consider, including genetics, epigenetics and environmental factors. We live in an era where we have rich datasets with many different types of data. Each of these data types sheds light upon a different aspect of the disease mechanism, but we need to integrate these data types to gain a comprehensive understanding of how a complex disease works.
I develop computational methods for integrative analysis, so complex diseases are definitely the most interesting to me. I feel lucky to be a researcher at this time when I can help bring these data types together to understand mechanisms of diseases, which in turn will help inform treatment selection or help find new therapeutic strategies.
I am interested in applying our data integration methods to several complex diseases, but I am currently working with a few Canadian groups to help better understand Diffuse Intrinsic Pontine Glioma (DIPG), a fatal childhood brain cancer.
Your current collaborators include researchers from Yale, Harvard, MIT, SickKids and other leading organizations. How did you initiate and sustain these collaborations?
At the beginning of my research career, I would reach out to scientists who were working on interesting, challenging and cutting-edge problems. I enjoy working in collaborative environments because I believe the key to success in biomedical research lies in collaboration between researchers from diverse backgrounds.
With the support of my collaborators, I’ve been able to learn and shift my focus from theoretical computational sciences to applications of data science in genetics of complex diseases. Now, sometimes collaborators approach me with their rich data, which I’m eager to help analyze.
With your new appointment, what are you looking forward to over the next few years?
I am eager to continue expanding my research program and working with new scientists on exciting, cutting-edge problems in the genetics and epigenetics of complex diseases. New technologies have revolutionized how we study diseases, and we are now reaching a point where they are also revolutionizing how we treat them. I am confident that we will have better ways of treating these diseases in the future using personalized medicine, and I want to help make that a reality.
April 19, 2018
Largest-ever study of its kind uses a tumour’s past to accurately predict its future
Toronto (April 19, 2018) – Findings from Canadian Prostate Cancer Genome Network (CPC-GENE) researchers and their collaborators, published today in Cell, show that the aggressiveness of an individual prostate cancer can be accurately assessed by looking at how that tumour has evolved. This information can be used to determine what type and amount of treatment each patient should receive, or whether any treatment is needed at all.
The researchers analyzed the whole genome sequences of 293 localized prostate cancer tumours, linked to clinical outcome data. These were then further analyzed using machine learning, a family of statistical techniques, to infer the evolutionary past of each tumour and to estimate its trajectory. They found that tumours that had evolved to contain multiple populations of cancer cells, or subclones, were the most aggressive. Fifty-nine per cent of tumours in the study had this genetic diversity, and 61 per cent of those led to relapse following standard therapy.
February 21, 2018
Investment supports emerging entrepreneurial scientists and critical proof-of-principle studies
TORONTO, ON (February 20, 2018) – FACIT, a business accelerator, announced four new recipients of funding through its Prospects oncology investment competition: Dalriada Therapeutics Inc. (“Dalriada”), 16-Bit Inc. (“16-Bit”), a cancer biomarker study at the Ontario Institute for Cancer Research (“OICR”), and a virus-based therapeutic under development at the Ottawa Hospital and the University of Ottawa. FACIT’s investments are critical to bridging the capital gap often experienced by early-stage Ontario companies, helping them create jobs and build roots in the province. The wide-ranging scope of the innovations, which span therapeutics, machine learning and biomarker development, reflects the rich talent pool within the Ontario oncology research community.
October 4, 2017
New software uses machine learning to identify mutations in tumours without reference tissue samples
One of the main steps in analyzing cancer genomic data is to find somatic mutations, which are non-hereditary changes in DNA that may give rise to cancer. To identify these mutations, researchers will often sequence the genome of a patient’s tumour as well as the genome of their normal tissue and compare the results. But what if normal tissue samples aren’t available?
January 10, 2017
Prostate cancer is the most common cancer in Canadian men, but there is still no one-size-fits-all strategy for treating the disease. Currently, it is difficult to choose exactly the right type and amount of treatment for each individual because it is hard to accurately assess how aggressive the cancer is. Researchers are now a step closer to bringing a powerful new prognostic tool into clinical use.
January 9, 2017
A team of researchers and clinician-scientists from across Canada has discovered a signature of 41 mutations that are common in prostate cancer and will help prevent patients with non-aggressive disease from being overtreated. Dr. Paul Boutros, a Principal Investigator in OICR’s Informatics and Bio-computing Program and Co-Lead of the Canadian Prostate Cancer Genome Network (CPC-GENE), answered a few questions about how the signature was developed and its potential impact on patients.
October 28, 2016
Dr. Matt Cecchini was one of many pathologists and researchers, including 21 trainees, to attend the inaugural Pathology Matters meeting hosted by the Ontario Molecular Pathology Research Network (OMPRN). In this post he covers what he learned at the meeting, where the field is going and how that impacts his training and research.