June 26, 2020

Opening the virtual floodgates for cancer research and discovery

OICR’s Genome Informatics team announces international release of the ICGC-ARGO Data Platform, the all-in-one data hub for the largest clinical-genomic data sharing initiative in the world

Dr. Christina Yung

We’re in the midst of an era of big data that is changing the way we understand the world – including how we study, diagnose and treat cancers. 

Improvements in sequencing technology and computational power have allowed us to collect massive amounts of information about cancer patients and their tumours. This information, however, is only powerful if it can be accessed by those who can transform big data into new discoveries. 

Over the last decade, OICR’s Genome Informatics team has built a reputation for developing robust big data portals that provide cancer data access to thousands of researchers around the world. Now, the team has set out to do it again – this time with bigger data.

February 5, 2020

AI algorithm classifies cancer types better than experts

Gurnit Atwal and Wei Jiao

Pan-Cancer Project researchers develop deep learning system that can determine where a cancer originates with better accuracy than human experts

If doctors know where a patient’s cancer started, they can better treat the disease. Unfortunately, this is not always possible, but AI could play a role in solving that.

In a study published today in Nature Communications, a Toronto-based research group developed a deep learning system that can accurately classify cancers and identify where they originated based on patterns in their DNA. The system could potentially help clinicians differentiate difficult-to-classify tumours and help recommend the most appropriate treatment option for their patients.

“We reasoned that there was something within the cancer’s DNA that could help us classify these tumours,” says Dr. Quaid Morris, OICR Senior Investigator and co-lead author of the study¹. “But I didn’t expect our system to work as well as it does – in some cases, far better than pathologists.”

The team

The initiative began with the dataset: 2,600 whole genomes across 38 tumour types from the Pan-Cancer Analysis of Whole Genomes Project, also known as the Pan-Cancer Project or PCAWG.

Dr. Lincoln Stein, Head, Adaptive Oncology at OICR and member of the Pan-Cancer Project Steering Committee, and his team began to work with these data to identify patterns in a cancer’s genetic material that could help classify these tumours. To them, this was a perfect problem for AI.

When we started to collaborate, we realized we had something amazing.
– Wei Jiao

“Deep learning models excel when they’re trained on large amounts of data,” says Wei Jiao, Research Associate in the Stein Lab and co-first author of the study. “We had an incredibly large dataset to work with, the most comprehensive dataset of whole cancer genomes to date, but we also needed the machine learning expertise.”

The Stein Lab posted its progress on bioRxiv, an open-access repository for biology publications that have not yet been peer-reviewed, which in turn sparked the collaboration between Stein’s team and the Morris Lab – a group with deep machine learning expertise.

The system

The development of their deep learning system was not simple. They mined through terabytes of data looking for patterns in the type of mutations, the source of mutations and where mutations occurred in the genome, among other factors.

To their surprise, they found that patterns in driver mutations – the changes in DNA that are thought to ‘drive’ the development of cancer – were not useful in determining where the tumour originated. Instead, they found that patterns in the distribution of mutations and the type of mutation within a patient’s sample could better classify the patient’s disease.

“We knew that we could distinguish between two different types of healthy cells by looking at how the DNA within the cell types are packaged,” says Stein, who is a co-lead author of the study. “We were surprised and gratified that we could do the same using cancer cells.”

“We saw that the tightly-packaged sections – also known as the closed chromatin – would have many more mutations than the loosely wound sections,” says Gurnit Atwal, PhD Candidate in the Morris Lab and co-first author of the study. “It was like the normal cell was casting a shadow on the cancer cell, and we just had to read the shadows.”
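The two kinds of signal described above, where mutations fall in the genome and what type they are, can be illustrated with a toy sketch. This is not the study's code: the function name, the one-megabase bin size and the input format are assumptions made for illustration, and the sample mutations are made up.

```python
from collections import Counter

# Hypothetical bin width (1 megabase); the study's actual binning is not given here.
BIN_SIZE = 1_000_000


def mutation_features(mutations, bin_size=BIN_SIZE):
    """Summarize one tumour's mutations as two normalized histograms.

    `mutations` is a list of (chromosome, position, ref_base, alt_base) tuples.
    Returns (distribution, type_fractions):
      - distribution: fraction of mutations falling in each genomic bin
      - type_fractions: fraction of mutations of each substitution type
    """
    if not mutations:
        return {}, {}
    total = len(mutations)
    # Where the mutations fall: count per (chromosome, bin index).
    bins = Counter((chrom, pos // bin_size) for chrom, pos, _, _ in mutations)
    # What kind of mutation: count per substitution, e.g. "C>T".
    types = Counter(f"{ref}>{alt}" for _, _, ref, alt in mutations)
    distribution = {key: n / total for key, n in bins.items()}
    type_fractions = {key: n / total for key, n in types.items()}
    return distribution, type_fractions


# Made-up example input, not real patient data.
toy_mutations = [
    ("chr1", 1_200_000, "C", "T"),
    ("chr1", 1_900_000, "C", "T"),
    ("chr1", 5_300_000, "G", "A"),
    ("chr2", 400_000, "C", "A"),
]
distribution, type_fractions = mutation_features(toy_mutations)
```

In a sketch like this, each tumour becomes a pair of numeric profiles that a classifier could be trained on. Bins that overlap tightly packaged DNA in the cell of origin would be expected to carry a larger share of the mutations.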

To achieve the highest accuracy, the research group developed a deep learning neural network-based system, a type of system that is loosely modeled after the human brain and commonly used to recognize patterns in images, audio and text. Their system achieved an accuracy of 91 per cent – roughly double the accuracy that trained pathologists can achieve using traditional methods when presented with a primary tumour and no clinical information.

Further, they tested their model on an additional 2,000 tumours from patients in the Netherlands who donated their cancer genomic data to the Hartwig Medical Foundation, and the system still performed with a remarkably high level of accuracy.

“As more cancer genomes are sequenced, we can gain the ability to classify rarer cancers,” says Atwal. “Where we are now is great, but there is more work to be done.”

The potential

This study presents a deep learning system that could potentially improve how cancers are classified, enhancing the accuracy of current diagnostic tests and the treatment decisions they inform.

For some patients, this system could tell them where their cancer began, giving them valuable information about which course of treatment to choose. The system could also serve as a tool to help doctors identify whether a tumour in a patient who has been treated for cancer in the past is an entirely new tumour or a recurring tumour that has spread.

“A treatment plan for a cancer that originated in the throat may be very different than one for a cancer that originated in the breast, and the treatment for a cancer that has returned is different than for one that has metastasized,” says Atwal. “One day, our tool could help give doctors the power to distinguish these classes of tumours, giving patients valuable information that wouldn’t have been available otherwise.”

The authors of the study suggest that their system could start helping patients soon. They plan to further refine their system for patients with rare cancers before moving towards clinical studies. 

“The potential impact of the system we’ve developed is encouraging,” says Morris. “We look forward to turning this system into a tool that can help clinicians and future cancer patients tackle this disease.”

¹ Morris is also a Canada CIFAR AI Chair, Faculty Member at the Vector Institute, and Professor at the University of Toronto’s Donnelly Centre for Cellular and Biomolecular Research.

February 5, 2020

Whole-genome analysis generates new insights into viruses involved in cancer

Dr. Ivan Borozan

OICR researchers scan more than 2,600 whole cancer genomes for traces of known and potentially unknown cancer-causing viruses, identifying new ways that these pathogens may eventually lead to the disease

It is estimated that viruses cause nearly 10 per cent of all cancers. These cancer-causing viruses – also known as oncoviruses – can make changes to normal cells that may eventually lead to the disease. As researchers better understand how oncoviruses cause cancer, they can develop new therapies and vaccines to prevent them from doing so.

In the most extensive exploration of cancer genomes to date, OICR researchers and collaborators discovered new insights into the mechanisms behind the seven known oncoviruses, and provided strong evidence that there are no other human cancer-causing viruses in existence.

Their study was published today in Nature Genetics, alongside more than 20 related publications from the Pan-Cancer Analysis of Whole Genomes Project, also known as the Pan-Cancer Project or PCAWG. The research group analyzed whole genome data from more than 2,600 patient tumours representing 35 different tumour types.

“The Pan-Cancer Project is one of the largest cancer genome projects to date,” says Dr. Ivan Borozan, Scientific Associate at OICR and leading co-author of the study. “This project allowed us to search for viruses in the most comprehensive collection of cancer genomes using the latest and most advanced techniques. To analyze this extensive dataset, we first had to develop computational tools and analysis pipelines that can efficiently process large-scale sequencing data and – at the same time – extract accurate information about minute amounts of the viral genome present in each individual sample. The results generated using these tools were then integrated to decipher molecular mechanisms that lead to the development of cancer.”

Our research points towards a future where these cancers can be treated more effectively, and potentially prevented in the first place.
– Dr. Ivan Borozan

The group discovered that an individual’s immune system, while trying to protect itself from a certain strain of the well-known human papillomavirus (HPV), may cause damage to normal DNA that leads to the development of bladder, head, neck and cervical cancers.

The study also found that the hepatitis B virus (HBV), which is linked to some liver cancers, causes damage in normal cells by integrating into human DNA close to TERT, a well-understood cancer-driving gene.

Spinoffs of this research initiative have led to important discoveries about the Epstein-Barr virus (EBV) and how it can promote the development of stomach cancer.

“These findings can help us develop new vaccines or therapies that target these mechanisms,” says Borozan. “Our research points towards a future where these cancers can be treated more effectively, and potentially prevented in the first place.”

As new sequencing research initiatives emerge, the research group’s computational tools and pipelines – which are available for the research community to use – will help further explain the mechanisms behind this complex disease.
