October 15, 2020

Q&A with Morgan Taschuk, OICR’s new Director of Genome Sequence Informatics

Morgan Taschuk, OICR’s Director of Genome Sequence Informatics (JP Moczulski/CP Images).

Morgan Taschuk reflects on a decade of supporting critical cancer research and on her new role as OICR’s Director of Genome Sequence Informatics

Cancer genomics research depends on infrastructure and analysis tools that collect, process, analyze and annotate vast quantities of valuable sequencing data. Behind these systems at OICR is a team of individuals dedicated to enabling cancer research discoveries. OICR is proud to announce that Morgan Taschuk will now lead this essential team as Director of Genome Sequence Informatics (GSI).

Here, she reflects on her new role and her outlook on the next few years.

Your behind-the-scenes work is essential to the cancer research we do at OICR. How would you describe your work?

Our team makes sequencing data analysis and management easier for researchers and clinicians in several different ways. We create and maintain the infrastructure and computational tools that researchers need to process, analyze and annotate sequencing data, so they can spend their time working on other challenging research questions.

GSI also offers expert bioinformatics support services directly to researchers to collaborate on challenging research projects. In addition, OICR Genomics is pursuing clinical accreditation this year and so we have a team of clinical genome interpreters who can issue reports on a patient’s unique genome. There’s a lot going on!

How has your work evolved since you’ve been at OICR?

I began at OICR nearly a decade ago, when most of our bioinformatics work was custom for every project, and our sequencing instruments produced a fraction of the data of instruments today. Any kind of automation was quite limited and the amount of analysis we could scale up was limited by the number of people we could hire. Back then, the biggest projects were analyzing human whole genomes for research and participating in international consortia.

In the last nine years, I’ve seen us scale up our sequencing and analysis capabilities, expand a fantastic team of highly educated experts from computer scientists through data analysts to clinical genome interpreters, and reinforce our reputation for excellence in bioinformatics and computational biology. We’re doing everything we were doing a decade ago plus much more. We sequence single cells, cell-free and circulating tumour DNA, analyze immune profiles, participate in international consortia like ICGC-ARGO, contribute to SARS-CoV-2 projects, and will soon produce clinically accredited genome reports, all while still sequencing whole genomes for research.

Today, our team oversees around two petabytes of data and runs about 3,000 workflows and about 1.8 terabytes of analysis per day. Genome Sequence Informatics is a team of about 20 people, including bioinformaticians, software developers and engineers, currently supporting 155 research projects – and their resulting research discoveries – each year.

As the Director of Genome Sequence Informatics, what are you most looking forward to?

This new role allows me to focus on the bigger picture rather than on technical challenges. I see this as an opportunity to unify efforts across teams, departments, and across the institution. For example, if one team that we support is doing a task differently than another, I can help bring them together to work towards a common solution for everyone so we can learn from each other, maintain more consistent quality control, and make the best use of the resources and funds we have. I’m looking forward to more productive interactions with the phenomenal teams at OICR and with other organizations around the country and world.

What are your top priorities over the next couple of years?

My goal over the next few years is to share what we’ve created with the community while growing our network. We’ll continue to create solutions to fulfill the evolving needs of researchers and clinicians. We’ll continue to publish our code so other bioinformaticians can confirm what we’ve done and start their own analysis pipelines. We’ll publish protocols and guidelines that we’ve created for our clinically accredited analysis as well as our core assays. And we’ll share our challenges and solutions with the community so we can build on our collective expertise. In addition, we’ll reach out to other teams and organizations to collaborate and learn from them. The bioinformatics, software development, and clinical genomics communities have vast knowledge that we want to take advantage of, improve and share.

The next couple of years for GSI are going to be about collaborative, open science so the scientific and clinical communities can all benefit from the progress made by Genome Sequence Informatics and OICR as a whole.

February 5, 2020

Unprecedented exploration generates most comprehensive map of cancer genomes charted to date

Pan-Cancer Project discovers causes of previously unexplained cancers, pinpoints cancer-causing events and zeroes in on mechanisms of development 

Toronto – (February 5, 2020) An international team has completed the most comprehensive study of whole cancer genomes to date, significantly improving our fundamental understanding of cancer and signposting new directions for its diagnosis and treatment.

The ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Project (PCAWG), known as the Pan-Cancer Project, a collaboration involving more than 1,300 scientists and clinicians from 37 countries, analyzed more than 2,600 genomes of 38 different tumour types, creating a huge resource of primary cancer genomes. This was then the launch-point for 16 working groups studying multiple aspects of cancer’s development, causation, progression and classification. 

Previous studies focused on the 1 per cent of the genome that codes for proteins, analogous to mapping the coasts of the continents. The Pan-Cancer Project explored in considerably greater detail the remaining 99 per cent of the genome, including key regions that control switching genes on and off — analogous to mapping the interiors of continents versus just their coastlines.

The Pan-Cancer Project has made available a comprehensive resource for cancer genomics research, including the raw genome sequencing data, software for cancer genome analysis, and multiple interactive websites exploring various aspects of the Pan-Cancer Project data.

The Pan-Cancer Project extended and advanced methods for analyzing cancer genomes, including cloud computing. By applying these methods to its large dataset, it discovered new knowledge about cancer biology and confirmed important findings of previous studies. In 23 papers published today in Nature and its affiliated journals, the Pan-Cancer Project reports that:

  • The cancer genome is finite and knowable, but enormously complicated. By combining sequencing of the whole cancer genome with a suite of analysis tools, we can characterize every genetic change found in a cancer, all the processes that have generated those mutations, and even the order of key events during a cancer’s life history.
  • Researchers are close to cataloguing all of the biological pathways involved in cancer and having a fuller picture of their actions in the genome. At least one causal mutation was found in virtually all of the cancers analyzed and the processes that generate mutations were found to be hugely diverse — from changes in single DNA letters to the reorganization of whole chromosomes. Multiple novel regions of the genome controlling how genes switch on and off were identified as targets of cancer-causing mutations.
  • Through a new method of “carbon dating,” Pan-Cancer researchers discovered that it is possible to identify mutations which occurred years, sometimes even decades, before the tumour appears. This opens, theoretically, a window of opportunity for early cancer detection.
  • Tumour types can be identified accurately according to the patterns of genetic changes seen throughout the genome, potentially aiding the diagnosis of a patient’s cancer where conventional clinical tests could not identify its type. Knowledge of the exact tumour type could also help tailor treatments.

“The incredible work of the Pan-Cancer Project team that was unveiled today is the culmination of a remarkable international collaboration that has enriched our understanding and provided new ways to approach the prevention, diagnosis and treatment of cancer,” said The Honourable Ross Romano, Ontario’s Minister of Colleges and Universities. “I congratulate the entire research group on this ground-breaking achievement in cancer research. Ontarians can be proud of the leading role OICR played in this initiative.”

“The findings we have shared with the world today are the culmination of an unparalleled, decade-long collaboration that explored the entire cancer genome,” says Dr. Lincoln Stein, member of the Project steering committee and Head of Adaptive Oncology at the Ontario Institute for Cancer Research (OICR). “With the knowledge we have gained about the origins and evolution of tumours, we can develop new tools to detect cancer earlier, develop more targeted therapies and treat patients more successfully.”

“The Pan-Cancer Project has generated a much-needed deeper understanding of the biology of cancer and how the elusive and untapped ‘dark matter’ in the human genome drives cancer,” says Dr. Laszlo Radvanyi, OICR’s President and Scientific Director. “These discoveries can lead to a totally new area of targets for cancer therapy. It is gratifying to know that OICR helped to lead the international effort, while also integrating a collaborative network of Ontario researchers to play a leading role in this global project. It is a further indication of the value of our strategic investments into data infrastructure, research and informatics expertise, as well as the value the Ontario government continues to create in supporting OICR. I congratulate Dr. Stein, his team and all Pan-Cancer researchers on this landmark achievement.”



More information

Nature landing page – https://www.nature.com/collections/pcawg/
ICGC – International Cancer Genome Consortium (https://icgc.org/)
TCGA – The Cancer Genome Atlas (https://www.cancer.gov/about-nci/organization/ccg/research/structural-genomics/tcga)
PCAWG – PanCancer Analysis of Whole Genomes (dcc.icgc.org/pcawg)
UCSC – University of California Santa Cruz (pcawg.xenahubs.net)
Expression Atlas (www.ebi.ac.uk/gxa/home)
PCAWG-Scout (pcawgscout.bsc.es)
Chromothripsis Explorer (compbio.med.harvard.edu/chromothripsis)
COSMIC – Catalogue of Somatic Mutations in Cancer (https://cancer.sanger.ac.uk/cosmic)

About the Ontario Institute for Cancer Research

OICR is a collaborative, not-for-profit research institute funded by the Government of Ontario. We conduct and enable high-impact translational cancer research to accelerate the development of discoveries for patients around the world while maximizing the economic benefit of this research for the people of Ontario. For more information visit www.oicr.on.ca.

Media contact

Hal Costie
Ontario Institute for Cancer Research

February 5, 2020

AI algorithm classifies cancer types better than experts

Gurnit Atwal and Wei Jiao

Pan-Cancer Project researchers develop deep learning system that can determine where a cancer originates with better accuracy than human experts

If doctors know where a patient’s cancer started, they can better treat the disease. Unfortunately, this is not always possible, but AI could play a role in solving that.

In a study published today in Nature Communications, a Toronto-based research group developed a deep learning system that can accurately classify cancers and identify where they originated based on patterns in their DNA. The system could potentially help clinicians differentiate difficult-to-classify tumours and help recommend the most appropriate treatment option for their patients.

“We reasoned that there was something within the cancer’s DNA that could help us classify these tumours,” says Dr. Quaid Morris, OICR Senior Investigator and co-lead author of the study¹. “But I didn’t expect our system to work as well as it does – in some cases, far better than pathologists.”

The team

The initiative began with the dataset: 2,600 whole genomes across 38 tumour types from the Pan-Cancer Analysis of Whole Genomes Project, also known as the Pan-Cancer Project or PCAWG.

Dr. Lincoln Stein, Head, Adaptive Oncology at OICR and member of the Pan-Cancer Project Steering Committee, and his team began to work with these data to identify patterns in a cancer’s genetic material that could help classify these tumours. To them, this was a perfect problem for AI.

When we started to collaborate, we realized we had something amazing.
– Wei Jiao

“Deep learning models excel when they’re trained on large amounts of data,” says Wei Jiao, Research Associate in the Stein Lab and co-first author of the study. “We had an incredibly large dataset to work with, the most comprehensive dataset of whole cancer genomes to date, but we also needed the machine learning expertise.”

The Stein Lab posted their progress on bioRxiv, an open-access repository for biology publications that have not yet been peer-reviewed, which in turn sparked the collaboration between Stein’s team and the Morris Lab – a group with deep machine learning expertise.

The system

The development of their deep learning system was not simple. They mined through terabytes of data looking for patterns in the type of mutations, the source of mutations and where mutations occurred in the genome, among other factors.

To their surprise, they found that patterns in driver mutations – the changes in DNA that are thought to ‘drive’ the development of cancer – were not useful in determining where the tumour originated. Instead, they found that patterns in the distribution of mutations and the type of mutation within a patient’s sample could better classify the patient’s disease.

“We knew that we could distinguish between two different types of healthy cells by looking at how the DNA within the cell types is packaged,” says Stein, who is a co-lead author of the study. “We were surprised and gratified that we could do the same using cancer cells.”

“We saw that the tightly-packaged sections – also known as the closed chromatin – would have many more mutations than the loosely wound sections,” says Gurnit Atwal, PhD Candidate in the Morris Lab and co-first author of the study. “It was like the normal cell was casting a shadow on the cancer cell, and we just had to read the shadows.”

To achieve the highest accuracy, the research group developed a deep learning neural network-based system, a type of system that is loosely modeled after the human brain and commonly used to recognize patterns in images, audio and text. Their system achieved an accuracy of 91 per cent – roughly double the accuracy that trained pathologists can achieve using traditional methods when presented with a primary tumour and no clinical information.

Further, they tested their model on an additional 2,000 tumours from patients in the Netherlands who donated their cancer genomic data to the Hartwig Medical Foundation and the system still performed with a remarkably high level of accuracy.

“As more cancer genomes are sequenced, we can gain the ability to classify rarer cancers,” says Atwal. “Where we are now is great, but there is more work to be done.”

The potential

This study presents a deep learning system that could potentially improve how cancers are classified, enhancing the accuracy of current diagnostic tests and the treatment decisions they inform.

For some patients, this system could tell them where their cancer began, giving them valuable information about which course of treatment to choose. The system also could serve as a tool to help doctors identify whether a tumour in a patient who has been treated for cancer in the past is an entirely new tumour or a recurring tumour that has spread.

“A treatment plan for a cancer that originated in the throat may be very different than one for a cancer that originated in the breast, and the treatment for a cancer that has returned is different than for one that has metastasized,” says Atwal. “One day, our tool could help give doctors the power to distinguish these classes of tumours, giving patients valuable information that wouldn’t have been available otherwise.”

The authors of the study suggest that their system could start helping patients soon. They plan to further refine their system for patients with rare cancers before moving towards clinical studies. 

“The potential impact of the system we’ve developed is encouraging,” says Morris. “We look forward to turning this system into a tool that can help clinicians and future cancer patients tackle this disease.”

¹ Morris is also a Canada CIFAR AI Chair, Faculty Member at the Vector Institute, and Professor at the University of Toronto’s Donnelly Centre for Cellular and Biomolecular Research.

February 5, 2020

Whole-genome analysis generates new insights into viruses involved in cancer

Dr. Ivan Borozan

OICR researchers scan more than 2,600 whole cancer genomes for traces of known and potentially unknown cancer-causing viruses, identifying new ways that these pathogens may eventually lead to the disease

It is estimated that viruses cause nearly 10 per cent of all cancers. These cancer-causing viruses – also known as oncoviruses – can make changes to normal cells that may eventually lead to the disease. As researchers better understand how oncoviruses cause cancer, they can develop new therapies and vaccines to prevent them from doing so.

In the most extensive exploration of cancer genomes to date, OICR researchers and collaborators discovered new insights into the mechanisms behind the seven known oncoviruses, and provided strong evidence that there are no other human cancer-causing viruses in existence.

Their study was published today in Nature Genetics, alongside more than 20 related publications from the Pan-Cancer Analysis of Whole Genomes Project, also known as the Pan-Cancer Project or PCAWG. The research group analyzed whole genome data from more than 2,600 patient tumours representing 35 different tumour types.

“The Pan-Cancer Project is one of the largest cancer genome projects to date,” says Dr. Ivan Borozan, Scientific Associate at OICR and leading co-author of the study. “This project allowed us to search for viruses in the most comprehensive collection of cancer genomes using the latest and most advanced techniques. To analyze this extensive dataset, we first had to develop computational tools and analysis pipelines that can efficiently process large-scale sequencing data and – at the same time – extract accurate information about minute amounts of the viral genome present in each individual sample. The results generated using these tools were then integrated to decipher molecular mechanisms that lead to the development of cancer.”

Our research points towards a future where these cancers can be treated more effectively, and potentially prevented in the first place.
– Dr. Ivan Borozan

The group discovered that an individual’s immune system, while trying to protect itself from a certain strain of the well-known human papillomavirus (HPV), may cause damage to normal DNA that leads to the development of bladder, head, neck and cervical cancers.

The study also found that the hepatitis B virus (HBV), which is linked to some liver cancers, causes damage in normal cells by integrating into human DNA close to TERT, a well-understood cancer-driving gene.

Spinoffs of this research initiative have led to important discoveries about the Epstein-Barr Virus (EBV) and how it can promote the development of stomach cancer.

“These findings can help us develop new vaccines or therapies that target these mechanisms,” says Borozan. “Our research points towards a future where these cancers can be treated more effectively, and potentially prevented in the first place.”

As new sequencing research initiatives emerge, the research group’s computational tools and pipelines – which are available for the research community to use – will help further explain the mechanisms behind this complex disease.
