July 20, 2016

A challenge to the community to improve RNA sequencing data

A new DREAM Challenge, launched on June 30, focuses on the abnormal RNA molecules in cancer cells. The ICGC-TCGA DREAM SMC-RNA Challenge is an international effort designed to improve standard methods for identifying cancer-associated rearrangements in RNA sequencing (RNA-seq) data, providing new tools for cancer researchers. Improved RNA sequencing data will allow researchers to better understand cancer, leading to new and better-personalized approaches to cancer treatment.

The Challenge is open to the entire research community, and anyone interested in participating is encouraged to register at https://synapse.org/SMC_RNA.

July 14, 2016

International team launches community competition to unravel how cancer changes a cell’s RNA

An open challenge that merges the efforts of the International Cancer Genome Consortium, The Cancer Genome Atlas, and the NCI Cloud Pilots with Sage Bionetworks and the open science DREAM Challenge community

March 2, 2016

New ICGC-TCGA DREAM Challenge crowd-sourced competition to help better understand how cancers evolve

On November 16 the ICGC-TCGA DREAM Challenge launched a new crowd-sourced competition to better understand how cancer originates and evolves. It is the first project in the world to bring together crowd-sourced benchmarking and cloud-based execution of DNA sequencing analysis pipelines in an effort to improve the understanding of tumour DNA.

November 18, 2015

The International Cancer Genome Consortium brings more genomic health data to researchers on the Amazon Web Services Cloud

Toronto – (November 18, 2015) The International Cancer Genome Consortium (ICGC) announced today that 1,200 encrypted cancer whole genome sequences are now securely available on the Amazon Web Services (AWS) Cloud for access by cancer researchers worldwide.

The Ontario Institute for Cancer Research (OICR), which houses the ICGC’s Data Coordination Center (DCC), copied ICGC genome data onto the AWS Cloud and is providing authorized researchers with credentials to access and analyze the data using secure mechanisms. The ICGC Data Access Compliance Office has established a framework that protects the confidentiality of research participants while working to ensure that the research will benefit future cancer patients.

The newly launched initiative means one of the world’s largest collections of cancer genome data is now more easily accessible to qualified researchers, which will enhance collaboration and potentially accelerate the development of new treatments for cancer patients.

Cloud solutions have become essential to genomics research because of the vast amount of data produced by researchers and the difficulties inherent in transferring such large datasets between sites. Projects can quickly grow to several petabytes in size, with each petabyte being the equivalent of data on 223,000 DVDs. Very few institutions around the world have the capacity to download such immense datasets for analysis, and this has limited the number of researchers who can access genome projects and the scope of what can be done with the data.
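The DVD comparison above can be checked with a short calculation. This is a rough sketch under assumptions that are not stated in the release: a petabyte is taken as 2^50 bytes and a single-layer DVD as 4.7 × 2^30 bytes. Those assumptions happen to reproduce the quoted figure, but the authors may have computed it differently.

```python
# Assumed capacities (not given in the release): binary units throughout.
PETABYTE_BYTES = 2 ** 50        # one petabyte, taken as 2^50 bytes
DVD_BYTES = 4.7 * 2 ** 30       # one single-layer DVD, taken as 4.7 * 2^30 bytes


def dvds_per_petabyte() -> float:
    """Return how many DVDs of the assumed size hold one petabyte."""
    return PETABYTE_BYTES / DVD_BYTES


# Roughly 223,000 DVDs per petabyte under these assumptions.
print(round(dvds_per_petabyte()))
```

With decimal units on both sides (10^15 bytes and 4.7 × 10^9 bytes) the result would be closer to 213,000, so the figure is best read as an order-of-magnitude illustration.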

With cloud computing, researchers don’t need to download data. They can work with data and run experiments in the cloud, a flexible network of servers on the Internet, and access data in minutes rather than months. Data stored in the cloud has been shown to be as secure as, if not more secure than, data downloaded to local servers and hard drives. The set of 1,200 genomes now available on AWS is the first installment of ICGC data to be posted and is expected to grow severalfold over the next 12 months with the addition of data from more cancer patients.

“This initiative brings together one of the world’s largest cancer genome datasets and one of the world’s leading cloud computing providers to create a powerful new resource for cancer researchers,” said Dr. Lincoln Stein, Director of the Informatics and Biocomputing Program at the Ontario Institute for Cancer Research and Director of the ICGC’s Data Coordination Center. “Now, far more researchers will have access to ICGC data, opening up the possibility of new discoveries and new breakthroughs in cancer research.”

The Pan-Cancer Analysis of Whole Genomes (PCAWG) project of the ICGC and The Cancer Genome Atlas (TCGA) is coordinating analysis of more than 2,800 cancer genomes, and is making extensive use of AWS and the genomes stored on Amazon Simple Storage Service (Amazon S3). Each genome is being characterized through a suite of standardized algorithms, including alignment to the reference genome, uniform quality assessment, and the calling of multiple classes of somatic mutations. Scientists participating in the research projects of PCAWG are addressing a series of fundamental questions about cancer biology and evolution based on these data.

“Making this data available and usable will enable more researchers across the world to ask questions and get answers that were previously out of reach,” said Matt Wood, General Manager of Product Strategy at Amazon Web Services, Inc. “Researchers can now explore these large and diverse datasets in unconstrained ways, without having to manage large amounts of physical infrastructure. Instead, they can focus on driving their state-of-the-art research forward.”

“Cancer research is becoming increasingly data-heavy. Compiling the data, organizing the data, analyzing the data, making the data available to all researchers—these are fundamental to making further progress in cancer genome research, and we are excited at the possibilities of working with innovative cloud-based computing systems to achieve these advances,” said Peter Campbell, Head of Cancer Genetics and Genomics at the Wellcome Trust Sanger Institute, who is helping to lead the PCAWG project.

“In the next year, it is estimated that 14 million people worldwide will learn that they have cancer. In order to accelerate our understanding of this disease and ultimately provide better treatment, it is critical that we develop solutions able to meet the scale of this challenge. Co-localizing ICGC data as well as other cancer genomics data sets like The Cancer Genome Atlas with secure and scalable computation resources represents a major step forward for both researchers and patients. With ICGC data available on AWS, we utilized the Seven Bridges platform to perform variant calling on hundreds of genomes weeks faster than would have been possible using local infrastructure,” said Deniz Kural, CEO of Seven Bridges Genomics and Principal Investigator of one of three NCI-funded Cancer Genomics Cloud pilot projects.

“This effort to provide the ICGC datasets on AWS will lower the barriers currently associated with computing on thousands of genomes. Users will have the ability to quickly analyze datasets within the cloud on highly scalable infrastructure. This is a paradigm shift from the old model of slowly downloading data to a user’s local infrastructure before any meaningful work can commence,” said Brian O’Connor, Managing Director of Cloud Computing at the Ontario Institute for Cancer Research.

“The ICGC Data Access Compliance Office (DACO) has been a forerunner in providing controlled, secure, and efficient access to cancer genomic data to members of the research community. It now welcomes the opportunity to further advance research for the benefit of all cancer patients by enabling controlled cloud access to ICGC genomic data stored on AWS. Throughout the process, DACO will implement a robust governance framework to ensure a high degree of privacy protection to patients’ genetic and health data,” said Yann Joly, Data Access Officer, ICGC DACO, McGill University.

“This exciting collaboration and new use for cloud technology is the future of cancer research. Ontario is proud to be part of this initiative through the Ontario Institute for Cancer Research and we look forward to seeing this relationship help cancer patients around the world,” said Reza Moridi, Ontario’s Minister of Research and Innovation.

There are currently 89 ICGC projects underway at research institutes in Asia, Australia, Europe, North America, and South America. These projects seek to identify the genomic drivers of cancer and will help to lay the foundation for developing treatments tailored to patients’ individual needs. The Consortium leads worldwide efforts to map the genomes of both common and rare cancers and has the goal of identifying cancer-causing mutations in more than 25,000 tumours representing more than 50 types of cancer of clinical and societal importance across the globe.

The ICGC develops policies and quality control criteria to help harmonize the work of member projects located in different jurisdictions. Data produced by ICGC projects are made rapidly and freely available to qualified researchers around the world via the cloud and through the ICGC Data Coordination Center at http://dcc.icgc.org.

For more information and updates about ICGC activities, please visit the website at: www.icgc.org.

November 16, 2015

International Team Announces Crowdsourcing Competition, Running on Google Cloud Platform, To Understand How Cancers Originate And Evolve

An open challenge that merges the efforts of the world’s largest cancer genome sequencing consortia, the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA) with those of Sage Bionetworks and DREAM.

An international consortium of groups from Canada, the United States and the United Kingdom has come together to create an innovative, cloud-based, public challenge to optimize the discovery of genetically distinct groups of cells within cancers that could respond differently to treatment and have different risks of spreading. The ICGC-TCGA DREAM Somatic Mutation Calling Heterogeneity (SMC-Het) Challenge is the first project in the world to marry crowd-sourced benchmarking and cloud-based execution of DNA sequencing analysis pipelines to improve our understanding of tumour DNA. The Challenge launched on November 16, 2015 and will run until May 2016. Sign up to participate at: https://www.synapse.org/SMCHet.

Cancer remains a leading cause of death throughout the world because of its ability to evade even the best available therapies. Recent advances in DNA sequencing have enabled patients’ tumours to be analyzed in unprecedented detail. This has revealed that tumour cells do not all share the same DNA; rather, some tumour cells have evolved unique genetic characteristics that cause them to respond differently to therapy. This means that effective treatment requires understanding the many different populations of cancer cells present in each patient.

“We know that cancers are made up of many different populations of cells, known as ‘subclones’, and understanding the relationships between these subclones is critical in developing successful long term treatments.” – David Wedge, Staff Scientist at the Wellcome Trust Sanger Institute.

The Challenge tackles three key questions about the sub-clonality of cancer: how many subclones are within any given tumour, how did these subclones grow and evolve, and which genetic mutations are present in each subclone? Using a method to simulate DNA sequencing data that closely mimics data from real human tumours, which was initially developed as part of a previous DREAM challenge, the team has created a set of 50 tumours with distinctive life-histories and evolutions. Contestants will create tools in the cloud using Google Compute Engine that will be run in Galaxy, a widely-used open-source platform for performing biomedical research. Contestants will also use Docker images to set up the environment for their tool to run in, allowing the tools to easily be ported to other systems. Further, the use of Docker images and the tools’ compatibility with Galaxy ensure that all submissions are immediately usable after the Challenge, creating a new library of algorithms that researchers can use in future studies and allowing the results of these studies to be compared in an objective way.

In many scientific challenges, participants are given the data set, run the analysis on their own systems, and send the results back for evaluation.

“In that model, we lose reproducibility. By requiring contestants to submit their methods in a portable format, the Challenge will have a truly hidden testing set to improve unbiased evaluation,” said Kyle Ellrott, researcher with the Knight Cancer Institute at Oregon Health & Science University, and assistant professor at the OHSU School of Medicine Computational Biology Program. “This also means that their results will be immediately available to all members of the scientific community for large scale analysis of different data sets.”

To incentivize a high level of participation, all individuals and teams that submit a final model will be invited as consortium co-authors on an overview paper of the Challenge that will be submitted to Nature Biotechnology, as the official journal partner of the Challenge, and top performers will receive travel awards and speaking invitations at the 2016 DREAM Conference, the 2016 Sage Congress or a similar event. The overall winning algorithms for each sub-challenge will be run on a subset of the ICGC pan-cancer dataset of 2500 whole-genome sequences (subset size will depend on computational characteristics of the winning method).

“Objectively and independently assessing the quality of subclonal reconstruction algorithms is the only way that cancer researchers can make informed decisions about the tools that they use. The Challenge goes beyond informing end users about which tools to use by making available these tools in a trivial to build and run format. Only by collaborating across international borders were we able to bring together the scientific expertise and technical resources needed to make the Challenge happen.” – Amit Deshwar, PhD Candidate with Quaid Morris’s Lab in the Donnelly Centre at the University of Toronto

May 18, 2015

Research community comes together to provide new “gold standard” for genomic data analysis

TORONTO, ON (May 18, 2015) – Cancer research leaders at the Ontario Institute for Cancer Research, Oregon Health & Science University, Sage Bionetworks, the distributed DREAM (Dialog for Reverse Engineering Assessment and Methods) community and The University of California Santa Cruz published the first findings of the ICGC-TCGA-DREAM Somatic Mutation Calling (SMC) Challenge (The Challenge: https://www.synapse.org/#!Synapse:syn312572) today in the journal Nature Methods. These results provide an important new benchmark for researchers, helping to define the most accurate methods for identifying somatic mutations in cancer genomes. The results could be the first step in creating a new global standard to determine how well cancer mutations are detected.

The Challenge, which was initiated in November 2013, was an open call to the research community to address the need for accurate methods to identify cancer-associated mutations from whole-genome sequencing data. Although genomic sequencing of tumour genomes is exploding, the mutations identified in a given genome can differ by up to 50 per cent just based on how the data is analyzed.

Research teams were asked to analyze three in silico (computer simulated) tumour samples and publicly share their methods. The 248 separate analyses were contributed by teams around the world and then analyzed and compared by Challenge organizers. When combined, the analyses provide a new ensemble algorithm that outperforms any single algorithm used in genomic data analysis to date.
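As an illustration of how calls from several algorithms can be pooled, the sketch below uses a simple majority vote. This is not the ensemble method reported in the paper, which the release does not describe; the function name `majority_vote`, the tuple layout for a mutation, and the sample data are all hypothetical.

```python
from collections import Counter


def majority_vote(call_sets, min_votes=None):
    """Keep each mutation reported by at least `min_votes` callers.

    `call_sets` is a list of collections of mutations, one per caller.
    By default a strict majority of callers must agree.
    """
    if min_votes is None:
        min_votes = len(call_sets) // 2 + 1
    # Count each mutation at most once per caller.
    votes = Counter(m for calls in call_sets for m in set(calls))
    return {m for m, n in votes.items() if n >= min_votes}


# Hypothetical calls: (chromosome, position, reference base, variant base).
caller_a = {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")}
caller_b = {("chr1", 100, "A", "T"), ("chr3", 300, "C", "G")}
caller_c = {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")}

consensus = majority_vote([caller_a, caller_b, caller_c])
print(sorted(consensus))
```

Here the call on chr3 is dropped because only one of the three hypothetical callers reports it, while the calls on chr1 and chr2 are kept.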

The authors of the paper also report a computational method, BAMSurgeon (developed by co-lead author Adam Ewing, a postdoctoral fellow in the lab of Dr. David Haussler at UC Santa Cruz), capable of producing an accurate simulation of a tumour genome. In contrast to tumour genomes from real tissue samples, the Challenge organizers had complete knowledge of all mutations within the simulated tumour genomes, allowing comprehensive assessment of the mistakes made by all submitted methods, as well as their accuracy in identifying the known mutations.
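Because every mutation in a simulated tumour is known, a submission can be scored by direct comparison with the truth set. The sketch below shows one common way to do that, using precision, recall and their harmonic mean; the release does not specify the Challenge's actual scoring metrics, and the function name `score_calls` and the example positions are hypothetical.

```python
def score_calls(predicted, truth):
    """Compare predicted mutations with the known truth set."""
    predicted, truth = set(predicted), set(truth)
    true_pos = len(predicted & truth)    # real mutations that were found
    false_pos = len(predicted - truth)   # predicted but not actually present
    false_neg = len(truth - predicted)   # present but missed
    precision = true_pos / (true_pos + false_pos) if predicted else 0.0
    recall = true_pos / (true_pos + false_neg) if truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical positions: four spiked-in mutations, four predictions.
truth = {101, 202, 303, 404}
predicted = {101, 202, 303, 909}

scores = score_calls(predicted, truth)
print(scores)
```

In this example three of the four predictions are correct and one real mutation is missed, so precision and recall are both 0.75.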

The submitted methods displayed dramatic differences in accuracy, with many achieving less than 80 per cent accuracy and some methods achieving above 90 per cent. Perhaps more surprisingly, 25 per cent of teams were able to improve their performance by at least 20 per cent just by optimizing the parameters on their existing algorithms. This suggests that differences in how existing approaches are applied are critically important – perhaps more so than the choice of the method itself.

The group also demonstrated that false positives (mutations that were predicted but didn’t actually exist) were not randomly distributed in the genome but instead occurred in very specific locations. Importantly, these errors closely resemble mutation patterns previously believed to represent real biological signals.

“Overall these findings demonstrated that the best way to analyze a human genome is to use a pool of multiple algorithms,” said co-lead author Kathleen Houlahan, a Junior Bioinformatician at the Ontario Institute for Cancer Research working with the Challenge lead, Dr. Paul Boutros. “There is a lot of value to be gained in working together. People around the world are already using the tools we’ve created. These are just the first findings from the Challenge, so there are many more discoveries to share with the research community as we work through the data and analyze the results.”

“Science is now a team sport. As a research community we’re all on the same team against a common opponent,” said Dr. Adam Margolin, Director of Computational Biology at Oregon Health & Science University and co-organizer of the challenge. “The only way we’ll win is to tackle the biggest, most challenging problems as a global community, and rapidly identify and build on the best innovations that arise from anywhere. All of the top innovators participated in this Challenge, and by working together for a year, I believe we’ve advanced our state of knowledge far beyond the sum of our isolated efforts.”

“Paul and the whole team have done something truly exceptional with this Challenge. By leveraging the SMC Challenge to establish a living community benchmark, the Challenge organizers have made it run more like an ‘infinite game’ where the goal is no longer one of winning the Challenge but instead of constantly addressing an ever-changing horizon,” said Dr. Stephen Friend, President of Sage Bionetworks. “And given the complex heterogeneity of cancer genomes and the rapid rate with which next generation sequencing technologies keep changing and evolving, this seems like an ideal approach to accelerate progress for the entire field.”

“We owe it to cancer patients to interpret tumour DNA information as accurately as we can. This study represents yet another great example of harnessing the power of the open, blinded competition to take a huge step forward in fulfilling that vision,” said Josh Stuart, professor of biomolecular engineering at UC Santa Cruz and a main representative of The Cancer Genome Atlas project among the authors. “We still have important work ahead of us, but accurate mutation calls will give a solid foundation to build from.”