November 16, 2015
International Team Announces Crowdsourcing Competition, Running on Google Cloud Platform, To Understand How Cancers Originate And Evolve
An open challenge that merges the efforts of the world’s largest cancer genome sequencing consortia, the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA), with those of Sage Bionetworks and DREAM.
An international consortium of groups from Canada, the United States and the United Kingdom has come together to create an innovative, cloud-based, public challenge to optimize the discovery of genetically distinct groups of cells within cancers that could respond differently to treatment and have different risks of spreading. The ICGC-TCGA DREAM Somatic Mutation Calling Heterogeneity (SMC-Het) Challenge is the first project in the world to marry crowd-sourced benchmarking and cloud-based execution of DNA sequencing analysis pipelines to improve our understanding of tumour DNA. The Challenge launched on November 16, 2015 and will run until May 2016. Sign up to participate at: https://www.synapse.org/SMCHet.
Cancer remains a leading cause of death throughout the world because of its ability to evade even the best available therapies. Recent advances in DNA sequencing have enabled patients’ tumours to be analyzed in unprecedented detail. This has revealed that tumour cells do not all share the same DNA; rather, some tumour cells have evolved unique genetic characteristics that cause them to respond differently to therapy. This means that effective treatment requires understanding the many different populations of cancer cells present in each patient.
“We know that cancers are made up of many different populations of cells, known as ‘subclones’, and understanding the relationships between these subclones is critical in developing successful long-term treatments.” – David Wedge, Staff Scientist at the Wellcome Trust Sanger Institute.
The Challenge tackles three key questions about the sub-clonality of cancer: how many subclones are within any given tumour, how did these subclones grow and evolve, and which genetic mutations are present in each subclone? Using a method to simulate DNA sequencing data that closely mimics data from real human tumours, which was initially developed as part of a previous DREAM challenge, the team has created a set of 50 tumours with distinctive life histories and evolutions. Contestants will create tools in the cloud using Google Compute Engine that will be run in Galaxy, a widely used open-source platform for performing biomedical research. Contestants will also use Docker images to set up the environment for their tool to run in, allowing the tools to easily be ported to other systems. Further, the use of Docker images and the tools’ compatibility with Galaxy ensure that all submissions are immediately usable after the Challenge, creating a new library of algorithms that researchers can use in future studies and allowing the results of these studies to be compared in an objective way.
In many scientific challenges, participants are given the data set, run the analysis on their own systems, and send the results back for evaluation.
“In that model, we lose reproducibility. By requiring contestants to submit their methods in a portable format, the Challenge will have a truly hidden testing set to improve unbiased evaluation,” said Kyle Ellrott, researcher with the Knight Cancer Institute at Oregon Health & Science University, and assistant professor at the OHSU School of Medicine Computational Biology Program. “This also means that their results will be immediately available to all members of the scientific community for large scale analysis of different data sets.”
To incentivize a high level of participation, all individuals and teams that submit a final model will be invited as consortium co-authors on an overview paper of the Challenge that will be submitted to Nature Biotechnology, the official journal partner of the Challenge. Top performers will receive travel awards and speaking invitations at the 2016 DREAM Conference, the 2016 Sage Congress or a similar event. The overall winning algorithms for each sub-challenge will be run on a subset of the ICGC pan-cancer dataset of 2500 whole-genome sequences (the subset size will depend on the computational characteristics of the winning method).
“Objectively and independently assessing the quality of subclonal reconstruction algorithms is the only way that cancer researchers can make informed decisions about the tools that they use. The Challenge goes beyond informing end users about which tools to use by making available these tools in a trivial to build and run format. Only by collaborating across international borders were we able to bring together the scientific expertise and technical resources needed to make the Challenge happen.” – Amit Deshwar, PhD Candidate with Quaid Morris’s Lab in the Donnelly Centre at the University of Toronto.