Blood samples, biostatistics and a fresh perspective: The makings of a cancer prediction machine

Biostatistics Training Initiative (BTI) alumnus brings on new BTI trainee to study Canada’s largest population health dataset using today’s top technologies

Circulating tumour DNA (ctDNA), DNA released from cancer cells that circulates freely in the blood, has recently garnered much attention not only as an alternative to traditional tissue biopsies, but also as a potential blood-based biomarker for early cancer diagnosis.

The ability to detect the earliest blood-borne traces of cancer rests largely on our ability to determine which molecular markers indicate that a cancer is developing, or which patterns in ctDNA can predict whether a cancer will grow. Dr. David Soave sees this as a mathematical challenge that, if solved, could have a major impact on predicting and diagnosing a wide variety of cancers.

“To find cancer earlier or predict who will develop the disease, we need to carefully compare human samples from those who will develop cancer and samples from those who won’t,” says Soave, an Assistant Professor at Wilfrid Laurier University and an OICR Associate. “This type of challenge requires new statistical models, methods and computational techniques that can decipher large, complex and high-dimensional data.”

Last year, the Canadian Partnership for Tomorrow Project (CPTP) unified the data from several provincial longitudinal health studies into a national cohort of more than 325,000 participants who are voluntarily donating their health data and biological samples to research. Because some CPTP participants will develop disease and others will not, this dataset provides an unprecedented resource for researchers like Soave to discover the earliest traces of cancer, which can appear several months to years before an initial diagnosis.

Soave, for example, has been involved in projects that leveraged longitudinal health data to find early indicators of leukemia up to 10 years before symptoms surface, and to compare environmental and genetic effects on respiratory health.

“Collecting and coordinating these data was an ambitious feat and it’s our responsibility now to turn these data into discoveries,” says Soave. “I believe that the next generation of statisticians will have a large part in realizing the potential of this dataset as it continues to richen.”

University of Waterloo Master of Mathematics student Jordyn Walton is part of the next generation.

“In class, we learn quite a bit about machine learning tools and the mathematical foundations behind these techniques,” Walton says. “Machine learning is an exciting topic and it is well-suited for high-dimensional data challenges, which we often encounter when studying cancer.”

Walton recently joined Soave’s research group through the Biostatistics Training Initiative (BTI), a training program co-created by OICR and the University of Waterloo. Over the next six months, she will develop methods to investigate somatic mutations and methylation changes as early indicators of cancer.

“Working with one of the largest biologic datasets in Canada brings new challenges to the table,” says Walton. “I’m excited to be applying and refining the latest machine learning techniques to help advance liquid biopsies and, ultimately, enable earlier cancer diagnoses.”

Walton is one of five 2019 BTI interns. Her internship is co-supervised by Soave, who is a former BTI Fellow, and Dr. Philip Awadalla, Director of Computational Biology at OICR and National Scientific Director of CPTP.

To learn more about the BTI Program, visit the OICR BTI Program page or the University of Waterloo’s BTI webpage, or read about some of the past studentships.

