September 15, 2016

Canadian government makes big investment in big data research

OICR's server room

On September 13 the Government of Canada, through Genome Canada, made a $4 million investment in Canadian big data research to help address real-world challenges such as controlling infectious disease outbreaks, managing food crops and combating cancer.

Of the 16 projects funded across Canada, three are based at OICR. Led by OICR Principal Investigators Drs. Paul Boutros, Vincent Ferretti, Jared Simpson and Lincoln Stein (Stein is also OICR’s Interim Scientific Director and leader of the Institute’s Informatics and Biocomputing Program), the projects are developing ways to make genomics and health data more manageable, securely accessible and easily understood. Together these projects will help facilitate cancer research and support the wider adoption of precision medicine. They also have applications in other fields of genomics research beyond cancer, such as agriculture and energy.

“The visualization of big data is an enormous and growing challenge,” said Boutros. “We are so excited Genome Canada has supported a focused effort to develop new methods and tools that will be useful across all big data projects, in cancer biology and beyond.”

“This funding will allow my lab to develop new software tools that will make genome sequencing technology easier to use, making sequencing accessible to more scientists,” said Simpson.

Each of the OICR projects received $250,000 in funding. The funded projects are:


Enhanced and Automated Visualization of Complex Data
Project leader: Dr. Paul C. Boutros, Ontario Institute for Cancer Research

Modern genomics research generates massive amounts of data. But these data sets are too big and complex to be useful on their own. Researchers must first analyze and interpret biological data to better understand them and turn them into meaningful information. This information can then be used to help solve real-world problems, such as developing new tools or strategies to better diagnose and treat patients, increase crop yields or monitor the environment. Increasingly, the ability of the human end-user to interpret the data is the key factor limiting how quickly researchers can deliver these much-needed solutions.

Dr. Paul C. Boutros of the Ontario Institute for Cancer Research is leading a team developing ways of making “big data” results more easily understood by improving the way they are visualized and interpreted. The team will create interactive visualization tools that will integrate tightly with databases scientists already use routinely. The team will use crowdsourcing to capture the best visualization ideas from a broad community of scientists, graphic designers and citizen-scientists. The project will build on the human brain’s ability to interpret images, to make the conclusions drawn from biological data more readily accessible and accelerate the rate of biological discovery and innovation.


Dockstore: A platform for sharing cloud-agnostic tools with the research community
Project leaders: Drs. Vincent Ferretti, Lincoln Stein, Ontario Institute for Cancer Research

An unintended consequence of the development of genomics has been the proliferation of massive datasets, making analysis increasingly difficult. A further problem is the lack of standardization in how analysis tools are packaged, described and executed across computer environments. Drs. Vincent Ferretti and Lincoln Stein of the Ontario Institute for Cancer Research, in collaboration with Dr. Brian O’Connor of the University of California, Santa Cruz, have developed a web application called the Dockstore, which addresses the challenge of encapsulating and sharing bioinformatics tools so that they can be moved from environment to environment.

Now the researchers are adding key features to the Dockstore to continue to enhance and evolve the platform. They will also integrate bioinformatics tools and workflows from the Global Alliance for Genomics and Health (GA4GH) for redistribution to the larger research community and will work with collaborators to facilitate the registration of their high-quality tools into the Dockstore. Finally, the researchers will work with other projects to enable sharing of tools across genomic repositories. These activities will drive increased usage of the Dockstore, thereby increasing tool sharing among scientists in fields as diverse as agriculture, energy and human health.


Rapid, accessible genome assembly using long read sequencing
Project leader: Dr. Jared Simpson, University of Toronto and Ontario Institute for Cancer Research

DNA sequencing technology has progressed from sequencing single reference genomes at great cost and time, to the current era of inexpensive, high-throughput short read sequencing. The emerging “third generation” of DNA sequencing technology offers the prospect of putting long read genome sequencing in the hands of more researchers and enabling new applications, through portable instruments that will decentralize sequencing technology.

Dr. Jared Simpson of the University of Toronto is developing robust and efficient genome assembly software that is easy to use, to match the capabilities of these emerging sequencing instruments. The software will target biologists and other end users of sequencing who don’t have substantial bioinformatics expertise.

For a full list of funded projects, visit Genome Canada: http://www.genomecanada.ca/sites/genomecanada/files/2015_bcb-backgrounder-en.pdf