October 18, 2016

Reactome releases 10,000th annotated human protein, a major milestone that will benefit research community

Open source tools like Wikipedia and Google Maps help us get things done faster in our daily lives. In the same way, researchers rely on a variety of open source tools to help them make discoveries faster. Reactome (www.reactome.org) is one such tool. Researchers use it because it relates human genes, proteins and other biomolecules to the biological pathways and processes in which they participate, helping to facilitate new cancer research breakthroughs. Earlier this month Reactome reached a major milestone when it released its 10,000th annotated human protein to the research community. We spoke to OICR’s Dr. Robin Haw, who is Project Manager and Outreach Coordinator at Reactome, about the history of the project, the importance of this particular milestone and where the project is headed next.

Why was Reactome started? What was the inspiration behind the project?
One of the critical challenges for researchers in the years since the first genomes were sequenced has been to link genome sequence information with the existing knowledge of biological systems represented in the literature. Fifteen years ago, there were several available resources (e.g., RefSeq, LocusLink, SwissProt and Proteome) that focused on providing biological information for genes and proteins. The Gene Ontology group was focused on defining standard nomenclature for biological processes for fruit fly, mouse and brewer’s yeast. At that time there was no logical representation of larger biological processes, pathways, or protein complexes that also included direct associations to the sequence data. With all this in mind, back in early 2001, Lincoln Stein (now Director of Informatics and Bio-computing and Interim Scientific Director at OICR), Ewan Birney and other bioinformaticians, computer scientists, and molecular biologists participated in a workshop on gene annotation at Cold Spring Harbor Laboratory in New York. The outcome of this workshop was the proposal to develop a bioinformatics resource called the Genome KnowledgeBase (GKB), which would later become Reactome.

What is your role on the project?
I’m the project manager and outreach coordinator. I work with the other members of the team at OICR (Marija Orlic-Milacic, Karen Rothfels, Joel Weiser and Solomon Shorser) and with the group members at New York University School of Medicine in the USA and the European Bioinformatics Institute in the U.K.

What is OICR’s role in Reactome?
OICR is the lead institution for Reactome, with Lincoln Stein as the lead Principal Investigator. Since its inception, Reactome has made many collaborative connections with the genomics community and the informatics standards communities worldwide. These connections have been critical to establishing and maintaining collaborations with potential users and contributors to Reactome. We believe that Reactome and pathway-based analysis will be critical in the coming years for interpreting the results of not only cancer genome resequencing, but also other genomic biomedicine initiatives. Having OICR lead the Reactome project has greatly facilitated this focus.

How has Reactome evolved from when it first started to this release, where you’re at 10,000 human proteins?
When Reactome was first conceived, it was to be used by general biologists as an online textbook of biology, or by bioinformaticians to make discoveries about their experimental datasets. Now, Reactome represents one of a very small number of open access curated biological pathway databases. Its authoritative and detailed content has directly and indirectly supported basic and translational research studies, helping to discover patterns in high-throughput data. Furthermore, Reactome has enabled scientists, clinicians, researchers, students and educators to find, organize and utilize biological information to support data visualization, integration and analysis. It is a resource for the whole research community that is helping researchers worldwide answer important questions about cancer. In doing so, they are able to drive cancer research forward.

Why is the number 10,000 significant?
Given that the human genome contains roughly 20,000 protein-coding genes in total, the annotation of the 10,000th protein means that Reactome now covers half of the protein-coding portion of the genome. This makes Reactome the most comprehensive open access pathway knowledge base available to the scientific community.

What are the next steps for Reactome? Is there a timeline to reach 20,000 proteins?
Hitting 20,000 unique annotated proteins is admittedly going to be a big challenge for Reactome. It will take a few more years, hopefully less than a decade! Our curation process relies heavily on transforming the information in the published literature about interactions between biomolecules into knowledge about a cellular process. We don’t yet know if there is enough published material that describes the associations between known proteins and molecular reactions and pathways. Instead, we plan to curate and incorporate into pathways new molecular entities, clinically significant protein sequence variants and isoforms, drug-like molecules, and the complexes these entities form. Another goal will be to fill the gaps in our pathway annotations to create a more connected reaction map. The more gaps we’re able to fill in that map, the more useful Reactome will be for the research community and the more we can do to accelerate the pace of cancer research discovery.