Cancer is enormously complex at the molecular level, affecting multiple genes, proteins, pathways and regulatory interconnections. Motivated by this complexity, we propose a systems biology approach that formally integrates the available genetic, transcriptomic, epigenetic and molecular knowledge on cancer biology. After classifying genes as cancer-associated or non-cancer-associated, we compile a set of functional attributes highly relevant to cancer biology, including protein kinases, secreted proteins, transcription factors and tissue specificity. We use the cancer-associated genes to extract "common cancer fingerprints" from these molecular attributes, and we implement a Boolean logic that rationally integrates expression data with functional attributes. This Boolean logic gives rise to a guilt-by-association classifier that generates an inventory of novel cancer-associated genes. Finally, the novel cancer-associated genes are interlaced with the known cancer-related genes in a weighted network circuitry aimed at identifying highly conserved gene interactions that impact cancer outcome. We demonstrate the effectiveness of this approach using colorectal cancer as a prototype and identify several novel candidate genes, classified according to their functional attributes. We argue that this holistic approach faithfully mimics cancer characteristics, efficiently predicts novel candidates and is broadly applicable to other complex diseases.
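
To make the Boolean integration step concrete, the following Python sketch shows one way a guilt-by-association rule of this kind could be expressed. It is an illustration under stated assumptions, not the implementation used in this work: the gene names (`GENE_A` to `GENE_D`), the attribute labels, the `Gene` record and the specific rule (differentially expressed AND sharing at least one attribute with the fingerprint of known cancer genes) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical record combining expression data and functional attributes."""
    name: str
    differentially_expressed: bool   # assumed to come from expression data
    attributes: frozenset            # e.g. {"kinase", "secreted"}
    known_cancer_gene: bool = False  # prior classification


def fingerprint(genes):
    """Collect the attributes carried by known cancer-associated genes.

    Assumption: an attribute belongs to the fingerprint if at least one
    known cancer gene carries it. A real analysis would more plausibly
    use an enrichment test.
    """
    fp = set()
    for g in genes:
        if g.known_cancer_gene:
            fp |= g.attributes
    return fp


def classify(genes):
    """Return names of novel candidates under the assumed Boolean rule:
    NOT known AND differentially expressed AND shares a fingerprint attribute.
    """
    fp = fingerprint(genes)
    return [
        g.name
        for g in genes
        if not g.known_cancer_gene
        and g.differentially_expressed
        and (g.attributes & fp)
    ]


# Toy data with placeholder gene names (not real results).
genes = [
    Gene("GENE_A", True, frozenset({"kinase"}), known_cancer_gene=True),
    Gene("GENE_B", False, frozenset({"kinase"})),              # not differentially expressed
    Gene("GENE_C", True, frozenset({"kinase", "secreted"})),   # satisfies the rule
    Gene("GENE_D", True, frozenset({"transcription_factor"})), # attribute not in fingerprint
]

candidates = classify(genes)
print(candidates)
```

In this toy setting only `GENE_C` satisfies every condition of the assumed rule. The candidates produced by such a rule would then feed into the weighted network step described above.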

The network file used to generate the Always Conserved network (Figure 3) is available here for download and can be opened with Cytoscape.