Supplementary Material for manuscript submitted to BioEssays
From Genome to Phenome:
A back to the future of gene expression analysis
Antonio Reverter,
Wes Barris, Sean McWilliam, Greg Harper and Brian Dalrymple
Bioinformatics Group, CSIRO Livestock Industries
306 Carmody Rd., St. Lucia, QLD 4067, Australia
- Abstract.
- Table:
List of tissues, number of SAGE libraries and genes used in the study (MS Word).
- Directory of Extreme Genes.
Directory with extreme (i.e., tissue-specific) genes for each of the 41 tissues
included in the analyses. Each file is an ASCII text file with a varying number
of lines, from 0 to 239. Each file contains 4 fields: 1. Gene ID; 2. Average
expression across all tissues (in log of transcripts per 200,000); 3. Standard
deviation of expression across all tissues; and 4. Average expression in the
tissue in question.
NB: These genes (1,423 in total) were removed from subsequent analyses.
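A minimal Python sketch for reading one of these files, assuming the four fields
listed above are whitespace-separated and appear in that order. The delimiter,
the function name and the record type are assumptions made for illustration;
they are not described in the files themselves.

```python
from typing import NamedTuple


class ExtremeGene(NamedTuple):
    gene_id: str
    mean_all: float      # average log expression across all tissues
    std_all: float       # standard deviation across all tissues
    mean_tissue: float   # average log expression in this tissue


def parse_extreme_genes(lines):
    """Parse the lines of one extreme-gene file into records.

    Assumes whitespace-separated fields; blank lines are skipped.
    """
    records = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if len(fields) != 4:
            raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
        gene_id, mean_all, std_all, mean_tissue = fields
        records.append(
            ExtremeGene(gene_id, float(mean_all), float(std_all), float(mean_tissue))
        )
    return records
```

Passing an open file handle works because the function only iterates over lines.
A file with 0 lines yields an empty list.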
- Table: REML Solutions
Results from the restricted maximum likelihood estimation analysis of Model [1]
performed using the VCE software
(ASCII text file. 104 Lines).
- Table:
List of the 16,348 genes included in the analyses and sorted by differential
expression (DE) from most underexpressed to most overexpressed in cancer. For each
condition (cancer and normal), four values are provided: number of tissues (T),
number of libraries (L), average tags per 200,000 (tp2), and t-statistic as
computed from the BLUP difference between cancer and normal in the gene by
condition random interaction (Microsoft Excel file).
- Figure: Density and Clusters
Empirical density for measures of differential expression and the
posterior probability of each value belonging to each cluster. The density
has no ordinate scale, but the total area under the curve corresponds to
probability 1 and individual densities are drawn proportionally. The lines
for clusters 1, 2 and 3 represent posterior probabilities for three classes
of DE genes. Cluster 1, extreme DE genes; cluster 2, intermediate DE genes;
cluster 3, genes with no differential expression when compared between the
cancerous and the normal state. (PNG file).
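In a mixture model, the posterior probability that a value belongs to a cluster
is that cluster's weighted density divided by the total mixture density. The
Python sketch below illustrates this for three clusters, assuming normal
components. The component form and every parameter value are placeholders
chosen for illustration, not estimates from the study.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_probabilities(x, weights, means, sds):
    """Posterior probability of each cluster given the value x."""
    weighted = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(weighted)  # mixture density at x
    return [v / total for v in weighted]


# Placeholder parameters (hypothetical, not fitted values):
# cluster 1 is the widest (extreme DE), cluster 3 the narrowest (no DE).
weights = [0.05, 0.25, 0.70]
means = [0.0, 0.0, 0.0]
sds = [4.0, 2.0, 1.0]

probs_near_zero = posterior_probabilities(0.0, weights, means, sds)
probs_extreme = posterior_probabilities(10.0, weights, means, sds)
```

With these placeholder values, a measure near zero is assigned mostly to
cluster 3 and a very large measure mostly to cluster 1.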
- Table:
Gene expression correlations among the 41 tissues (Microsoft Excel file).
- Figure:
Heat map of the tissue to tissue correlation matrix. Thick lines separate
cancerous from normal tissues. The spectrum goes from blue (correlation <=
-0.45) to white (-0.05 < correlation <= 0.05) to red (correlation > 0.45).
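The three colour bands named above can be written as a small Python function.
This is only a sketch of the stated thresholds; the function name is
hypothetical, and values falling between the named bands are simply labelled
as intermediate because the shading used there is not specified.

```python
def heatmap_band(r):
    """Return the named colour band for a correlation r.

    Only the three bands stated in the caption are named; anything in
    between is reported as an intermediate shade.
    """
    if r <= -0.45:
        return "blue"
    if -0.05 < r <= 0.05:
        return "white"
    if r > 0.45:
        return "red"
    return "intermediate"
```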
- Modules Figure:
Modules of Co-Expression: Of the top 100 genes, 73 had functional annotation in the
Cancer Module Map
(Segal et al. 2004; Nat. Genet. 36, 1090-1098) in a total of 210 modules,
87 of which contained at least 3 genes. This figure depicts the circular
analysis of the 87 modules with their number of genes indicating the radius of
the circles and the average expression (in terms of up- or down-regulation in
cancer tissue) as the up- or down- distance to the horizontal zero axis.
The number inside each circle indicates the Module Number and the thickness of
the lines connecting the circles indicates the number of genes in common.
- Modules Table:
Modules of Co-Expression: List of the 87 modules from the Cancer Module Map
represented among our differentially expressed (DE) genes. The list includes the
number of DE genes, the rank of the module in terms of up- or down- regulation
due to cancer, and the parent and children modules (Microsoft Excel file).
- Gene to Gene Correlation Matrix:
Correlation matrix for a subset of the top 100 cancer genes. Thick lines indicate
blocks: A for extracellular matrix; B for nucleus and cell progression; C for actin
cytoskeleton; D for fatty acid metabolism; and E for glutamine/glutathione/oxidative
stress (PNG file).
- Visible Human Distance Table:
Table of distances across organs obtained from exploring the relevant anatomic images
from The Visible Human Project.
(Word document).
- Visible Human Distance Figure:
Comparison Visual vs Normal vs Cancer: Location of two-dimensional
coordinates for 12 organs (brain, retina, spinal cord, heart, thyroid, prostate,
liver, lung, stomach, kidney, pancreas and colon).
- Permutation Source Code:
Fortran 90 code to perform a permutation test on the coordinates resulting from
the distances across organs obtained from exploring the relevant anatomic images
from The Visible Human Project.
Note: The code contains the 3-dimensional coordinates for each organ resulting from
multidimensional scaling (ASCII text file. 244 Lines).
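For readers unfamiliar with permutation tests on coordinates, the Python sketch
below shows one generic form. It is not a translation of the Fortran 90 file,
and the statistic that file uses is not described here. The sketch assumes two
sets of 3-dimensional organ coordinates, takes the correlation between their
pairwise distances as the statistic, and shuffles organ labels in one set to
build the null distribution. All function names are hypothetical.

```python
import math
import random


def pairwise_distances(coords):
    """Euclidean distances for every pair i < j, in a fixed order."""
    n = len(coords)
    return [math.dist(coords[i], coords[j])
            for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_test(coords_a, coords_b, n_perm=999, seed=0):
    """Return the observed correlation and a one-sided p-value.

    Organ labels of coords_b are shuffled to build the null distribution.
    """
    rng = random.Random(seed)
    dist_a = pairwise_distances(coords_a)
    observed = pearson(dist_a, pairwise_distances(coords_b))
    shuffled = list(coords_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pearson(dist_a, pairwise_distances(shuffled)) >= observed:
            hits += 1
    # Add one to numerator and denominator so the observed layout counts.
    return observed, (hits + 1) / (n_perm + 1)
```

A small p-value would indicate that the two layouts agree more closely than
expected if organ labels were assigned at random.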
- A Human Through Alien Eyes:
Drawing by Ana Fonollosa Gonzalez of a Human being as seen by an Alien (JPG Picture).
NOTE: The following files are BIG
- Data File:
Original (entire) data file containing 1,782,189 records (rows) and 5 columns as
follows: 1. Gene ID; 2. Tissue; 3. SAGE library; 4. Size of library in transcripts;
and 5. Gene expression in transcripts per 200 thousand
(gzip-compressed ASCII text file, 9.7 MB!).
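A minimal Python sketch for streaming this file, assuming the five columns are
whitespace-separated, contain no embedded spaces, and that the library size is
written as an integer. The back-calculation of a raw tag count also assumes the
usual scaling (count divided by library size, times 200,000). All names are
hypothetical.

```python
import gzip
from typing import Iterable, Iterator, NamedTuple


class SageRecord(NamedTuple):
    gene_id: str
    tissue: str
    library: str
    library_size: int   # size of the library in transcripts
    tp200k: float       # expression in transcripts per 200,000


def parse_records(lines: Iterable[str]) -> Iterator[SageRecord]:
    """Yield one record per non-empty line (whitespace-separated fields assumed)."""
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        gene_id, tissue, library, size, expr = fields
        yield SageRecord(gene_id, tissue, library, int(size), float(expr))


def read_data_file(path: str) -> Iterator[SageRecord]:
    """Stream records from the gzip-compressed file without loading it whole."""
    with gzip.open(path, "rt") as handle:
        yield from parse_records(handle)


def approx_raw_count(record: SageRecord) -> float:
    """Back-calculate the approximate raw tag count from the scaled value."""
    return record.tp200k * record.library_size / 200_000
```

Streaming matters here because the file holds 1,782,189 records; iterating over
`read_data_file(path)` keeps only one record in memory at a time.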
- Mixtures Table:
Results from mixtures of distributions analyses (model-based clustering)
performed using the
EMMIX software to identify DE genes between cancer and normal tissue
(ASCII text file. 55,991 Lines!).