Petri Toronen, Pauli J. Ojala, Pekka Marttinen and Liisa Holm
This page presents the preprint of the manuscript describing the Gene Set Z-score (GSZ-score). We also provide the supplementary texts, supplementary figures and supplementary tables. In addition, some figures that were not included in the original article or in its supplementary material are shown. Matlab functions and scripts are distributed for testing the presented method. An R function that calculates the GSZ-score is also provided; it is intended for R programmers rather than end users.
One of the central tasks in biosciences is the analysis of gene expression data with the aid of functional gene classes (typically Gene Ontology classes, GO classes). Normally such work first generates differential gene expression scores and then computes a class-level score by combining the signal of the class members with a class-level scoring function. We propose a novel class scoring function, the Gene Set Z-score (GSZ-score), for the analysis of gene classes with gene expression data. It can be considered a gene expression weighted hypergeometric Z-score.
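For orientation, the plain (unweighted) hypergeometric Z-score that the GSZ-score builds on can be sketched as follows. This is only an illustration of the underlying idea, not the GSZ-score itself: the expression weighting described in the manuscript is omitted, and the function name, argument names and example numbers are hypothetical.

```python
import math


def hypergeom_z(n_top, n_class_in_top, n_genes, n_class):
    """Z-score for the number of class members among the top-ranked genes.

    n_top          -- number of genes above the chosen threshold
    n_class_in_top -- class members among those top genes
    n_genes        -- total number of genes in the ranked list
    n_class        -- total number of class members in the list
    """
    p = n_class / n_genes
    # Mean and variance of the hypergeometric distribution
    # (sampling n_top genes without replacement).
    mean = n_top * p
    var = n_top * p * (1 - p) * (n_genes - n_top) / (n_genes - 1)
    if var == 0:
        return 0.0  # degenerate case: no variation is possible
    return (n_class_in_top - mean) / math.sqrt(var)


# Hypothetical example: 12 of the top 100 genes belong to a class
# that has 50 members among 1000 genes in total.
z = hypergeom_z(n_top=100, n_class_in_top=12, n_genes=1000, n_class=50)
print(round(z, 2))
```

A positive value indicates that the class is over-represented above the threshold, a negative value that it is under-represented.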
We first compared its performance with other popular class scoring functions. We selected the standard Kolmogorov-Smirnov test (used in the original Gene Set Enrichment Analysis), the modified Kolmogorov-Smirnov test (used in the current Gene Set Enrichment Analysis), the hypergeometric test calculated at every threshold position (similarly to iterative Group Analysis) and a modified t-test calculated between the class members and the class non-members. All other parts of the analysis were kept exactly identical in these comparisons. Our evaluations include:
Our scoring function outperformed other functions in these comparisons.
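As a point of reference for the compared functions, a classical Kolmogorov-Smirnov running-sum statistic over a ranked gene list can be sketched as below. This is a minimal textbook-style version written for illustration; it is not taken from any of the compared packages, and the function name and gene identifiers are hypothetical.

```python
def ks_running_sum(ranked_genes, gene_class):
    """Signed maximum deviation of a KS-style running sum.

    Walks down the ranked list, stepping up for class members and
    down for non-members, so that the sum returns to zero at the end.
    """
    members = set(gene_class)
    n = len(ranked_genes)
    k = sum(1 for g in ranked_genes if g in members)
    if k == 0 or k == n:
        raise ValueError("class must contain some, but not all, ranked genes")
    hit_step = 1.0 / k          # step up for each class member
    miss_step = 1.0 / (n - k)   # step down for each non-member
    running = 0.0
    best = 0.0
    for g in ranked_genes:
        running += hit_step if g in members else -miss_step
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical example: both class members sit at the top of the list.
ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
score = ks_running_sum(ranked, {"g1", "g2"})
print(score)
```

A class concentrated at the top of the list gives a score near +1, and a class concentrated at the bottom gives a score near -1.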
We also compared existing program packages (Gene Set Analysis, Gene Set Enrichment Analysis, Signal Pathway) with our analysis pipeline using two real datasets, again monitoring the biologically relevant classes. The settings were kept identical across the program packages: the number of permutations, the analyzed set of GO classes, and the minimum and maximum allowed size of a GO class.
Our analysis pipeline surpassed the others in this comparison as well.
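The role of the number of permutations can be illustrated with a generic empirical p-value computed by randomizing class membership over the genes. This is a minimal sketch under simplifying assumptions: the class score here is just the mean expression score of the members, not the GSZ-score or the score of any compared package, and all names and data are hypothetical.

```python
import random


def class_mean(scores, members):
    """Mean gene-level score over the given gene indices."""
    return sum(scores[i] for i in members) / len(members)


def permutation_p_value(scores, members, n_perm=999, seed=0):
    """One-sided empirical p-value from random gene classes of equal size."""
    rng = random.Random(seed)
    observed = class_mean(scores, members)
    size = len(members)
    indices = range(len(scores))
    hits = 0
    for _ in range(n_perm):
        # Draw a random class of the same size (gene-label randomization).
        random_members = rng.sample(indices, size)
        if class_mean(scores, random_members) >= observed:
            hits += 1
    # Add one to both counts so the p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical data: 200 genes, class made of the 10 highest-scoring genes.
scores = [float(i) for i in range(200)]
members = list(range(190, 200))
p = permutation_p_value(scores, members)
print(p)
```

The smallest attainable p-value is 1 / (n_perm + 1), which is one reason the number of permutations has to be the same across methods for their results to be comparable.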
Here is the preprint version of the manuscript (tables and figures embedded). Here is the link to the article at the BMC web site.
Here is a PDF of the poster presented at ISMB 2009. It summarizes some of the findings from the manuscript.
Supplementary figures are described at the very end of the manuscript. These are PDF files.
Supplementary tables are also described at the very end of the manuscript. These are Excel files.
This material was not included in the submission. The details may still be informative, as they show an omitted analysis of the effect of various prior variance weights on the stability of the Z-score across different threshold positions and across different class sizes. The analysis shows only row randomizations.
These figures are encapsulated postscript files.
Here is a link to a folder that includes the Matlab code and data.
Here is a link to the *.tar.gz package. This package excludes demo datasets 1 and 2.
The code is mostly documented, and demo data is currently available. Check the README for details. Comments on the code are very welcome.
Here is the central function coded in R. No wrapper function is included, so this is intended for R programmers and not for end users.
Demo data in R is here. There are no explanations yet, but it should be self-explanatory.