Organism / Association file 
GENERATOR supports eight species, for which the Gene Ontology annotations are downloaded from GO.current.annotations.shtml. The source database of each annotation set is given in brackets. 
Genelist 
GENERATOR takes lists of gene identifiers, symbols or their synonyms, which must correspond to the fields indicated in the annotation files or README files at GO.current.annotations.shtml. The identifiers must be separated by line breaks. Before performing the factorization, GENERATOR removes from the analysis any genes that are not associated with the selected ontology (genes without any GOterms in the current ontology). 
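The input handling described above can be sketched as follows. This is only an illustration: the function names, the toy identifiers and the dictionary layout of the annotations are assumptions, not GENERATOR's actual code or file format.

```python
def parse_gene_list(text):
    """Split the pasted text on line breaks and drop empty lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def filter_annotated(genes, annotations):
    """Separate genes with at least one GOterm in the chosen ontology
    from genes without any.

    `annotations` is a hypothetical mapping: gene identifier -> set of GOterms.
    """
    kept = [g for g in genes if annotations.get(g)]
    removed = [g for g in genes if not annotations.get(g)]
    return kept, removed


# Toy data: geneB has no GOterms in the selected ontology.
pasted = "geneA\ngeneB\n\ngeneC\n"
toy_annotations = {
    "geneA": {"GO:0000001"},
    "geneB": set(),
    "geneC": {"GO:0000002", "GO:0000003"},
}

genes = parse_gene_list(pasted)
kept, removed = filter_annotated(genes, toy_annotations)
print("kept:", kept)
print("removed:", removed)
```

Genes that are missing from the mapping altogether are treated in this sketch the same way as genes with an empty set of GOterms.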
Ontologies 
The data for the GENERATOR clustering and analysis is obtained from the Gene Ontology database. One of its three main branches (biological process, molecular function or cellular component) must be selected for the analysis. 
Max nb of clusters 
GENERATOR creates several partitive clustering results, starting from two groups and ending at the user-selected number of partitions. This field sets the maximum number of partitions. 
NMF iterations 
Indicates how many times the update rules of the Nonnegative Matrix Factorization (NMF) algorithm are repeated. Each additional iteration brings the factorization closer to a local optimum. The default of 100 iterations should be enough in most cases. 
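To illustrate what one iteration does, here is a minimal pure-Python sketch of NMF. It assumes the widely used multiplicative update rules for the least squares objective (Lee and Seung) and a small gene-by-GOterm matrix; this page does not say which update rules or which matrix layout GENERATOR uses, so both are assumptions.

```python
import random

EPS = 1e-9  # keeps the denominators away from zero


def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def squared_error(v, w, h):
    """Least squares error between V and its approximation W * H."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rwh in zip(v, wh) for x, y in zip(rv, rwh))


def nmf(v, k, iterations=100, seed=0):
    """Factorize the nonnegative matrix V (m x n) into W (m x k) and H (k x n)."""
    rng = random.Random(seed)
    m, n = len(v), len(v[0])
    w = [[rng.random() + EPS for _ in range(k)] for _ in range(m)]
    h = [[rng.random() + EPS for _ in range(n)] for _ in range(k)]
    errors = []
    for _ in range(iterations):
        # Update H: H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + EPS) for j in range(n)]
             for i in range(k)]
        # Update W: W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + EPS) for j in range(k)]
             for i in range(m)]
        errors.append(squared_error(v, w, h))
    return w, h, errors


# Toy matrix: rows are genes, columns are GOterms, 1 = gene annotated to term.
V = [[1, 1, 0, 0],
     [1, 1, 0, 0],
     [0, 0, 1, 1],
     [0, 0, 1, 1]]
w, h, errors = nmf(V, k=2, iterations=100, seed=1)
print("error after first iteration:", errors[0])
print("error after last iteration: ", errors[-1])
```

With these update rules the error does not increase from one iteration to the next, which is why extra iterations only move the result closer to a local optimum.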
NMF repeats 
GENERATOR repeats the clustering into K partitions the number of times indicated in this field. Of these runs, the best-converged clustering result (the one with the smallest least squares error) is selected to represent the Kth level. 
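The interplay of the maximum number of clusters and the number of repeats can be sketched like this. The function `select_levels` and the stand-in `toy_run` are hypothetical: `toy_run` is a deliberately simple random partition of a few numbers that only plays the role of one randomly initialised NMF run returning its least squares error.

```python
import random


def select_levels(run_once, max_clusters, repeats, seed=0):
    """For each K = 2..max_clusters, call run_once(k, run_seed) `repeats`
    times and keep the run with the smallest error.

    run_once must return a tuple (error, result).
    """
    rng = random.Random(seed)
    levels = {}
    for k in range(2, max_clusters + 1):
        runs = [run_once(k, rng.randrange(2 ** 32)) for _ in range(repeats)]
        levels[k] = min(runs, key=lambda run: run[0])
    return levels


def toy_run(k, seed, data=(1.0, 1.2, 0.9, 5.0, 5.1, 9.8, 10.0)):
    """Stand-in for one clustering run: random labels plus the sum of
    squared deviations from the group means as the error."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in data]
    error = 0.0
    for c in range(k):
        members = [x for x, label in zip(data, labels) if label == c]
        if members:
            mean = sum(members) / len(members)
            error += sum((x - mean) ** 2 for x in members)
    return error, labels


levels = select_levels(toy_run, max_clusters=4, repeats=20)
for k, (error, labels) in sorted(levels.items()):
    print(k, round(error, 3), labels)
```

More repeats give each level more chances to land in a good local optimum, at the cost of a proportionally longer run time.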
pvalue cutoff (C)  
The pvalue (C) measure indicates the overrepresentation of the GOclass in a cluster when compared to the complete genome (Bonferroni corrected pvalue of Fisher's exact test). By default it is used for sorting classes within the result clusters. However, it depends partly on the preceding clustering, and can therefore rank highly some classes that are not overrepresented in the whole user-given gene list. Such classes are filtered out by using pvalue (O) as a cutoff. On the other hand, pvalue (C) may also raise some classes that do not represent the contents of the cluster very well. These are filtered out by using pvalue (S) as a cutoff. The pvalues (C) and (S) indicate overrepresentation but are not statistically analyzable, because they depend on the clustering. The pvalues are calculated with Fisher's exact test (1) from the hypergeometric probability distribution (2), using the following formulae:
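In their standard textbook form, (1) and (2) can be written as below. The symbols are chosen here for illustration and are not taken from the original page: N is the number of genes in the reference set, M the number of reference genes annotated to the GOclass, n the number of genes in the tested set, and k the number of tested genes annotated to the GOclass. The one-sided (upper tail) form is assumed because the text speaks of overrepresentation.

```latex
% (1) Fisher's exact test, one-sided: probability of observing k or more
%     genes of the GOclass in the tested set
p = \sum_{i=k}^{\min(n, M)} P(X = i) \tag{1}

% (2) hypergeometric probability of observing exactly i genes of the GOclass
P(X = i) = \frac{\binom{M}{i} \binom{N - M}{n - i}}{\binom{N}{n}} \tag{2}
```

Under the usual definition of the Bonferroni correction, the reported value would be min(1, p * T), where T is the number of GOclasses tested; how GENERATOR chooses T is not stated here.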

pvalue cutoff (O) 
The pvalue (O) measure indicates the statistical significance of the GOclass overrepresentation in the whole user-given gene list when compared to the whole genome (Bonferroni corrected pvalue of Fisher's exact test; see the explanation of pvalue (C) above for the formulae). By default it is used in the cluster description for filtering out the classes that are not overrepresented in the whole gene list. pvalue (O) is statistically analyzable, as it does not depend on the clustering. 
pvalue cutoff (S) 
The pvalue (S) measure indicates the overrepresentation of the GOclass in a cluster when compared only to the user-given gene list (Bonferroni corrected pvalue of Fisher's exact test; see the explanation of pvalue (C) above for the formulae). By default it is used in the cluster description for filtering out the classes that are underrepresented and therefore do not represent the cluster contents very well. Optionally, it can also be used for sorting classes within the clusters, which shows the classes that have become enriched as a result of the clustering. The pvalue (S) values are not statistically analyzable, as they depend on the clustering. 
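The three measures differ only in which set is tested and which set serves as the reference. The sketch below makes that explicit using a one-sided Fisher's exact test built from the hypergeometric distribution and a plain Bonferroni correction. All counts, the number of tested classes and the function names are made-up assumptions for illustration, not GENERATOR's code or data.

```python
from math import comb


def hypergeom_pmf(i, N, M, n):
    """Probability of exactly i class members among n genes drawn from
    N reference genes, M of which belong to the class."""
    return comb(M, i) * comb(N - M, n - i) / comb(N, n)


def fisher_overrep_p(k, N, M, n):
    """One-sided Fisher's exact test: probability of k or more class members."""
    return sum(hypergeom_pmf(i, N, M, n) for i in range(k, min(n, M) + 1))


def bonferroni(p, n_tests):
    """Bonferroni correction, capped at 1."""
    return min(1.0, p * n_tests)


# Hypothetical counts for one GOclass.
genome_size, class_in_genome = 6000, 40
list_size, class_in_list = 200, 12
cluster_size, class_in_cluster = 50, 9
n_classes_tested = 300  # assumed number of GOclasses tested

# (C): cluster compared to the complete genome
p_c = bonferroni(fisher_overrep_p(class_in_cluster, genome_size,
                                  class_in_genome, cluster_size), n_classes_tested)
# (O): whole gene list compared to the complete genome
p_o = bonferroni(fisher_overrep_p(class_in_list, genome_size,
                                  class_in_genome, list_size), n_classes_tested)
# (S): cluster compared only to the gene list
p_s = bonferroni(fisher_overrep_p(class_in_cluster, list_size,
                                  class_in_list, cluster_size), n_classes_tested)

print("pvalue (C):", p_c)
print("pvalue (O):", p_o)
print("pvalue (S):", p_s)
```

Only (O) is computed without reference to a cluster, which is why it is the one measure described as statistically analyzable.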
This tool was developed by Matti Kankainen (University of Helsinki) and Petri Pehkonen (University of Kuopio)
© 2006 University of Helsinki and University of Kuopio