POXO is a series of tools that can be used to discover, search for and verify possible regulatory cis-element(s) in set(s) of co-expressed genes.
A typical computational pipeline for discovering the regulatory elements that might be responsible for the co-regulation of a gene set begins by gathering the genes of interest, for example a set of genes co-expressed in a microarray experiment. The second step is to examine the functions of the genes, which allows you to judge whether the gene set you chose is a good one. If the genes are involved in several different functions, it might be a good idea to divide them into functionally coherent subsets, because genes that share a common theme are more often co-regulated than genes with divergent functions. After you have selected your genes, the next step is to retrieve their upstream sequences, or alternatively some other part of the sequence, and to perform the transcription factor binding site analysis. The final step is to interpret the results and to verify the patterns discovered.
Kankainen M, Pehkonen P, Rosenström P, Törönen P, Wong G and Holm L. (2006) POXO: a web-enabled tool series to discover transcription factor binding sites. Nucleic Acids Res, 34(Web Server issue), W534-W540.

Generator is a tool to evaluate and group incoherently annotated genes into subsets according to their gene ontology (GO) terms.
After the set of co-expressed genes has been gathered, the functions of the genes can be examined. The functional examination makes it possible to evaluate the quality of the gene set, because genes that are both co-expressed and functionally coherent are more likely to be co-regulated than genes that are only co-expressed. If the genes appear to be involved in divergent functions, it is sometimes a good idea to analyze them in functional categories. Functional grouping can be an especially important step when the original gene set has been derived using arbitrary cutoffs (e.g. by selecting the genes whose expression is more than two-fold up or down).
Pehkonen P, Wong G and Törönen P. (2005) Theme discovery from gene lists for identification and viewing of multiple functional groups. BMC Bioinformatics, 6.
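The idea of functional grouping can be illustrated with a minimal sketch. It simply collects the genes that share an annotation term and is not the theme discovery method used by Generator; the gene names and term identifiers are placeholders invented for illustration.

```python
from collections import defaultdict


def group_by_go_term(annotations, min_genes=2):
    """Invert gene -> terms into term -> genes, keeping terms shared by several genes."""
    groups = defaultdict(set)
    for gene, terms in annotations.items():
        for term in terms:
            groups[term].add(gene)
    return {term: sorted(genes) for term, genes in groups.items()
            if len(genes) >= min_genes}


# Hypothetical annotations: placeholder gene names and term identifiers.
annotations = {
    "geneA": {"GO:term1", "GO:term2"},
    "geneB": {"GO:term1"},
    "geneC": {"GO:term2", "GO:term3"},
    "geneD": {"GO:term1"},
}
subsets = group_by_go_term(annotations)
for term in sorted(subsets):
    print(term, subsets[term])
```

Terms annotated to a single gene are dropped, so only candidate subsets of functionally related genes remain.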

Sequence retriever is a tool to retrieve the upstream sequences of a gene set. Currently the tool can be used to retrieve sequences for A. gambiae, A. thaliana, C. elegans, D. melanogaster, H. sapiens and M. musculus.
After the genes to be analyzed have been selected, their upstream sequences must be retrieved.
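What retrieving an upstream sequence means can be sketched as follows. The function, its coordinate convention (0-based, half-open, on the forward strand) and the toy chromosome are assumptions made for illustration; they do not describe how Sequence retriever is implemented.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def upstream_sequence(chromosome, gene_start, gene_end, strand, length=1000):
    """Return up to `length` bases upstream of the gene [gene_start, gene_end)."""
    if strand == "+":
        begin = max(0, gene_start - length)
        return chromosome[begin:gene_start]
    if strand == "-":
        # On the minus strand, "upstream" lies after the gene on the forward strand.
        end = min(len(chromosome), gene_end + length)
        return reverse_complement(chromosome[gene_end:end])
    raise ValueError("strand must be '+' or '-'")


# Toy chromosome with a hypothetical gene at positions 6..8.
chromosome = "AAACCCGGGTTTACGT"
forward = upstream_sequence(chromosome, 6, 9, "+", length=4)
reverse = upstream_sequence(chromosome, 6, 9, "-", length=4)
print(forward, reverse)
```

For genes on the minus strand the sequence is reverse-complemented, so that every returned sequence reads towards the gene start.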

POCO is a tool to find over-represented, under-represented and distinctly represented regulatory patterns from either one or two sequence sets.
After investigating the functions of a set of genes and retrieving their upstream sequences, one can analyse the sequences in order to discover potential regulatory elements, i.e. patterns that may be responsible for the regulation of the gene set. The regulatory pattern analysis searches for statistically significant nucleotide patterns, i.e. patterns that occur surprisingly often within the sequences. Compared with other tools, the advantage of POCO is its ability to discover patterns that are over-represented in one sequence set and under-represented in another. The comparison of two gene sets can be useful, for example, when up-regulated and down-regulated gene sets need to be compared.
Kankainen M and Holm L. (2005) POCO: discovery of regulatory patterns from promoters of oppositely expressed gene sets. Nucleic Acids Res, 33(Web Server issue), W427-431.
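The comparison of two sequence sets can be sketched with a simple k-mer count. This is only an illustration of the idea: it ranks k-mers by the difference in the fraction of sequences that contain them, it ignores the reverse strand and it performs no statistical test, so it should not be read as the method POCO uses. The sequences are invented.

```python
from collections import Counter


def sequences_containing(sequences, k):
    """Count, for each k-mer, how many sequences contain it at least once."""
    counts = Counter()
    for seq in sequences:
        kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        counts.update(kmers)
    return counts


def contrast_patterns(set_a, set_b, k=4):
    """Rank k-mers by (fraction of set_a containing it) - (fraction of set_b)."""
    in_a = sequences_containing(set_a, k)
    in_b = sequences_containing(set_b, k)
    scores = {}
    for kmer in set(in_a) | set(in_b):
        scores[kmer] = in_a[kmer] / len(set_a) - in_b[kmer] / len(set_b)
    # Highest score first; ties are broken alphabetically.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Invented toy data standing in for up- and down-regulated promoters.
up = ["TTGACGTA", "CCGACGTT", "AGACGTCA"]
down = ["TTTTAAAA", "CCCCGGGG", "ATATATAT"]
ranked = contrast_patterns(up, down, k=4)
print(ranked[:2])
```

Patterns at the top of the list are enriched in the first set, and patterns at the bottom are enriched in the second set.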

POCO 2nd
POCO 2nd is a tool to discover over-represented, under-represented and distinctly represented potential cis-elements that are located in the vicinity of a pre-defined cis-element, from either one or two sequence sets.
After finding the potential regulatory elements, it can be fruitful to search for other over-represented nucleotide patterns in their vicinity. This makes it possible, for example, to extend the length of the search pattern and to discover patterns that consist of two parts connected by a linker of fixed or variable length (for example, nuclear receptor binding sites, which contain from three to six nucleotides in their middle).
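Searching the vicinity of a known element can be sketched as counting the k-mers that flank each occurrence of that element. The function name, the window size and the example sequences are assumptions made for illustration, and no significance testing is done.

```python
from collections import Counter


def flanking_kmers(sequences, core, k=3, window=6):
    """Count k-mers found within `window` bases on either side of `core`."""
    counts = Counter()
    for seq in sequences:
        start = seq.find(core)
        while start != -1:
            left = seq[max(0, start - window):start]
            right = seq[start + len(core):start + len(core) + window]
            for flank in (left, right):
                for i in range(len(flank) - k + 1):
                    counts[flank[i:i + k]] += 1
            start = seq.find(core, start + 1)
    return counts


# Invented sequences that share the core element AGGTCA.
sequences = ["TTTGCAAGGTCATTT", "CCCGCAAGGTCACCC"]
neighbours = flanking_kmers(sequences, "AGGTCA", k=3, window=3)
print(neighbours.most_common(1))
```

Here the k-mer GCA precedes the core element in both sequences, which would suggest extending the search pattern to GCAAGGTCA.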

Pattern clustering is a tool to cluster a set of DNA patterns into a smaller and more representative set of DNA patterns.
Most pattern enumeration tools, such as POCO, report overlapping patterns. Therefore, after discovering a set of statistically significant nucleotide patterns, it is useful to cluster the overlapping patterns into representative patterns.
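A minimal sketch of combining overlapping patterns is given below. It greedily merges patterns whose ends overlap exactly, which is a deliberately simplified stand-in for clustering: it does not handle mismatches or reverse complements, and it is not the algorithm used by the Pattern clustering tool.

```python
def overlap(left, right, min_overlap):
    """Length of the longest suffix of `left` that is a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_overlapping(patterns, min_overlap=3):
    """Greedily merge patterns whose ends overlap until no merge is possible."""
    merged = list(dict.fromkeys(patterns))  # drop duplicates, keep order
    changed = True
    while changed:
        changed = False
        for i, a in enumerate(merged):
            for j, b in enumerate(merged):
                if i == j:
                    continue
                if b in a:  # b is fully contained in a
                    merged.pop(j)
                    changed = True
                    break
                size = overlap(a, b, min_overlap)
                if size:
                    merged[i] = a + b[size:]
                    merged.pop(j)
                    changed = True
                    break
            if changed:
                break  # restart the scan on the modified list
    return merged


# Invented example: three shifted 5-mers and one unrelated pattern.
patterns = ["ACGTG", "CGTGA", "GTGAC", "TTTAA"]
representatives = merge_overlapping(patterns)
print(representatives)
```

The three shifted 5-mers collapse into one longer representative pattern, while the unrelated pattern is left as it is.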

POBO is a tool to summarize, verify and screen predetermined cis-elements from a set of sequences. POBO reports the results in a format that is as understandable as possible for biologists.
After finding the regulatory elements, it can be helpful to summarize the statistics in as readable a format as possible or to perform more intensive analyses. POBO enables rapid evaluation of pre-defined regulatory elements and can therefore be used to verify hypotheses.
Kankainen M and Holm L. (2004) POBO, transcription factor binding site verification with bootstrapping. Nucleic Acids Res, 32(Web Server issue), W222-W229.
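The bootstrapping idea named in the reference above can be sketched as follows: sequences are repeatedly sampled with replacement from a gene cluster and from a background set, and the pattern counts of the two sets of samples are compared. The function names, sample sizes and sequences are assumptions for illustration, and the sketch omits the statistical summary a real analysis would need.

```python
import random


def count_occurrences(sequences, pattern):
    """Total number of (possibly overlapping) matches of `pattern`."""
    total = 0
    for seq in sequences:
        start = seq.find(pattern)
        while start != -1:
            total += 1
            start = seq.find(pattern, start + 1)
    return total


def bootstrap_counts(sequences, pattern, sample_size, rounds=1000, seed=0):
    """Pattern counts in `rounds` samples drawn with replacement."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    results = []
    for _ in range(rounds):
        sample = rng.choices(sequences, k=sample_size)
        results.append(count_occurrences(sample, pattern))
    return results


# Invented toy data: every cluster sequence carries CACGTG once,
# but only one of the three background sequences does.
cluster = ["AACACGTGAA", "TTCACGTGTT", "GGCACGTGCC"]
background = ["AAAAAAAAAA", "TTCACGTGTT", "ACACACACAC"]
cluster_counts = bootstrap_counts(cluster, "CACGTG", sample_size=5, rounds=200)
background_counts = bootstrap_counts(background, "CACGTG", sample_size=5, rounds=200)
print(sum(cluster_counts) / 200, sum(background_counts) / 200)
```

If the two distributions of counts are clearly separated, the pattern is a plausible candidate for the cluster; if they overlap, the pattern is probably not specific to it.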

Visualize is a tool to visualize the locations of regulatory patterns within the sequences.
After finding regulatory patterns, a common approach for their evaluation is to map their locations within the given sequences. The assumption here is that functional patterns should be located at approximately the same positions in different genes, i.e. at a similar distance from the ATG/TSS site.
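Mapping pattern locations can be sketched as reporting each match as an offset from the end of the upstream sequence. The sketch assumes that the last base of each sequence is adjacent to the ATG/TSS site and searches the forward strand only; the gene names and sequences are invented.

```python
def positions_relative_to_end(sequence, pattern):
    """Offsets of match starts relative to the 3' end (-10 = 10 bases upstream)."""
    offsets = []
    start = sequence.find(pattern)
    while start != -1:
        offsets.append(start - len(sequence))
        start = sequence.find(pattern, start + 1)
    return offsets


# Invented upstream sequences, each assumed to end next to the ATG/TSS site.
upstream = {
    "gene1": "TTTTCACGTGTTTTTTTT",
    "gene2": "AAAACACGTGAAAAAAAA",
    "gene3": "CACGTGCCCCCCCCCCCC",
}
locations = {gene: positions_relative_to_end(seq, "CACGTG")
             for gene, seq in upstream.items()}
for gene, offsets in locations.items():
    print(gene, offsets)
```

Matches that pile up at a similar offset in several genes, as in gene1 and gene2 here, are the ones the positional assumption favours.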

Tracker is a tool for evolutionary footprinting. It can be used to visualize potential cis-elements within the analyzed sequences and within their homologous sequences in other organisms.
After finding the regulatory patterns, another approach for their evaluation is to look for their locations in the corresponding homologous sequences. The assumption here is that functional elements are conserved between species; in bioinformatics terms, this means that the elements are located in well-aligned sequence regions.
In Tracker, the information on homologous genes and their upstream sequences is retrieved mostly from Ensembl. The query sequences (the user's input sequences) can be searched using Ensembl IDs or using BLAST searches against a collection of upstream sequences of Ensembl genes. If a satisfactory hit is found, the homologous genes are selected using the information on the Ensembl gene that was found, and the upstream sequences are aligned using LAGAN. (The data for Arabidopsis is retrieved from TAIR and therefore cannot be queried from Ensembl like the other organisms. We will update and re-format the A. thaliana data soon to make it similar to the rest of the data. In this update, we will also include more plant species in our system.)
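The notion of a well-aligned region can be made concrete with a small sketch that measures the fraction of identical columns in a window of a pairwise alignment. The function and the aligned sequences are assumptions for illustration; Tracker itself relies on LAGAN alignments and is not implemented this way.

```python
def window_identity(aligned_a, aligned_b, start, length):
    """Fraction of identical, non-gap columns in an alignment window."""
    columns = list(zip(aligned_a[start:start + length],
                       aligned_b[start:start + length]))
    if not columns:
        return 0.0
    same = sum(1 for x, y in columns if x == y and x != "-")
    return same / len(columns)


# Invented pairwise alignment of two homologous upstream regions ("-" is a gap).
species_a = "ACGTCACGTGTT--AC"
species_b = "ACTTCACGTGTTGGAC"
element = window_identity(species_a, species_b, start=4, length=6)
flank = window_identity(species_a, species_b, start=10, length=6)
print(element, flank)
```

A candidate element lying in a window with high identity, such as CACGTG at columns 4 to 9 here, is consistent with conservation between the species.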

Pattern screener is a tool to associate the patterns found with known elements listed in cis-element collections.
After finding patterns with a potential regulatory role, the patterns can be evaluated by screening the known cis-element collections for similar elements. In Pattern screener, the patterns found can be screened against PLACE, JASPAR or TRANSFAC Public.

Matlign (Matrix alignment) is a tool to align and combine a set of nucleotide matrices and/or patterns into a smaller and more representative set of nucleotide matrices and/or patterns. The tool was originally developed for the analysis of transcription factor binding site matrices/patterns, and was therefore designed to allow only a limited number of gaps in an alignment (this server allows at most one gap). This feature allows alignments that reflect the biology of the binding sites more accurately than the prevailing practices, which use either an unlimited number of gaps or no gaps at all. For example, nuclear receptors bind to an element that consists of two short, repeated DNA sequences (either AGAACA or AGGTCA) separated by a variable-length spacer (from 3 to 6 bp). To reliably combine two nuclear receptor matrices/patterns, one should therefore allow at most one gap (the variable-length spacer) in the alignment.
Kankainen M and Löytynoja A. (2007) MATLIGN: a motif clustering, comparison and matching tool. BMC Bioinformatics, 8, 189.
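The effect of allowing exactly one gap can be sketched with two plain consensus strings that differ only in spacer length. The sketch tries every position for a single gap block and keeps the placement with the most identical columns; it works on strings instead of matrices and uses a naive identity score, so it illustrates the single-gap idea only and is not Matlign's algorithm. The two example sites are invented.

```python
def align_with_one_gap(short, long):
    """Align two patterns end to end, allowing one gap block in the shorter one."""
    if len(short) > len(long):
        short, long = long, short
    gap = len(long) - len(short)
    best_score, best_alignment = -1, None
    for i in range(len(short) + 1):
        gapped = short[:i] + "-" * gap + short[i:]
        # A gap character never equals a nucleotide, so gaps score zero.
        score = sum(1 for x, y in zip(gapped, long) if x == y)
        if score > best_score:
            best_score, best_alignment = score, gapped
    return best_score, best_alignment


# Invented sites: two AGGTCA half-sites with a 3 bp and a 4 bp spacer.
site_a = "AGGTCATGAAGGTCA"
site_b = "AGGTCATGACAGGTCA"
score, alignment = align_with_one_gap(site_a, site_b)
print(score)
print(alignment)
print(site_b)
```

With the gap placed inside the spacer both half-sites line up, whereas an ungapped comparison would misalign one of them.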

Dancer is a stand-alone tool that runs on the Windows operating system. Dancer can be used to reconstruct in situ hybridization pictures from gene expression data. For example, the program allows the expression of a gene to be visualized in an anatomical format.
Kankainen M and Wong G. (2003) DANCER: a program for digital anatomical reconstruction of gene expression data. Nucleic Acids Res, 31, e132.

Some Warnings
All programs have been tested using Internet Explorer (6.0) with no apparent problems. Opera browsers are also compatible. With the Netscape 7.X series some difficulties have been encountered.
Some tools use and create Scalable Vector Graphics (SVG) images. To view these, you must have an SVG graphics plug-in installed in your browser. If you do not have an SVG plug-in, you can download one freely from the Adobe home pages.
Results, if created, are stored on the server for an unspecified length of time and may be removed without notice to the user.

This tool was developed by Matti Kankainen, University of Helsinki
Contact the Webmaster.
© 2006 University of Helsinki