Protein sequence analysis tools
ADDA is an automatic algorithm for the domain decomposition and clustering of all protein domain families. The resulting clusters unify remote homologues that span many different functions.
A Heger, L Holm (2003) Exhaustive enumeration of protein domain families. J Mol Biol 328(3): 749-67.
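A minimal sketch of why clustering "unifies remote homologues": if domains are linked whenever an alignment connects them, single-linkage clustering puts two domains in one family even when they are connected only through an intermediate. This is an illustrative union-find sketch, assuming domain identifiers and pairwise links are already given. It is not ADDA's algorithm, which also optimises the domain boundaries themselves. The function name `cluster_domains` is hypothetical.

```python
def cluster_domains(domains, links):
    """Single-linkage clustering of domains via union-find (illustrative only)."""
    parent = {d: d for d in domains}

    def find(x):
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Every alignment link merges the two families it touches.
    for a, b in links:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    clusters = {}
    for d in domains:
        clusters.setdefault(find(d), []).append(d)
    return sorted(sorted(c) for c in clusters.values())


# A1 and C1 never align directly, but end up together through B1.
print(cluster_domains(["A1", "A2", "B1", "C1"], [("A1", "B1"), ("B1", "C1")]))
```

Single linkage is the simplest choice that shows transitivity; it is also prone to chaining unrelated families through spurious links, which is one reason a real system needs more careful criteria.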
The Global Trace Graph (GTG) is a pruned, all-against-all alignment graph of a non-redundant representative set of all known protein sequences. Alignments and scores are calculated using a variation of the Maxflow algorithm for transitive alignment. The GTG-Deep algorithm implements a deeper search for very distant homologues of an input protein, using only sequence information. The new GTG Server uses this algorithm to search protein space thoroughly for homologues of a query protein, to identify its fold(s), to find structural and functional motifs, and, where possible, to align the query to known PDB structures.
(in submission)
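The idea behind transitive alignment can be sketched by chaining residue correspondences: if residue i of A aligns to residue j of B, and j of B aligns to k of C, then i and k are inferred to correspond. The sketch below composes such mappings through several intermediate sequences and keeps pairs supported by at least `min_support` paths. This simple voting scheme is an assumption for illustration and is not the Maxflow variant GTG uses; `compose` and `transitive_alignment` are hypothetical names, and alignments are assumed to be lists of `(i, j)` residue-index pairs.

```python
from collections import Counter


def compose(aln_ab, aln_bc):
    """Infer A->C residue pairs by chaining A->B and B->C alignments."""
    b_to_c = dict(aln_bc)
    return [(i, b_to_c[j]) for i, j in aln_ab if j in b_to_c]


def transitive_alignment(paths, min_support=1):
    """paths: list of (aln_ab, aln_bc) tuples, one per intermediate sequence B.

    Each inferred A->C pair gets one vote per path that supports it; pairs
    with at least `min_support` votes are returned, sorted.
    """
    votes = Counter()
    for aln_ab, aln_bc in paths:
        votes.update(compose(aln_ab, aln_bc))
    return sorted(pair for pair, n in votes.items() if n >= min_support)


path1 = ([(0, 0), (1, 1), (2, 2)], [(0, 5), (1, 6), (2, 7)])
path2 = ([(0, 3), (1, 4)], [(3, 5), (4, 9)])
print(transitive_alignment([path1, path2], min_support=2))
```

Requiring support from several intermediates filters out correspondences that rest on a single, possibly spurious, path. Note that this sketch does not enforce a one-to-one, order-preserving alignment, which a real method must do.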
Many large proteins have evolved by internal duplication, and many internal sequence repeats correspond to functional and structural units. RADAR (Rapid Automatic Detection and Alignment of Repeats in protein sequences) uses an algorithm that segments a query sequence into repeats. It identifies short composition-biased repeats and gapped approximate repeats, as well as complex repeat architectures involving many different repeat types.
A Heger, L Holm (2000) Rapid automatic detection and alignment of repeats in protein sequences. Proteins 41(2): 224-237.
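A toy way to see an internal repeat is to compare a sequence with shifted copies of itself. When the shift equals the repeat length, many positions match. The sketch below computes this ungapped self-match profile and reports the best shift. It is deliberately simpler than RADAR: it cannot handle gaps or approximate repeats with indels, which RADAR's self-alignment approach is designed for. The function names are hypothetical.

```python
def self_match_profile(seq, min_shift=2):
    """Fraction of identical residues between seq and itself shifted by k."""
    n = len(seq)
    profile = {}
    for k in range(min_shift, n // 2 + 1):
        matches = sum(1 for i in range(n - k) if seq[i] == seq[i + k])
        profile[k] = matches / (n - k)
    return profile


def best_repeat_length(seq, min_shift=2):
    """Shift with the highest self-match fraction (a candidate repeat length).

    min_shift defaults to 2 so that homopolymer runs do not trivially win at
    k = 1. The sequence must be long enough to allow at least one shift.
    """
    profile = self_match_profile(seq, min_shift)
    return max(profile, key=profile.get)


print(best_repeat_length("ACDEFACDEFACDEF"))
```

A shift that is a multiple of the true repeat length also scores well, so a real tool must resolve the fundamental period and the repeat boundaries rather than just report a peak.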
RSDB (Representative Sequence DataBase) is a non-redundant sequence database built with a fast and complete lookup algorithm that removes fragments and closely similar sequences. RSDB is used in part by the ADDA Database.
L Holm, C Sander (1998) Removing near-neighbour redundancy from large protein sequence collections. Bioinformatics 14: 423-429.
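The two redundancy criteria named above, fragments and close similarities, can be illustrated with a greedy filter: process sequences from longest to shortest, and drop any sequence that is a substring of one already kept or nearly identical to one of the same length. This quadratic sketch only shows the criteria. It does not reproduce RSDB's fast lookup algorithm, and the 90% identity threshold and the function names are hypothetical.

```python
def ungapped_identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def remove_redundant(seqs, max_identity=0.9):
    """Greedy representative selection (illustrative, O(n^2) comparisons).

    Longer sequences are processed first, so a fragment is always compared
    against the full-length sequences it might be part of.
    """
    kept = []
    for s in sorted(seqs, key=len, reverse=True):
        is_fragment = any(s in k for k in kept)
        is_close = any(
            len(k) == len(s) and ungapped_identity(s, k) >= max_identity
            for k in kept
        )
        if not (is_fragment or is_close):
            kept.append(s)
    return kept


print(remove_redundant(["MKTAYIAKQR", "TAYIAK", "MKTAYIAKQK", "GGGSSS"]))
```

Processing the longest sequences first means the representative kept for each group is the most complete one, which is usually the desirable choice.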
Protein structure analysis tools
The Dali server is a network service for comparing protein structures in 3D. You submit the coordinates of a query protein structure, and Dali compares them against those in the Protein Data Bank (PDB). A multiple alignment of structural neighbours is emailed back to you. In favourable cases, comparing 3D structures may reveal biologically interesting similarities that are not detectable by comparing sequences. The structural neighbours of a protein already in the PDB can be found in the Dali Database. The Dali Domain Dictionary is a numerical taxonomy of all known structures in the PDB and presents the entire Dali classification.
L Holm, C Sander (1993) Protein structure comparison by alignment of distance matrices. J Mol Biol 233(1): 123-38.
S Dietmann, J Park, C Notredame, A Heger, M Lappe, L Holm (2001) A fully automatic evolutionary classification of protein folds: Dali Domain Dictionary version 3. Nucleic Acids Res 29(1): 55-7.
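Comparison by "alignment of distance matrices" scores an alignment by how well intramolecular C-alpha distances in one structure match the corresponding distances in the other. The sketch below scores one given alignment using the elastic similarity score as we understand it from Holm & Sander (1993). Each diagonal term contributes θ. Each off-diagonal pair contributes (θ − |dA − dB| / d*) · exp(−(d*/α)²), where d* is the mean of the two distances, θ = 0.20 and α = 20 Å. Treat these constants as assumptions. Dali also searches over alignments, for example by Monte Carlo optimisation, which is not shown here. `elastic_score` is a hypothetical name.

```python
import math

THETA = 0.20   # assumed elastic similarity threshold
ALPHA = 20.0   # assumed envelope width in Angstrom (down-weights long distances)


def elastic_score(coords_a, coords_b, pairs):
    """Score a structural alignment by comparing intramolecular distances.

    coords_a, coords_b: lists of (x, y, z) C-alpha coordinates.
    pairs: list of (i, j) equivalences, residue i of A aligned to residue j of B.
    """
    score = 0.0
    for i1, j1 in pairs:
        for i2, j2 in pairs:
            if (i1, j1) == (i2, j2):
                score += THETA  # diagonal term
                continue
            da = math.dist(coords_a[i1], coords_a[i2])
            db = math.dist(coords_b[j1], coords_b[j2])
            mean = (da + db) / 2
            # Relative deviation below THETA adds to the score; above it, subtracts.
            score += (THETA - abs(da - db) / mean) * math.exp(-(mean / ALPHA) ** 2)
    return score


a = [(0, 0, 0), (3.8, 0, 0), (7.6, 0, 0)]
b = [(0, 0, 0), (3.8, 0, 0), (3.8, 3.8, 0)]
pairs = [(0, 0), (1, 1), (2, 2)]
print(elastic_score(a, a, pairs), elastic_score(a, b, pairs))
```

Because only internal distances are compared, the score needs no superposition and is unaffected by rotating or translating either structure.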
MaxSprout is a fast database algorithm for generating protein backbone and side-chain coordinates from a C-alpha trace. The backbone is assembled from fragments taken from known structures. Side-chain conformations are optimised in rotamer space using a rough potential energy function to avoid clashes. The MaxSprout server is now hosted at the EBI.
L Holm, C Sander (1991) Database algorithm for generating protein backbone and side-chain co-ordinates from a C alpha trace, application to model building and detection of co-ordinate errors. J Mol Biol 218(1): 183-194.
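The fragment-assembly step relies on finding stretches of known structures whose C-alpha geometry matches the query trace. A minimal way to sketch that lookup is to compare the pairwise C-alpha distances within a short window against the same distances in each library fragment, then pick the closest match by RMS difference. The library, window length and function names below are hypothetical. MaxSprout's actual search, the superposition of the chosen backbone atoms and the side-chain optimisation are not reproduced.

```python
import math
from itertools import combinations


def distance_signature(ca):
    """All pairwise C-alpha distances within a window, in a fixed order."""
    return [math.dist(p, q) for p, q in combinations(ca, 2)]


def best_fragment(window, library):
    """Return the library fragment whose distance pattern best fits the window.

    library: dict mapping a fragment name to C-alpha coordinates of the same
    length as `window`. Distances are invariant to rotation and translation,
    so no superposition is needed to rank candidates.
    """
    sig = distance_signature(window)

    def rms(name):
        ref = distance_signature(library[name])
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(sig, ref)) / len(sig))

    return min(library, key=rms)


library = {
    "straight": [(0, 0, 0), (3.8, 0, 0), (7.6, 0, 0), (11.4, 0, 0)],
    "bent": [(0, 0, 0), (3.8, 0, 0), (3.8, 3.8, 0), (0, 3.8, 0)],
}
query = [(x + 1, y + 2, z + 3) for x, y, z in library["bent"]]
print(best_fragment(query, library))
```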
PUU Database
A database of structural domains in 396 representative structures from the PDB_select.aug_1994 set (over 30% sequence identity). The database was produced automatically by the program PUU.
L Holm, C Sander (1994) Parser for protein folding units. Proteins 19(3): 256-268.
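Automatic domain parsing rests on the observation that structural domains have many internal contacts and few contacts with each other. The sketch below shows that principle in its simplest form. It builds a C-alpha contact map and chooses the single chain cut that minimises inter-segment contacts, normalised by the size of the smaller segment. This is a simplified stand-in, not PUU's criterion. PUU uses a harmonic model of the contact matrix and recursive splitting, and can also produce domains made of several chain segments. The 8 Å cutoff, `min_size`, and the function names are hypothetical.

```python
import math


def contact_map(ca, cutoff=8.0):
    """Boolean matrix: True where two distinct residues are within cutoff."""
    n = len(ca)
    return [[i != j and math.dist(ca[i], ca[j]) < cutoff for j in range(n)]
            for i in range(n)]


def best_split(ca, cutoff=8.0, min_size=3):
    """Cut index k splitting residues [0, k) | [k, n) with the fewest contacts.

    Inter-segment contacts are divided by the smaller segment size so that
    trivially small domains are not favoured.
    """
    cmap = contact_map(ca, cutoff)
    n = len(ca)
    best_k, best_score = None, math.inf
    for k in range(min_size, n - min_size + 1):
        inter = sum(cmap[i][j] for i in range(k) for j in range(k, n))
        score = inter / min(k, n - k)
        if score < best_score:
            best_k, best_score = k, score
    return best_k


unit = [(0, 0, 0), (3.8, 0, 0), (0, 3.8, 0), (3.8, 3.8, 0)]
two_domains = unit + [(x + 100, y, z) for x, y, z in unit]
print(best_split(two_domains))
```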
SolvX is a program that evaluates the atomic solvation preference of full-atom 3D protein models. Solvation preference is a measure of solvent accessibility for each residue within a protein; well-packed structures should have an overall solvation preference value less than zero. This program is particularly useful when evaluating the quality of a theoretical 3D model of a protein compared with experimentally resolved structures.
L Holm, C Sander (1992) Evaluation of protein models by atomic solvation preference. J Mol Biol 225(1): 93-105.
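The intuition behind a solvation-preference check is that, in a correctly folded protein, atoms that prefer a non-polar environment are buried, while polar atoms tend to be exposed. A misfolded model violates this pattern. The toy sketch below measures burial as the number of neighbouring atoms within a radius and weights it by atom type, so that a well-packed arrangement gives a negative mean score. The weights, the radius and the function names are hypothetical illustration values. They are not SolvX's parameters or its scoring function.

```python
import math

# Hypothetical per-element weights (NOT SolvX parameters): negative values
# reward burial of that atom type, positive values penalise it.
WEIGHTS = {"C": -1.0, "N": 0.5, "O": 0.5}


def burial(coords, i, radius=6.0):
    """Number of other atoms within `radius` of atom i (a crude burial measure)."""
    return sum(1 for j, p in enumerate(coords)
               if j != i and math.dist(coords[i], p) < radius)


def solvation_score(atoms, radius=6.0):
    """atoms: list of (element, (x, y, z)). Returns mean weighted burial per atom."""
    coords = [xyz for _, xyz in atoms]
    total = sum(WEIGHTS[el] * burial(coords, i, radius)
                for i, (el, _) in enumerate(atoms))
    return total / len(atoms)


packed = [("C", (0, 0, 0)), ("C", (1.5, 0, 0)), ("C", (0, 1.5, 0)), ("O", (20, 0, 0))]
inside_out = [("O", (0, 0, 0)), ("O", (1.5, 0, 0)), ("O", (0, 1.5, 0)), ("C", (20, 0, 0))]
print(solvation_score(packed), solvation_score(inside_out))
```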
Gene expression analysis tools
POXO is a series of tools for discovering, searching for and verifying possible regulatory cis-element(s) in set(s) of co-expressed genes.
A typical computational pipeline for discovering regulatory elements that might be responsible for the co-regulation of a gene set begins by gathering the genes of interest, for example a set of genes co-expressed in a microarray experiment. The second step is to find the functions of these genes, which lets you judge whether the gene set you chose is a good or a bad one. If the genes are involved in several different functions, it may be a good idea to divide them into functionally coherent subsets, because genes that share a common theme are more often co-regulated than genes with divergent functions. Once the genes have been selected, the next step is to retrieve their upstream sequences (or, alternatively, some other part of the sequence) and to perform the transcription factor binding site analysis. The final step is to interpret the results and to verify the discovered patterns.
M Kankainen, P Pehkonen, P Rosenström, P Törönen, G Wong, L Holm (2006) POXO: a web-enabled tool series to discover transcription factor binding sites. Nucleic Acids Res 34(Web Server issue): W534-W540.
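The binding-site analysis step of such a pipeline often starts by asking which short words occur in more of the co-regulated upstream sequences than expected from a background set. The sketch below counts, for each k-mer, how many sequences contain it. It then ranks k-mers by the ratio of target to background frequency, with a pseudocount so that words absent from the background do not cause a division by zero. This is a generic over-representation heuristic with no statistical test and is not POXO's method. The function names and parameter defaults are hypothetical.

```python
from collections import Counter


def kmer_counts(seqs, k):
    """Number of sequences in which each k-mer occurs at least once."""
    counts = Counter()
    for s in seqs:
        counts.update({s[i:i + k] for i in range(len(s) - k + 1)})
    return counts


def enriched_kmers(target, background, k=6, pseudo=1.0, top=5):
    """Rank target k-mers by smoothed target/background occurrence ratio."""
    t = kmer_counts(target, k)
    b = kmer_counts(background, k)

    def ratio(w):
        t_freq = (t[w] + pseudo) / (len(target) + pseudo)
        b_freq = (b[w] + pseudo) / (len(background) + pseudo)
        return t_freq / b_freq

    # Sort by descending ratio, breaking ties alphabetically for stable output.
    return sorted(t, key=lambda w: (-ratio(w), w))[:top]


upstream = ["AACACGTGAA", "TTCACGTGTT", "GCCACGTGGC"]
background = ["AAAAAAAAAA", "TTTTTTTTTT", "GCGCGCGCGC"]
print(enriched_kmers(upstream, background, k=6, top=1))
```

Counting each k-mer once per sequence, rather than once per occurrence, keeps a single repetitive promoter from dominating the ranking.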