Protein sequence analysis tools
ADDA is an automatic algorithm for domain decomposition and clustering of all protein domain families.
The clusters unify remote homologues with many functions.
A Heger, L Holm (2003) Exhaustive enumeration of protein domain families. J Mol Biol 328(3): 749-67.
The Global Trace Graph (GTG) is a pruned, all-against-all alignment graph of a non-redundant representative set
of all known protein sequences. Alignments and scores are calculated using a variation of the
algorithm for transitive alignment. The GTG-Deep algorithm implements a deeper search for very distant homologues
to an input protein, using only sequence information. The new GTG Server (follow link above) uses this algorithm
to provide a thorough search of protein-space to find homologues to your query protein, to identify your protein's
fold(s), to find structural and functional motifs, and to align your query to known PDB structures where possible.
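The basic idea behind transitive alignment can be illustrated by chaining two pairwise alignments through an intermediate sequence. The sketch below is only an illustration of that idea and not the GTG implementation; the function name `compose_alignments` and the dictionary representation of an alignment are assumptions made for this example.

```python
def compose_alignments(a_to_b, b_to_c):
    """Compose residue mappings A->B and B->C into an A->C mapping.

    Each alignment is a dict from a residue index in one sequence to the
    aligned residue index in the other (a hypothetical representation).
    Residues of A whose partner in B is unaligned in C are dropped.
    """
    return {i: b_to_c[j] for i, j in a_to_b.items() if j in b_to_c}


# Toy alignments: query A to intermediate B, and B to target C.
a_to_b = {0: 2, 1: 3, 2: 4, 5: 7}
b_to_c = {3: 10, 4: 11, 7: 15, 8: 16}
a_to_c = compose_alignments(a_to_b, b_to_c)
print(a_to_c)
```

Only residues that are aligned in both steps survive the composition, which is why an intermediate sequence can link two proteins that do not align directly.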
Many large proteins have evolved by internal duplication, and many internal sequence repeats correspond to functional and
structural units. RADAR (Rapid Automatic Detection and Alignment of Repeats in protein sequences) uses an algorithm for
segmenting a query sequence into repeats, identifying short composition-biased as well as gapped approximate repeats, plus
complex repeat architectures involving many different repeat types.
A Heger, L Holm (2000) Rapid automatic detection and alignment of repeats in protein sequences.
Proteins 41(2): 224-237.
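As a rough illustration of how internal repeats show up when a sequence is compared against itself, the sketch below scores every shift of a sequence against the original and reports the shift with the most identical positions. This is a deliberately simple, ungapped stand-in and not the RADAR algorithm, which also handles gapped, approximate and mixed repeat types; the function name `best_repeat_offset` and its parameters are assumptions for this example.

```python
def best_repeat_offset(seq, min_offset=2):
    """Return (offset, matches) for the shift with the most self-matches.

    For each candidate offset d, count positions i where seq[i] equals
    seq[i + d]. A strong peak suggests a repeat unit of length d.
    """
    best = (0, 0)
    for d in range(min_offset, len(seq) // 2 + 1):
        matches = sum(1 for i in range(len(seq) - d) if seq[i] == seq[i + d])
        if matches > best[1]:
            best = (d, matches)
    return best


# Toy sequence built from three exact copies of a six-residue unit.
offset, matches = best_repeat_offset("ACDEFGACDEFGACDEFG")
print(offset, matches)
```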
RSDB (Representative Sequence DataBase) is a non-redundant sequence database, which uses a fast
and complete lookup algorithm for the removal of fragments and close similarities. RSDB is used in
part by the ADDA Database.
L Holm, C Sander (1998) Removing near-neighbour redundancy from large protein sequence collections.
Bioinformatics 14: 423-429.
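The effect of removing fragments and close similarities can be sketched with a naive greedy filter. This is not the lookup algorithm used by RSDB: it compares every sequence with every kept representative, and it uses `difflib.SequenceMatcher` from the Python standard library as a crude similarity measure instead of a proper alignment. The function name `remove_redundancy` and the 0.9 threshold are assumptions for this example.

```python
from difflib import SequenceMatcher


def remove_redundancy(seqs, threshold=0.9):
    """Greedily keep representatives, longest sequences first.

    A sequence is dropped if it is an exact fragment (substring) of a
    kept sequence, or if its SequenceMatcher ratio to a kept sequence
    reaches the threshold.
    """
    kept = []
    for s in sorted(seqs, key=len, reverse=True):
        redundant = any(
            s in k
            or SequenceMatcher(None, s, k, autojunk=False).ratio() >= threshold
            for k in kept
        )
        if not redundant:
            kept.append(s)
    return kept


seqs = [
    "MKTAYIAKQRQISFVKSHFSRQ",  # representative
    "AYIAKQRQ",                # fragment of the first sequence
    "MKTAYIAKQRQISFVKSHFSRA",  # differs from the first by one residue
    "GGGGSSSSPPPPLLLL",        # unrelated sequence
]
print(remove_redundancy(seqs))
```

Processing the longest sequences first ensures that fragments are compared against the full-length sequences they may belong to.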
Protein structure analysis tools
The Dali server is a network service for comparing protein structures in 3D. You submit the coordinates of a query
protein structure and Dali compares them against those in the Protein Data Bank. A multiple alignment of structural
neighbours is emailed back to you. In favourable cases, comparing 3D structures may reveal biologically interesting
similarities that are not detectable by comparing sequences. If you want to know the structural neighbours of a
protein already in the Protein Data Bank (PDB), you can find them in the Dali Database. The Dali Domain Dictionary
is a numerical taxonomy of all known structures in the PDB - you can view the entire Dali classification by
following the link to the Dali Domain Dictionary.
L Holm, C Sander (1993) Protein structure comparison by alignment of distance matrices. J Mol Biol 233(1): 123-38.
S Dietmann, J Park, C Notredame, A Heger, M Lappe, L Holm (2001) A fully automatic evolutionary classification of protein folds: Dali Domain Dictionary version 3. Nucleic Acids Res 29(1): 55-7.
MaxSprout is a fast database algorithm for generating protein backbone and side chain coordinates from a
C-alpha trace. The backbone is assembled from fragments taken from known structures. Side chain conformations
are optimised in rotamer space using a rough potential energy function to avoid clashes.
The MaxSprout Server now resides at the EBI; follow the above link.
L Holm, C Sander (1991) Database algorithm for generating protein backbone and side-chain co-ordinates from a C alpha trace, application to model building and detection of co-ordinate errors. J Mol Biol 218(1): 183-194.
A database of structural domains in 396 representative structures from the PDB_select.aug_1994 set
(over 30% sequence identity). The database was produced automatically by the program PUU.
L Holm, C Sander (1994) Parser for protein folding units. Proteins 19(3): 256-268.
SolvX is a program that evaluates the atomic solvation preference of full-atom 3D protein models.
Solvation preference is a measure of solvent accessibility for each residue within a protein;
well-packed structures should have an overall solvation preference value less than zero. This program
is particularly useful when evaluating the quality of a theoretical 3D model of a protein compared
with experimentally resolved structures.
L Holm, C Sander (1992) Evaluation of protein models by atomic solvation preference. J Mol Biol 225(1): 93-105.
Gene expression analysis tools
POXO is a series of tools for discovering, searching and verifying possible regulatory cis-elements in sets of co-expressed genes.
A typical computational pipeline for discovering the regulatory elements that might be responsible for the co-regulation of a gene set begins
by gathering the genes of interest, for example a set of genes co-expressed in a microarray experiment. The second step is to
find the functions of the genes, which lets you judge whether the gene set you chose was a good one. If the genes are involved in
several different functions, it might be a good idea to sub-group them into functionally coherent subsets, because genes
that share a common theme are more often co-regulated than genes with divergent functions. After you have selected your genes, the next
step is to retrieve their upstream sequences, or alternatively some other part of the sequence, and to perform the transcription factor
binding site analysis. The final step is to interpret the results and to verify the patterns discovered.
M Kankainen, P Pehkonen, P Rosenström, P Törönen, G Wong, L Holm (2006) POXO: a web-enabled tool series to discover transcription factor
binding sites. Nucleic Acids Res 34(Web Server issue): W534-W540.
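The binding site analysis step of such a pipeline can be sketched as a search for short words that occur in more upstream sequences of the co-expressed genes than in a background set. The code below is a minimal illustration of that idea and not the method implemented in POXO; the function names, the pseudocount and the `min_ratio` cut-off are all assumptions for this example.

```python
from collections import Counter


def kmer_support(sequences, k):
    """Count, for each k-mer, how many sequences contain it at least once."""
    support = Counter()
    for seq in sequences:
        support.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return support


def enriched_kmers(foreground, background, k=6, min_ratio=2.0):
    """Return (k-mer, ratio) pairs enriched in the foreground, best first."""
    fg = kmer_support(foreground, k)
    bg = kmer_support(background, k)
    result = []
    for kmer, n in fg.items():
        fg_frac = n / len(foreground)
        # Pseudocount avoids division by zero for k-mers absent from the background.
        bg_frac = (bg[kmer] + 1) / (len(background) + 1)
        ratio = fg_frac / bg_frac
        if ratio >= min_ratio:
            result.append((kmer, ratio))
    return sorted(result, key=lambda item: -item[1])


# Toy upstream sequences sharing one 7-mer, plus an artificial background.
upstream = ["AATGACTCATT", "CCTGACTCAGG", "GTTGACTCAAC"]
background = ["AAAAAAAAAAA", "CCCCCCCCCCC", "GGGGGGGGGGG"]
print(enriched_kmers(upstream, background, k=7))
```

Any pattern found this way is only a candidate and still needs the verification step described above.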