Whole-genome shotgun sequencing has propelled the re-evaluation of taxonomic
classifications, and single-cell genomics is vastly expanding knowledge about
genome diversity. Bacterial systematics in particular is in constant flux.
Meta-data in sequence databases may be reported using old synonyms, and some
samples may be misclassified.
AAI-profiler is a fast homology search tool that takes a query proteome
(protein sequences in FASTA format) as input and plots the AAI (Average
Amino-acid Identity) values of species in the UniProt database. Comparing
sequence data directly gives an overview of the taxonomy more quickly than
searching the literature for taxonomic definitions. The homology search shows
neighbouring bacterial genera but does not resolve the basal lineages of the
tree of life (Figure 1).
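
As a rough illustration of what an AAI value summarises, the sketch below averages the percent identity of the best hit of each query protein within each target species. The input layout (tuples of query protein, species, percent identity) and the function name aai_by_species are assumptions made for this example. The exact definition used by AAI-profiler (for instance any coverage or reciprocity criteria) is not stated here, so this should be read as a generic sketch rather than the tool's implementation.

```python
from collections import defaultdict


def aai_by_species(hits):
    """Average best-hit identity per target species.

    hits: iterable of (query_protein, species, percent_identity) tuples.
    Returns {species: (mean_identity, number_of_query_proteins_with_a_hit)}.
    """
    # Keep only the best hit of each query protein in each species.
    best = {}
    for query, species, identity in hits:
        key = (query, species)
        if identity > best.get(key, -1.0):
            best[key] = identity

    totals = defaultdict(float)
    counts = defaultdict(int)
    for (_, species), identity in best.items():
        totals[species] += identity
        counts[species] += 1

    return {sp: (totals[sp] / counts[sp], counts[sp]) for sp in totals}


# Toy, made-up hits purely for illustration.
hits = [
    ("q1", "Species A", 90.0),
    ("q1", "Species A", 70.0),  # weaker second hit in the same species, ignored
    ("q2", "Species A", 80.0),
    ("q1", "Species B", 50.0),
]
profile = aai_by_species(hits)
```

Returning the number of matched proteins next to the mean is a deliberate choice in this sketch: an average based on a handful of proteins is much less informative than one based on most of the proteome.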
AAI-profiler compares amino acid sequences rather than nucleotide sequences,
which makes it practical for eukaryotic query proteomes as well. Although
eukaryotic genomes are hundreds to thousands of times longer than bacterial
genomes, a eukaryotic proteome is typically only ten times larger than a
bacterial proteome. AAI-profiler is powered by SANS, and the processing time
is a few minutes for a bacterial proteome and less than an hour for a
eukaryotic proteome.
A main use of AAI-profiler is as a quality control tool for selecting data
sets for phylogenomic or phylogenetic analysis. AAI-profiler reports
sequence-based distances from the query proteome to other species. One expects
that taxa are monophyletic and that distances within a taxon are smaller than
distances between species from different taxa. Exceptions that can be detected
using AAI-profiler include:
· misidentified species
· mislabelled multi-isolate samples
· contaminated samples
· corrupted data included in bacterial pan-genomes
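
The expectation that a query lies closer to members of its own taxon than to members of other taxa can be turned into a simple check. The sketch below is a hypothetical illustration, not part of AAI-profiler: the function name flag_label_conflicts, the genus-level comparison, and the input layout (species name mapped to genus and AAI value) are all assumptions made for this example.

```python
def flag_label_conflicts(profile, query_genus):
    """List species from other genera that are closer to the query
    than the closest species of the query's labelled genus.

    profile: dict mapping species name -> (genus, aai_to_query)
    Returns a sorted list of species names, or None if the profile
    contains no species from query_genus (nothing to compare against).
    """
    within = [aai for genus, aai in profile.values() if genus == query_genus]
    if not within:
        return None
    best_within = max(within)
    return sorted(
        name
        for name, (genus, aai) in profile.items()
        if genus != query_genus and aai > best_within
    )


# Toy, made-up values purely for illustration.
toy_profile = {
    "Genus1 alpha": ("Genus1", 92.0),
    "Genus1 beta": ("Genus1", 88.0),
    "Genus2 gamma": ("Genus2", 95.0),
    "Genus3 delta": ("Genus3", 60.0),
}
conflicts = flag_label_conflicts(toy_profile, "Genus1")
```

A non-empty result does not say which of the listed exceptions applies. It only marks the sample as one whose label and sequence-based neighbours disagree, so that it can be inspected before being included in a data set.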
Correct meta-data is important because many inference methods test the
congruence of sequence trees with the species tree (taxonomy), assuming that
the species tree is correct. Such applications are outside the scope of
AAI-profiler but include:
· tree reconciliation to identify speciation and gene duplication events
· the identification of lateral gene transfer
· LCA (last common ancestor) approach for taxonomic profiling in metagenomics
AAI-profiler is available as a web server. The scripts can also be downloaded
and run locally using remote databases.
Figure 1. Microbial tree of life from https://www.nature.com/articles/nature12352
1. clinical review
2. SANSparallel
3. Expanded phylogenetic tree of life from https://www.nature.com/articles/s41564-017-0012-7
4. Microbial tree of life from https://www.nature.com/articles/nature12352