AAI-profiler tutorial

 

Aims and scope

Whole-genome shotgun sequencing has propelled the re-evaluation of taxonomic classifications, and single-cell genomics is vastly expanding knowledge of genome diversity. Bacterial systematics in particular is in constant flux. Metadata in sequence databases may use outdated synonyms, and some samples may be misclassified.

AAI-profiler is a fast homology search tool that takes a query proteome (protein sequences in FASTA format) as input and plots the AAI (average amino-acid identity) between the query and each species in the UniProt database. Comparing sequence data directly is a quicker way to get an overview of the taxonomy than searching the literature for taxonomic definitions. The homology search shows neighbouring bacterial genera but does not resolve the basal lineages of the tree of life (Figure 1).
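AAI itself is a simple statistic. The following is a minimal sketch, assuming that the AAI to a target species is the mean percent identity of the best hit in that species for each query protein that has one. AAI-profiler's exact definition may differ (for example in coverage or score filters). The input tuples and the function name aai_per_species are hypothetical.

```python
from collections import defaultdict


def aai_per_species(hits):
    """Compute a simple AAI value per target species (illustrative sketch).

    hits: iterable of (query_id, species, percent_identity) tuples, e.g.
    parsed from a homology-search result table (hypothetical format).
    Only the best hit per (query protein, species) pair is kept; the
    identities are then averaged over the query proteins matched in
    that species.
    """
    best = {}  # (query_id, species) -> highest identity seen so far
    for query_id, species, identity in hits:
        key = (query_id, species)
        if identity > best.get(key, -1.0):
            best[key] = identity

    totals = defaultdict(float)
    counts = defaultdict(int)
    for (_query_id, species), identity in best.items():
        totals[species] += identity
        counts[species] += 1

    # Return (AAI, number of matched query proteins) for each species.
    return {sp: (totals[sp] / counts[sp], counts[sp]) for sp in totals}


hits = [
    ("q1", "Species A", 95.0),
    ("q1", "Species A", 80.0),  # weaker second hit for q1, ignored
    ("q2", "Species A", 85.0),
    ("q1", "Species B", 70.0),
]
print(aai_per_species(hits))
# {'Species A': (90.0, 2), 'Species B': (70.0, 1)}
```

Reporting the number of matched proteins alongside the mean is a design choice in this sketch. An average over a handful of proteins says less about relatedness than an average over most of the proteome.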

AAI-profiler compares amino acid sequences rather than nucleotide sequences, which makes it practical for eukaryotic query proteomes as well. Although eukaryotic genomes are hundreds to thousands of times longer than bacterial genomes, a eukaryotic proteome is typically only about ten times larger than a bacterial one. AAI-profiler is powered by SANS. Processing a bacterial proteome takes a few minutes, and processing a eukaryotic proteome takes less than an hour.

A main use of AAI-profiler is as a quality control tool when selecting data sets for phylogenomic or phylogenetic analysis. AAI-profiler reports sequence-based distances from the query proteome to other species. One expects taxa to be monophyletic, with smaller distances between species of the same taxon than between species from different taxa. Exceptions that can be detected with AAI-profiler include:

·        misidentified species

·        mislabelled multi-isolate samples

·        contaminated samples

·        corrupted data included in bacterial pan-genomes
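The expectation that distances are smaller within a taxon than between taxa can be turned into a simple consistency check. The sketch below is hypothetical and is not AAI-profiler's code. The function flag_label_conflict, its input dictionaries, the placeholder species names and the AAI values are all illustrative assumptions. It reports the best-scoring species from another genus if that species outscores every species of the query's recorded genus.

```python
def flag_label_conflict(query_genus, aai, genus_of, min_gap=0.0):
    """Return (species, AAI) if a species from another genus outscores every
    species of the query's recorded genus by more than min_gap, else None.

    query_genus: genus recorded in the query's metadata
    aai: dict mapping species -> AAI to the query (percent)
    genus_of: dict mapping species -> genus
    """
    same = [value for sp, value in aai.items() if genus_of[sp] == query_genus]
    other = [(value, sp) for sp, value in aai.items() if genus_of[sp] != query_genus]
    if not other:
        return None
    best_other_aai, best_other_sp = max(other)
    best_same_aai = max(same, default=0.0)  # 0.0 if the recorded genus is absent
    if best_other_aai > best_same_aai + min_gap:
        return best_other_sp, best_other_aai
    return None


# Made-up AAI values for placeholder species, for illustration only.
aai = {"GenusA sp1": 78.0, "GenusA sp2": 77.0, "GenusB sp1": 96.5}
genus_of = {sp: sp.split()[0] for sp in aai}
print(flag_label_conflict("GenusA", aai, genus_of))  # ('GenusB sp1', 96.5)
```

A conflict like this does not by itself say which label is wrong. It only indicates that the query or the database entries deserve a closer look, for example for misidentification, contamination or an outdated synonym.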

Correct metadata is important because many inference methods test the congruence of sequence trees with the species tree (taxonomy) and assume that the species tree is correct. Such applications are outside the scope of AAI-profiler but include:

·        tree reconciliation to identify speciation and gene duplication events

·        the identification of lateral gene transfer

·        LCA (lowest common ancestor) approach for taxonomic profiling in metagenomics
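The LCA approach also shows why correct metadata matters. A sequence whose hits fall in several taxa is assigned to the deepest taxon shared by all of their lineages, so a single misclassified reference can push the assignment to a much higher rank. The following is a minimal sketch, assuming lineages are given as root-to-leaf lists of taxon names. The function name lowest_common_ancestor and the example lineages are illustrative, and real metagenomics tools add hit filtering on top of this core step.

```python
def lowest_common_ancestor(lineages):
    """Return the longest shared prefix of root-to-leaf taxonomic lineages."""
    if not lineages:
        return []
    lca = []
    # zip(*lineages) walks the lineages rank by rank (down to the shortest one).
    for ranks in zip(*lineages):
        # Stop at the first rank where the lineages disagree.
        if all(r == ranks[0] for r in ranks):
            lca.append(ranks[0])
        else:
            break
    return lca


correct = ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Enterobacterales"]
mislabelled = ["Bacteria", "Firmicutes", "Bacilli"]  # hypothetical wrong label

print(lowest_common_ancestor([correct, correct]))               # full lineage
print(lowest_common_ancestor([correct, correct, mislabelled]))  # ['Bacteria']
```

With consistent labels the assignment stays at the order level. One mislabelled reference among the hits reduces it to the domain level.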

AAI-profiler is available as a web server. The scripts can also be downloaded and run locally using remote databases.

 

Figure 2

Figure 1. Microbial tree of life from https://www.nature.com/articles/nature12352

 

Web interface

Inputs

File upload

Upload file from URL

Visualization

Scatterplot

Barcharts

Piecharts

Export results

tab: Sequence neighbors

tax: AAI profiles

uniq: Taxonomic assignments

Interpretation of results

Monophyletic taxa

Polyphyletic taxa

Large distance to known taxa

Misidentified samples

Multi-isolate samples

Pan-proteomes

Methods

Protein database and taxonomy

Sequence similarity search

Taxonomic profiling

Plot generation

Software download

Requirements: Linux OS, Python, Perl, SANSPANZ

Installation

Testing

Running AAI-profiler locally

Related tools

Phylogenomics

ANI

AAI

Phylogenetics

MEGA, PHYLIP, RAxML

MLSA

Metagenomics

Kraken, MEGAN6, MG-RAST

LCA

References

1.      clinical review

2.      SANSparallel

3.      Expanded phylogenetic tree of life from https://www.nature.com/articles/s41564-017-0012-7

4.      Microbial tree of life from https://www.nature.com/articles/nature12352