Taxator-tk: precise taxonomic assignment of metagenomes by fast approximation of evolutionary neighborhoods.

Dröge J, Gregor I, McHardy AC - Bioinformatics (2014)

Bottom Line: Taxator-tk was precise in its taxonomic assignment across all ranks and taxa for a range of evolutionary distances and for short as well as for long sequences. In addition to the taxonomic binning of metagenomes, it is well suited for profiling microbial communities from metagenome samples because it identifies bacterial, archaeal and eukaryotic community members without being affected by varying primer binding strengths, as in marker gene amplification, or copy number variations of marker genes across different taxa. Taxator-tk has an efficient, parallelized implementation that allows the assignment of 6 Gb of sequence data per day on a standard multiprocessor system with 10 CPU cores and microbial RefSeq as the genomic reference data.


Affiliation: Department for Algorithmic Bioinformatics, Heinrich Heine University, Universitätsstraße 1, 40225 Düsseldorf, Germany, Max-Planck Research Group for Computational Genomics and Epidemiology, Max-Planck Institute for Informatics, University Campus E1 4, 66123 Saarbrücken, Germany and Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Inhoffenstraße 7, 38124 Braunschweig, Germany.


Fig. 2. Algorithm for taxonomic labeling of query segments (realignment placement algorithm/RPA). The RPA assigns a taxon ID to a query segment q. (a) Species reference tree with query taxon Q and reference taxa A, B, C, D, O and S. This will be approximated by the segment phylogenetic tree for the query segment and homologous segments of reference taxa. (b) Approximate graph representing pairwise distances between the taxa. The subgraph for clade X is highlighted. (c and d) The two alignment passes which add segment taxa to an (empty) set M. Segment s is the segment with the smallest local alignment score (distance) to q in the initial similarity search. (c) First, all segments are aligned to segment s. The resulting distances are ordered and the taxa with equal or smaller distances than distance(s,q) are added to M. The outgroup segment, here o, is the next most similar segment to s after q, with distance(o,s) > distance(s,q). (d) All segments are aligned to o. From the ranked distances, taxa with distances smaller than distance(o,q) are also added to M. Thus, M includes all the nearest evolutionary neighbors for the query segment q (the taxa corresponding to segments a, b, c, d, o and s). The taxon ID then assigned to q is the lowest common ancestor in the reference species tree (reference taxonomy) of these taxa in M. (e) Partially resolved segment subtree at node R that is implied by distances obtained in (c) and (d), where the exact position of some segments (a, b, c and d; dashed branches) is left unresolved by the RPA.
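
The two alignment passes in the caption can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Taxator-tk implementation: the function name rpa_neighbor_set, the callable dist, the toy distance values and the extra segment "far" are all hypothetical, each segment stands in for exactly one taxon, and returning no outgroup when every segment is at least as close to s as q is an assumption about an edge case the caption does not cover.

```python
from typing import Callable, Iterable, Optional, Set, Tuple


def rpa_neighbor_set(
    q: str,
    segments: Iterable[str],
    dist: Callable[[str, str], float],
) -> Tuple[Set[str], Optional[str]]:
    """Build the neighbor set M for query segment q (hypothetical sketch)."""
    segments = list(segments)
    if not segments:
        raise ValueError("need at least one reference segment")

    # s: the reference segment most similar to q in the initial search.
    s = min(segments, key=lambda x: dist(q, x))
    d_sq = dist(s, q)

    # Pass 1: align everything to s; keep segments at least as close to s as q is.
    to_s = {x: dist(s, x) for x in segments}
    M = {x for x, d in to_s.items() if d <= d_sq}

    # Outgroup o: the segment closest to s among those farther from s than q.
    farther = [x for x in segments if to_s[x] > d_sq]
    if not farther:
        return M, None  # assumption: no outgroup available, stop after pass 1
    o = min(farther, key=lambda x: to_s[x])

    # Pass 2: align everything to o; keep segments closer to o than q is.
    d_oq = dist(o, q)
    M |= {x for x in segments if dist(o, x) < d_oq}
    M.add(o)
    return M, o


# Toy symmetric distances (made-up values, not taken from the paper).
_PAIRS = {
    ("q", "s"): 3, ("q", "a"): 4, ("q", "b"): 4, ("q", "o"): 6, ("q", "far"): 10,
    ("s", "a"): 2, ("s", "b"): 2, ("s", "o"): 6, ("s", "far"): 10,
    ("a", "b"): 2, ("a", "o"): 6, ("a", "far"): 10,
    ("b", "o"): 6, ("b", "far"): 10, ("o", "far"): 10,
}
_TOY = {frozenset(k): float(v) for k, v in _PAIRS.items()}


def toy_dist(x: str, y: str) -> float:
    """Look up a toy distance; identical segments have distance 0."""
    return 0.0 if x == y else _TOY[frozenset((x, y))]


if __name__ == "__main__":
    print(rpa_neighbor_set("q", ["a", "b", "s", "o", "far"], toy_dist))
```

With these toy values the first pass collects s, a and b, the outgroup is o, and the distant segment "far" stays outside M. In the method described above, the taxa in M would then be mapped to their lowest common ancestor in the reference taxonomy.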


Mentions: The input to the algorithm is a segment q of the original query sequence from an (unknown) taxon Q and a set of homologous segments with known taxon IDs. The term ‘segment’ refers to a gap-less subsequence of either the query or a reference sequence. Given that for the set of homologs we know the correct underlying species tree of taxa (Fig. 2a), we can see that for our query taxon Q, the closest evolutionary neighbors would be A, B and S. If we simply assign X, the parental taxon of A, B and S, as a taxon identifier, this would be inaccurate, as A, B and S are more closely related to each other than to Q. Instead, the correct taxonomic assignment would be a parent of X and Q, and of at least one additional outgroup taxon (O) in the reference tree, such that Q also becomes a descendant of the identified parent (R in Fig. 2a). If we therefore identify the taxa A, B, S and O in the reference tree, we can determine the taxon ID of R as the lowest common ancestor (LCA) of these taxa and assign it to Q (and q).
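
The LCA step in this paragraph can be illustrated with a small Python sketch. It assumes the reference taxonomy is stored as a child-to-parent dictionary; the names lineage and lowest_common_ancestor, the TOY_PARENT table and the "root" node are hypothetical, and the toy tree contains only the relationships stated above (X is the parent of A, B and S; R is an ancestor of X and O).

```python
from typing import Dict, Iterable, List, Optional


def lineage(taxon: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return the path from a taxon up to the root, starting with the taxon itself."""
    path = [taxon]
    while parent.get(path[-1]) is not None:
        path.append(parent[path[-1]])
    return path


def lowest_common_ancestor(
    taxa: Iterable[str], parent: Dict[str, Optional[str]]
) -> str:
    """Return the deepest node that lies on the lineage of every given taxon."""
    taxa = list(taxa)
    if not taxa:
        raise ValueError("need at least one taxon")
    # Start from the first lineage (ordered leaf to root) and filter it down.
    common = lineage(taxa[0], parent)
    for t in taxa[1:]:
        ancestors = set(lineage(t, parent))
        common = [node for node in common if node in ancestors]
    if not common:
        raise ValueError("taxa do not share a root")
    return common[0]  # first surviving node is the lowest shared ancestor


# Toy reference taxonomy (hypothetical; only the relations named in the text).
TOY_PARENT: Dict[str, Optional[str]] = {
    "A": "X", "B": "X", "S": "X",
    "X": "R", "O": "R",
    "R": "root", "root": None,
}

if __name__ == "__main__":
    print(lowest_common_ancestor(["A", "B", "S"], TOY_PARENT))
    print(lowest_common_ancestor(["A", "B", "S", "O"], TOY_PARENT))
```

On this toy tree, the LCA of A, B and S alone is X, which is the assignment the paragraph calls inaccurate, while adding the outgroup O moves the LCA up to R.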

