The UniFrac significance test is sensitive to tree topology.

Lozupone CA, Knight R - BMC Bioinformatics (2015)

Bottom Line: Long et al. find that weighted UniFrac results differ between input trees where (a) replicate sequences each have their own tip, or (b) all replicates are assigned to one tip with an associated count. We argue that these are two distinct cases that differ in the probability distribution on which the statistical test is based, because of the differences in tree topology. Further study is needed to understand which randomization procedure best detects different aspects of community dissimilarities.

Affiliation: Department of Medicine, University of Colorado Denver, Anschutz Medical Campus, Aurora, CO, 80045, USA. Catherine.Lozupone@ucdenver.edu.

ABSTRACT
Long et al. (BMC Bioinformatics 2014, 15(1):278) describe a "discrepancy" in using UniFrac to assess statistical significance of community differences. Specifically, they find that weighted UniFrac results differ between input trees where (a) replicate sequences each have their own tip, or (b) all replicates are assigned to one tip with an associated count. We argue that these are two distinct cases that differ in the probability distribution on which the statistical test is based, because of the differences in tree topology. Further study is needed to understand which randomization procedure best detects different aspects of community dissimilarities.


Fig. 1: Simple trees illustrating the two tree formats. Panel a shows a tree in which replicate tips, each with a count of 1, are added when a sequence is found multiple times. Panel b shows a tree representing the same data, but with replicate sequences represented by a single tip (e.g. as would occur if one picked OTUs and built the tree using a representative sequence for each OTU) that carries a count reflecting its abundance in each sample.

UniFrac significance tests can be used to determine whether the types of sequences (e.g. representing bacterial 16S ribosomal RNA genes) in two different biological samples differ significantly between the samples. To do so, the sample assignments on an input phylogenetic tree are randomly re-assigned (i.e. the relationship between each tip on the tree and the sample labels is randomized), a distance between the two samples is calculated for each random dataset using either the unweighted or weighted UniFrac metric, and the fraction of the time that the true dataset has a smaller UniFrac distance between samples than the random datasets is assessed to produce a p-value [1]. In a recent paper [2], Long et al. show that the results of weighted UniFrac significance tests differ when applied to input trees in two different formats (Fig. 1). In the first, replicate tips, each with a count of 1, are added when a sequence is found multiple times (for example, a sequence with a count of 4 is added to the tree as 4 individual tips, each with a count of 1 and a branch length of zero separating it from the shared parent). In the second, each tip has a count related to its abundance (for example, a unique sequence that is found 4 times in a sample appears in the tree as a single tip with a count of 4). Long et al. assert that users of the UniFrac significance test should use the tool with caution, because the results can vary depending on the “arbitrary choice of input format.” They make the case that these two different tree formats are isomorphically and semantically equivalent and “merely use a different visual representation,” and that thus one should expect “any numeric calculations based on these trees to yield the same result.” We disagree strongly with these assertions.
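The randomization procedure and the two tree formats can be illustrated with a small Python sketch. It is not the UniFrac software itself, and it rests on several assumptions made only for illustration: the tree is encoded as a list of (branch length, descendant tip indices) pairs, the unnormalized weighted UniFrac sum is used as the distance, the per-tip count vectors are shuffled among the tips to randomize sample assignments, and random distances that tie the observed distance are counted as at least as extreme. All function names and the toy data are hypothetical.

```python
import random

def weighted_unifrac(branches, counts):
    """Unnormalized weighted UniFrac between samples A and B.

    branches: list of (length, tip_indices) pairs, one per branch.
    counts:   list of (count_in_A, count_in_B) pairs, one per tip.
    """
    total_a = sum(a for a, _ in counts)
    total_b = sum(b for _, b in counts)
    dist = 0.0
    for length, tips in branches:
        frac_a = sum(counts[t][0] for t in tips) / total_a
        frac_b = sum(counts[t][1] for t in tips) / total_b
        dist += length * abs(frac_a - frac_b)
    return dist

def expand_replicates(branches, counts):
    """Convert format (b), one tip with a count, into format (a),
    one tip per replicate joined to its parent by a zero-length branch."""
    new_counts = []
    tip_map = []  # replicate tip indices for each original tip
    for a, b in counts:
        start = len(new_counts)
        new_counts.extend([(1, 0)] * a + [(0, 1)] * b)
        tip_map.append(list(range(start, len(new_counts))))
    new_branches = [
        (length, [r for t in tips for r in tip_map[t]])
        for length, tips in branches
    ]
    # Zero-length terminal branch for every replicate tip.
    new_branches.extend((0.0, [r]) for r in range(len(new_counts)))
    return new_branches, new_counts

def permutation_p_value(branches, counts, n_perm=999, seed=0):
    """Fraction of shuffled datasets whose distance is at least the observed one."""
    rng = random.Random(seed)
    observed = weighted_unifrac(branches, counts)
    shuffled = list(counts)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # randomize which tip carries which counts
        if weighted_unifrac(branches, shuffled) >= observed:
            hits += 1
    return hits / n_perm

# Toy format (b) tree with tips X, Y, Z: X joins the root directly,
# Y and Z share an internal branch.
branches_b = [(1.0, [0]), (0.5, [1]), (0.5, [2]), (0.5, [1, 2])]
counts_b = [(4, 0), (0, 3), (1, 1)]
branches_a, counts_a = expand_replicates(branches_b, counts_b)

print("observed (b):", weighted_unifrac(branches_b, counts_b))
print("observed (a):", weighted_unifrac(branches_a, counts_a))
print("p-value  (b):", permutation_p_value(branches_b, counts_b))
print("p-value  (a):", permutation_p_value(branches_a, counts_a))
```

In this sketch the observed distance is the same for both formats, because the zero-length branches contribute nothing to the sum. The shuffles, however, draw from different sets of random datasets: format (b) moves whole count vectors among 3 tips, while format (a) moves individual sequences among 9 tips, so the null distribution behind the p-value is not the same in the two cases.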

