The UniFrac significance test is sensitive to tree topology.
Bottom Line:
Long et al. find that weighted UniFrac results differ between input trees where (a) replicate sequences each have their own tip, or (b) all replicates are assigned to one tip with an associated count. We argue that these are two distinct cases that differ in the probability distribution on which the statistical test is based, because of the differences in tree topology. Further study is needed to understand which randomization procedure best detects different aspects of community dissimilarities.
Affiliation: Department of Medicine, University of Colorado Denver, Anschutz Medical Campus, Aurora, CO, 80045, USA. Catherine.Lozupone@ucdenver.edu.
ABSTRACT
Long et al. (BMC Bioinformatics 2014, 15(1):278) describe a "discrepancy" in using UniFrac to assess statistical significance of community differences. Specifically, they find that weighted UniFrac results differ between input trees where (a) replicate sequences each have their own tip, or (b) all replicates are assigned to one tip with an associated count. We argue that these are two distinct cases that differ in the probability distribution on which the statistical test is based, because of the differences in tree topology. Further study is needed to understand which randomization procedure best detects different aspects of community dissimilarities.
UniFrac significance tests can be used to determine whether the types of sequences (e.g. representing bacterial 16S ribosomal RNA genes) in two biological samples differ significantly between the samples. To do so, the sample assignments on an input phylogenetic tree are randomly re-assigned (i.e. the relationship between each tip on the tree and the sample labels is randomized), a distance between the two samples is calculated for each random dataset using either the unweighted or weighted UniFrac metric, and the fraction of random datasets in which the UniFrac distance between samples is larger than in the true dataset is reported as the p-value [1]. In a recent paper [2], Long et al. show that the results of weighted UniFrac significance tests differ when applied to input trees in two different formats. In the first, a sequence found multiple times is added as replicate tips, each with a count of 1 (for example, a sequence with a count of 4 is added to the tree as 4 individual tips, each with a count of 1 and a branch length of zero separating it from the shared parent). In the second, each tip carries a count reflecting its abundance (for example, a unique sequence found 4 times in a sample appears in the tree as a single tip with a count of 4) (Fig. 1). Long et al. assert that users of the UniFrac significance test should use the tool with caution, because the results can vary depending on the “arbitrary choice of input format.” They make the case that these two tree formats are isomorphically and semantically equivalent and “merely use a different visual representation,” and that thus one should expect “any numeric calculations based on these trees to yield the same result.” We disagree strongly with these assertions.
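The label-randomization procedure described above can be illustrated with a small Python sketch. It is not the UniFrac implementation: the four-sequence tree, its unit branch lengths, the sample names `A` and `B`, and the helper names `weighted_unifrac`, `permutation_p_value`, and `expand` are all hypothetical, and two choices are assumptions made for illustration (labels are shuffled while counts stay with their tips, and ties count as at least as extreme). The sketch uses the unnormalized weighted UniFrac sum over branches of length times the absolute difference in the proportion of each sample's sequences below that branch.

```python
import random

def weighted_unifrac(tips, branch_length):
    """Unnormalized weighted UniFrac between samples 'A' and 'B'.

    tips: list of (sample, count, path); path lists the ids of the
    branches between the tip and the root.
    """
    totals = {"A": 0, "B": 0}
    below = {b: {"A": 0, "B": 0} for b in branch_length}
    for sample, count, path in tips:
        totals[sample] += count
        for b in path:
            below[b][sample] += count
    return sum(
        length * abs(below[b]["A"] / totals["A"] - below[b]["B"] / totals["B"])
        for b, length in branch_length.items()
    )

def permutation_p_value(tips, branch_length, n_perm=999, seed=0):
    """Shuffle sample labels across tips and compare to the observed distance."""
    rng = random.Random(seed)
    observed = weighted_unifrac(tips, branch_length)
    labels = [sample for sample, _, _ in tips]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        shuffled = [(lab, c, p) for lab, (_, c, p) in zip(labels, tips)]
        # Assumption: ties are counted as at least as extreme as the observed value.
        if weighted_unifrac(shuffled, branch_length) >= observed:
            hits += 1
    return observed, hits / n_perm

def expand(tips, branch_length):
    """Format (a): one tip per replicate, joined to the parent by a zero-length branch."""
    lengths = dict(branch_length)
    expanded = []
    for i, (sample, count, path) in enumerate(tips):
        for k in range(count):
            rep = f"rep{i}_{k}"
            lengths[rep] = 0.0
            expanded.append((sample, 1, [rep] + path))
    return expanded, lengths

# Hypothetical toy tree ((s1,s2),(s3,s4)) with unit branch lengths.
branch_length = {"s1": 1.0, "s2": 1.0, "s3": 1.0, "s4": 1.0, "i12": 1.0, "i34": 1.0}

# Format (b): one tip per unique sequence, with an abundance count.
collapsed = [
    ("A", 4, ["s1", "i12"]),
    ("A", 1, ["s2", "i12"]),
    ("B", 4, ["s3", "i34"]),
    ("B", 1, ["s4", "i34"]),
]
expanded, expanded_lengths = expand(collapsed, branch_length)

d_collapsed, p_collapsed = permutation_p_value(collapsed, branch_length)
d_expanded, p_expanded = permutation_p_value(expanded, expanded_lengths)
print(d_collapsed, p_collapsed)
print(d_expanded, p_expanded)
```

In this toy case the observed distance is the same in both formats, because the zero-length replicate branches contribute nothing to the sum. The randomization, however, draws from different sets of label arrangements: 4 tips with two labels each give 6 distinct arrangements, whereas 10 tips with five labels each give 252. The resulting p-values therefore need not agree.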