Waste not, want not: why rarefying microbiome data is inadmissible.

McMurdie PJ, Holmes S - PLoS Comput. Biol. (2014)

Bottom Line: For apparently historical reasons, the common approach is either to use simple proportions (which does not address heteroscedasticity) or to use rarefying of counts, even though both of these approaches are inappropriate for detection of differentially abundant species. Based on these results and well-established statistical theory, we advocate that investigators avoid rarefying altogether. We have provided microbiome-specific extensions to these tools in the R package, phyloseq.


Affiliation: Statistics Department, Stanford University, Stanford, California, United States of America.

ABSTRACT
Current practice in the normalization of microbiome count data is inefficient in the statistical sense. For apparently historical reasons, the common approach is either to use simple proportions (which does not address heteroscedasticity) or to use rarefying of counts, even though both of these approaches are inappropriate for detection of differentially abundant species. Well-established statistical theory is available that simultaneously accounts for library size differences and biological variability using an appropriate mixture model. Moreover, specific implementations for DNA sequencing read count data (based on a Negative Binomial model for instance) are already available in RNA-Seq focused R packages such as edgeR and DESeq. Here we summarize the supporting statistical theory and use simulations and empirical data to demonstrate substantial improvements provided by a relevant mixture model framework over simple proportions or rarefying. We show how both proportions and rarefied counts result in a high rate of false positives in tests for species that are differentially abundant across sample classes. Regarding microbiome sample-wise clustering, we also show that the rarefying procedure often discards samples that can be accurately clustered by alternative methods. We further compare different Negative Binomial methods with a recently-described zero-inflated Gaussian mixture, implemented in a package called metagenomeSeq. We find that metagenomeSeq performs well when there is an adequate number of biological replicates, but it nevertheless tends toward a higher false positive rate. Based on these results and well-established statistical theory, we advocate that investigators avoid rarefying altogether. We have provided microbiome-specific extensions to these tools in the R package, phyloseq.
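The two normalizations the abstract criticizes can be illustrated with a minimal sketch. This is not the authors' code and does not use phyloseq; the helper names `rarefy` and `proportions` and the toy counts are assumptions made only for illustration. Rarefying is taken here to mean subsampling every library, without replacement, down to the smallest library size.

```python
import random


def rarefy(counts, depth, rng):
    """Subsample `depth` reads without replacement from one sample's OTU counts.

    Returns None when the library is smaller than `depth`, which mirrors how
    rarefying discards samples that fall below the chosen depth.
    """
    # Expand the count vector into one entry per read, labelled by OTU index.
    reads = [otu for otu, n in enumerate(counts) for _ in range(n)]
    if len(reads) < depth:
        return None
    kept = rng.sample(reads, depth)
    out = [0] * len(counts)
    for otu in kept:
        out[otu] += 1
    return out


def proportions(counts):
    """Divide each OTU count by the sample's library size."""
    total = sum(counts)
    return [n / total for n in counts]


# Hypothetical toy data: same composition, 100-fold difference in library size.
rng = random.Random(0)
small = [30, 15, 5]          # library size 50
large = [3000, 1500, 500]    # library size 5000

depth = min(sum(small), sum(large))
rarefied_large = rarefy(large, depth, rng)
discarded = sum(large) - sum(rarefied_large)

print("reads discarded from the large library:", discarded)
print("proportions identical:", proportions(small) == proportions(large))
```

In this toy case the proportions of the two samples are identical even though one estimate rests on 100 times more reads, so the difference in uncertainty is hidden, which is the heteroscedasticity problem the abstract refers to. Rarefying instead throws away 4950 of the 5000 reads in the larger library.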

Figure 6 (pcbi-1003531-g006): Performance of differential abundance detection with and without rarefying. Performance is summarized here by the “Area Under the Curve” (AUC) metric of a Receiver Operating Characteristic (ROC) curve [59] (vertical axis). Briefly, the AUC value varies from 0.5 (random) to 1.0 (perfect), incorporating both sensitivity and specificity. The horizontal axis indicates the effect size, shown as the actual multiplication factor applied to OTU counts in the test class to simulate a differential abundance. Each curve traces the mean performance of the respective normalization method in that panel, with a vertical bar indicating one standard deviation in performance across all replicates and microbiome templates. The right-hand side of the panel rows indicates the median library size, while the darkness of line shading indicates the number of samples per simulated experiment. Color shade and shape indicate the normalization method. See the Methods section for the definitions of each normalization and testing method. For all methods, detection among multiple tests was defined using a False Discovery Rate (Benjamini-Hochberg [52]) significance threshold of 0.05.
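As a rough illustration of the AUC metric used in the caption, the sketch below computes it from its rank interpretation: the probability that a randomly chosen truly differential OTU receives a higher score than a randomly chosen null OTU, with ties counted as one half. The function name `roc_auc` and the example scores are hypothetical, and the paper's own ROC calculations may have been carried out differently.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    `scores` are per-OTU evidence values (higher means more evidence of
    differential abundance); `labels` are truthy for true positives.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical scores for four OTUs; the first two are truly differential.
print(roc_auc([0.9, 0.8, 0.4, 0.3], [1, 1, 0, 0]))  # perfect ranking
print(roc_auc([0.9, 0.4, 0.8, 0.3], [1, 1, 0, 0]))  # one pair misordered
print(roc_auc([0.5, 0.5, 0.5, 0.5], [1, 1, 0, 0]))  # uninformative scores
```

The pairwise loop is quadratic in the number of OTUs, which is acceptable for a sketch; a rank-based implementation would be preferable for large tables.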


Mentions: In simulations evaluating performance in the detection of differential abundance, we found an improvement in sensitivity and specificity when normalization and subsequent tests are based upon a relevant mixture model (Figure 6). Multiple t-tests with correction for multiple inference did not perform well on these data, whether on rarefied counts or on proportions. A direct comparison of more sophisticated parametric methods applied to both original and rarefied counts demonstrates the strong potential of these methods, and large improvements in sensitivity and specificity if rarefying is not used at all.
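The correction for multiple inference referred to here, and the Benjamini-Hochberg threshold of 0.05 quoted in the figure caption, can be sketched as the standard step-up procedure below. The function name `benjamini_hochberg` and the example p-values are hypothetical; this is an illustration of the procedure, not the code used in the paper.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans, True where the test is rejected at FDR `alpha`.

    Step-up rule: sort the p-values, find the largest rank k with
    p_(k) <= (k / m) * alpha, and reject the k smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            cutoff = rank  # keep the largest rank that passes
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


# Hypothetical per-OTU p-values from eight tests.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(benjamini_hochberg(pvals))
```

Because the rule is step-up, a p-value that fails its own rank threshold is still rejected when a larger p-value passes at a higher rank; for example, `[0.02, 0.03, 0.035, 0.04]` is rejected in full at `alpha=0.05`.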

