The impact of sequence database choice on metaproteomic results in gut microbiota studies

ABSTRACT

Background: Elucidating the role of gut microbiota in physiological and pathological processes has recently emerged as a key research aim in life sciences. In this respect, metaproteomics, the study of the whole protein complement of a microbial community, can provide a unique contribution by revealing which functions are actually being expressed by specific microbial taxa. However, its wide application to gut microbiota research has been hindered by challenges in data analysis, especially related to the choice of the proper sequence databases for protein identification.

Results: Here, we present a systematic investigation of variables concerning database construction and annotation and evaluate their impact on human and mouse gut metaproteomic results. We found that both publicly available and experimental metagenomic databases lead to the identification of unique peptide assortments, suggesting parallel database searches as a means to gain more complete information. In particular, the contribution of experimental metagenomic databases proved to be essential when dealing with mouse samples. Moreover, the use of a “merged” database, containing all metagenomic sequences from the population under study, was found to be generally preferable to the use of sample-matched databases. We also observed that taxonomic and functional results are strongly database-dependent, in particular when analyzing the mouse gut microbiota. As a striking example, the Firmicutes/Bacteroidetes ratio varied up to tenfold depending on the database used. Finally, assembling reads into longer contigs provided significant advantages in terms of functional annotation yields.

Conclusions: This study contributes to identifying host- and database-specific biases which need to be taken into account in a metaproteomic experiment, providing meaningful insights into how to design gut microbiota studies and to perform metaproteomic data analysis. In particular, the use of multiple databases and annotation tools should be encouraged, even though this requires appropriate bioinformatic resources.

Electronic supplementary material: The online version of this article (doi:10.1186/s40168-016-0196-8) contains supplementary material, which is available to authorized users.

Fig. 5: Firmicutes/Bacteroidetes ratio in human and mouse subjects measured using different databases. Results obtained with human (N = 3, left) and mouse (N = 3, right) samples, with each dot representing a single sample. DBs were made up of reads (dark blue, R-A) or contigs (red, C-A) from gut metagenomes of all subjects analyzed for each host species, or all bacterial sequences deposited in UniProt (turquoise, UP-B). The annotation tools used were MEGAN (MEG) and Unipept (Up). *p < 0.05; **p < 0.01; ***p < 0.001 (paired t test)
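
As a rough illustration of the comparison summarized in this caption, the sketch below computes a Firmicutes/Bacteroidetes ratio per sample for two databases and derives a paired t statistic from the matched samples. It is only a sketch under stated assumptions: the abundance values are invented for illustration, the helper names `fb_ratio` and `paired_t` are hypothetical, and the source does not say whether the test was run on raw or log-transformed ratios (raw ratios are assumed here). Turning the statistic into a p value requires a t distribution, for example via `scipy.stats.ttest_rel`, which is not used in this stdlib-only example.

```python
from math import sqrt
from statistics import mean, stdev


def fb_ratio(abundance):
    """Firmicutes/Bacteroidetes ratio from a phylum -> abundance mapping."""
    return abundance["Firmicutes"] / abundance["Bacteroidetes"]


def paired_t(x, y):
    """Paired t statistic and degrees of freedom for matched samples."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t_stat = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t_stat, n - 1


# Hypothetical phylum-level abundances for the same three samples,
# searched against two databases (values are NOT from the study).
up_b = [
    {"Firmicutes": 800, "Bacteroidetes": 100},
    {"Firmicutes": 1000, "Bacteroidetes": 100},
    {"Firmicutes": 1200, "Bacteroidetes": 100},
]
c_a = [
    {"Firmicutes": 300, "Bacteroidetes": 300},
    {"Firmicutes": 400, "Bacteroidetes": 200},
    {"Firmicutes": 600, "Bacteroidetes": 200},
]

ratios_up_b = [fb_ratio(s) for s in up_b]
ratios_c_a = [fb_ratio(s) for s in c_a]

# Samples are paired: index i in both lists refers to the same subject.
t_stat, dof = paired_t(ratios_up_b, ratios_c_a)
fold_change = mean(ratios_up_b) / mean(ratios_c_a)
print(f"t = {t_stat:.2f}, df = {dof}, fold change = {fold_change:.1f}")
```

Because each subject contributes one ratio per database, the test operates on within-subject differences, which is why a paired rather than an unpaired test fits this design.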

Taxa abundances were then subjected to LEfSe differential analysis, in order to find taxa specifically enriched/depleted when using a particular DB type. To gain insight into the differential taxonomic information achievable with each DB, we distinguished between “qualitatively differential taxa” (i.e., always present when using a given DB and completely absent when using another DB) and “quantitatively differential taxa” (i.e., all the remaining differential taxa). As a result (Additional file 5: Figure S5), the mean percentage of taxa significantly changing in abundance when using different DBs was dramatically higher in mice than in humans (61 vs. 16 % with MEGAN, 25 vs. 2 % with Unipept, respectively). Furthermore, the percentage of taxa consistently identified with MG-DBs in all subjects and never detected with UP-B, according to Unipept classification, was in human and 15 % in mouse samples (0.5 and 3 % with MEGAN, respectively), with over 30 taxa uniquely detected with the contig-based DB. Hierarchical representation of differential taxa (graphically presented in cladograms of Additional file 6: Figure S6 and Additional file 7: Figure S7) revealed, again, few and reproducible distinctions in humans, counterbalanced by massive DB- and annotation tool-dependent biases in mice, involving (especially for MEGAN data) all main gut bacterial taxa, from phyla down to species. We finally took into account the Firmicutes/Bacteroidetes ratio, whose longitudinal change has been established as a synthetic measure of eubiosis/dysbiosis state within the gut microbiota [23, 24]. As illustrated in Fig. 5, we found a DB-related bias significantly influencing this parameter not only in mouse but also in human results, especially, and dramatically, when comparing UP-B to MG-DBs (up to tenfold higher ratio, when applying MEGAN to mouse data).
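
The distinction between qualitatively and quantitatively differential taxa can be sketched as a simple presence/absence check. The example below is a minimal illustration, not the authors' pipeline: it assumes the list of differential taxa has already been produced (for instance by LEfSe), that abundances are available per sample for each DB, and that "present" means an abundance greater than zero. The function name `classify_differential`, the data layout, and the example values are all hypothetical.

```python
def classify_differential(taxa, abund_a, abund_b):
    """Label each differential taxon as 'qualitative' or 'quantitative'.

    taxa:     iterable of taxon names already flagged as differential
    abund_a:  taxon -> list of per-sample abundances obtained with DB A
    abund_b:  taxon -> list of per-sample abundances obtained with DB B
    A taxon is 'qualitative' when it is present in every sample with one
    DB and absent from every sample with the other DB.
    """
    labels = {}
    for taxon in taxa:
        a = abund_a.get(taxon, [])
        b = abund_b.get(taxon, [])
        always_a = bool(a) and all(v > 0 for v in a)
        always_b = bool(b) and all(v > 0 for v in b)
        never_a = not any(v > 0 for v in a)
        never_b = not any(v > 0 for v in b)
        if (always_a and never_b) or (always_b and never_a):
            labels[taxon] = "qualitative"
        else:
            labels[taxon] = "quantitative"
    return labels


# Hypothetical per-sample abundances (three subjects) for two DB types.
mg_db = {
    "Taxon X": [12, 9, 15],   # detected in every sample with this DB
    "Taxon Y": [40, 35, 50],
    "Taxon Z": [3, 0, 5],     # missing in one sample
}
up_b_db = {
    "Taxon X": [0, 0, 0],     # never detected with the other DB
    "Taxon Y": [10, 12, 8],
    "Taxon Z": [0, 0, 0],
}

labels = classify_differential(["Taxon X", "Taxon Y", "Taxon Z"], mg_db, up_b_db)
for taxon, label in labels.items():
    print(taxon, label)
```

Here "Taxon Z" falls into the quantitative group because it is not detected in all samples with the first DB, which mirrors the "always present" requirement in the definition above.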


