Divergences in gene repertoire among the reference Prevotella genomes derived from distinct body sites of human.

Gupta VK, Chaudhari NM, Iskepalli S, Dutta C - BMC Genomics (2015)

Bottom Line: Distribution of various functional COG categories differs significantly among the habitat-specific genes. Prevotella genomes derived from different body sites differ appreciably in gene repertoire, suggesting that these microbiome components might have developed distinct genetic strategies for niche adaptation within the host. Each individual microbe might also have a component of its own genetic machinery for host adaptation, as suggested by the large number of singletons.


Affiliation: Structural Biology & Bioinformatics Division, CSIR- Indian Institute of Chemical Biology, 4, Raja S. C. Mullick Road, Kolkata, 700032, India. vinodgupta299@gmail.com.

ABSTRACT

Background: The community composition of the human microbiome is known to vary across distinct anatomical niches. However, little is known about the nature of variations, if any, at the genome/sub-genome levels of a specific microbial community across different niches. The present report aims to explore, as a case study, the variations in gene repertoire of 28 Prevotella reference genomes derived from different human body sites, as reported earlier by the Human Microbiome Consortium.

Results: The pan-genome for Prevotella remains "open". On average, 17% of the predicted protein-coding genes of any particular Prevotella genome represent the conserved core genes, while the remaining 83% are flexible genes or singletons. The study reveals exclusive presence of 11798, 3673, 3348 and 934 gene families and exclusive absence of 17, 221, 115 and 645 gene families in Prevotella genomes derived from the human oral cavity, gastrointestinal tract (GIT), urogenital tract (UGT) and skin, respectively. The distribution of functional COG categories differs significantly among the habitat-specific genes. No niche-specific variations could be observed in the distribution of KEGG pathways.

Conclusions: Prevotella genomes derived from different body sites differ appreciably in gene repertoire, suggesting that these microbiome components might have developed distinct genetic strategies for niche adaptation within the host. Each individual microbe might also have a component of its own genetic machinery for host adaptation, as suggested by the large number of singletons.
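
The notions of "exclusive presence" and "exclusive absence" reported in the Results can be illustrated with a small sketch. The definitions used here are assumptions for illustration only, since the abstract does not spell them out: a gene family counts as exclusively present in a habitat if it occurs in at least one genome from that habitat and in no genome from any other habitat, and as exclusively absent if it occurs in every genome from the other habitats but in none from the habitat in question. The function name `habitat_specific` and the toy genomes are hypothetical.

```python
def habitat_specific(genomes):
    """genomes: dict mapping genome name -> (habitat, set of gene-family ids).

    Returns, per habitat, the families that are exclusively present and
    exclusively absent under the ASSUMED definitions described above.
    """
    # Group the gene-family sets by habitat.
    by_habitat = {}
    for habitat, families in genomes.values():
        by_habitat.setdefault(habitat, []).append(families)

    result = {}
    for habitat, own in by_habitat.items():
        others = [fam for h, sets in by_habitat.items() if h != habitat for fam in sets]
        own_union = set().union(*own)
        others_union = set().union(*others) if others else set()
        # Families found in every genome outside this habitat.
        others_shared = set.intersection(*others) if others else set()
        result[habitat] = {
            "exclusively_present": own_union - others_union,
            "exclusively_absent": others_shared - own_union,
        }
    return result


# Toy data (made up): four genomes from three habitats.
toy_genomes = {
    "oral_a": ("oral", {"core", "o1"}),
    "oral_b": ("oral", {"core", "o2"}),
    "gut_a": ("gut", {"core", "gs", "g1"}),
    "skin_a": ("skin", {"core", "gs"}),
}
specific = habitat_specific(toy_genomes)
for habitat, sets in specific.items():
    print(habitat, sorted(sets["exclusively_present"]), sorted(sets["exclusively_absent"]))
```

Under a stricter definition (for example, requiring an exclusively present family to occur in every genome of the habitat), the union in `own_union` would be replaced by an intersection; which variant matches the study is not stated in this text.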


Fig. 1: Pan and core genome analysis of 28 Prevotella genomes. The number of shared genes is plotted (violet) as a function of the number of Prevotella genomes sequentially considered. The continuous curve represents the calculated core genome size; an exponential curve fit model, $y_{core} = A_{core}\, e^{B_{core} x} + C_{core}$, was applied to the data. The best fit was obtained with $r^2 = 0.949$, $A_{core} = 5490.32$, $B_{core} = -1.05$, and $C_{core} = 567.29$. The extrapolated Prevotella core genome size is 567. The size of the Prevotella pan-genome is plotted (orange) as a function of the number of Prevotella genomes sequentially considered. The continuous curve represents the calculated pan-genome size; a power-law regression model, $y_{pan} = A_{pan}\, x^{B_{pan}} + C_{pan}$, was applied to the data. The best fit was obtained with $r^2 = 0.999$, $A_{pan} = 2389.18$, $B_{pan} = 0.7$, and $C_{pan} = 66.29$. The extrapolated Prevotella pan-genome size is 24685. The vertical bars correspond to standard deviations after repeating random combinations of the genomes.
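
As a quick check on the two fitted models in the caption, the following minimal sketch evaluates them with the reported parameters. The function names `core_size` and `pan_size` are hypothetical; only the parameter values and the functional forms come from the caption, and the parameters are rounded as printed, so the results are approximate.

```python
import math

# Fit parameters as reported in the Figure 1 caption (rounded as printed).
A_CORE, B_CORE, C_CORE = 5490.32, -1.05, 567.29
A_PAN, B_PAN, C_PAN = 2389.18, 0.7, 66.29


def core_size(n):
    """Exponential model for the core genome: A * exp(B * n) + C."""
    return A_CORE * math.exp(B_CORE * n) + C_CORE


def pan_size(n):
    """Power-law model for the pan-genome: A * n**B + C."""
    return A_PAN * n ** B_PAN + C_PAN


for n in (1, 5, 10, 28):
    print(n, round(core_size(n), 1), round(pan_size(n), 1))
```

With these values the exponential term has effectively vanished by 28 genomes, so the core model sits at its asymptote $C_{core}$, while the pan-genome model is still rising, which is what an "open" pan-genome looks like in this parameterization.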

Mentions: To study how the pan-genome of PDGHM expands with the sequential addition of Prevotella genomes to the dataset, we plotted the total number of distinct gene families against the number of genomes considered (Figure 1). Similarly, the number of shared gene families was plotted against the number of genomes to generate the core-genome plot, which depicts the contraction of the core genome with the sequential addition of genomes. To avoid bias from the order in which genomes were added, the order of addition was randomly permuted and the median pan-genome or core-genome size was taken after each step (Figure 1). The median counts were then extrapolated using a power-law regression model for the pan-genome and an exponential curve fit model for the core genome (see Methods for details). As depicted in Figure 1, the size of the pan-genome keeps increasing with the addition of new genomes, and even after inclusion of 24885 non-redundant gene families from all 28 members of PDGHM, the plot has yet to reach a plateau. On average, each additional PDGHM genome contributed 827 new genes to the pool, leading to an open pan-genome. In accordance with these observations, the power-law regression shows that the pan-genome of PDGHM is indeed "open" [33] with a γ-parameter of 0.7 (here, $B_{pan}$).
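
The permutation-and-median procedure described above can be sketched as follows. This is a minimal illustration on made-up data, not the authors' pipeline: the function name `accumulation_curves`, the number of permutations, the seed, and the toy gene-family sets are all assumptions.

```python
import random
from statistics import median


def accumulation_curves(genomes, n_permutations=100, seed=0):
    """genomes: list of sets of gene-family ids, one set per genome.

    Returns (pan_medians, core_medians): the median pan-genome and
    core-genome sizes after adding 1, 2, ..., N genomes, taken over
    random orders of addition.
    """
    rng = random.Random(seed)
    n = len(genomes)
    pan_counts = [[] for _ in range(n)]
    core_counts = [[] for _ in range(n)]
    for _ in range(n_permutations):
        order = genomes[:]
        rng.shuffle(order)  # random order of genome addition
        pan, core = set(), None
        for step, families in enumerate(order):
            pan |= families  # pan-genome: union of all families seen so far
            core = set(families) if core is None else core & families  # core: shared by all so far
            pan_counts[step].append(len(pan))
            core_counts[step].append(len(core))
    return [median(c) for c in pan_counts], [median(c) for c in core_counts]


# Toy data (made up): four genomes sharing two families.
toy = [
    {"f1", "f2", "f3", "a"},
    {"f1", "f2", "f3", "b", "c"},
    {"f1", "f2", "d"},
    {"f1", "f2", "f3", "e"},
]
pan, core = accumulation_curves(toy)
print("pan-genome medians:", pan)
print("core-genome medians:", core)
```

The two lists of medians are the kind of data to which the power-law and exponential models would then be fitted; the fitting step itself is left out here because the text defers its details to the Methods.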

