Lactobacillus paracasei comparative genomics: towards species pan-genome definition and exploitation of diversity.

Smokvina T, Wels M, Polka J, Chervaux C, Brisse S, Boekhorst J, van Hylckama Vlieg JE, Siezen RJ - PLoS ONE (2013)

Bottom Line: Several factors previously associated with host-microbe interactions such as pili, cell-envelope proteinase, hydrolases p40 and p75 or the capacity to produce short branched-chain fatty acids (bkd operon) are part of the L. paracasei core genome present in all analysed strains. The variome consists mainly of hypothetical proteins, phages, plasmids, transposon/conjugative elements, and known functions such as sugar metabolism, cell-surface proteins, transporters, CRISPR-associated proteins, and EPS biosynthesis proteins. A phylogenomic tree was constructed based on total genome contents, and together with an analysis of horizontal gene transfer events we conclude that evolution of these L. paracasei strains is complex and not always related to niche adaptation.

Affiliation: Danone Research, Palaiseau, France.

ABSTRACT
Lactobacillus paracasei is a member of the normal human and animal gut microbiota and is used extensively in the food industry in starter cultures for dairy products or as probiotics. With the development of low-cost, high-throughput sequencing techniques it has become feasible to sequence many different strains of one species and to determine its "pan-genome". We have sequenced the genomes of 34 different L. paracasei strains, and performed a comparative genomics analysis. We analysed genome synteny and content, focussing on the pan-genome, core genome and variable genome. Each genome was shown to contain around 2800-3100 protein-coding genes, and comparative analysis identified over 4200 ortholog groups that comprise the pan-genome of this species, of which about 1800 ortholog groups make up the conserved core. Several factors previously associated with host-microbe interactions such as pili, cell-envelope proteinase, hydrolases p40 and p75 or the capacity to produce short branched-chain fatty acids (bkd operon) are part of the L. paracasei core genome present in all analysed strains. The variome consists mainly of hypothetical proteins, phages, plasmids, transposon/conjugative elements, and known functions such as sugar metabolism, cell-surface proteins, transporters, CRISPR-associated proteins, and EPS biosynthesis proteins. An enormous variety and variability of sugar utilization gene cassettes were identified, with each strain harbouring between 25 and 53 cassettes, reflecting the high adaptability of L. paracasei to different niches. A phylogenomic tree was constructed based on total genome contents, and together with an analysis of horizontal gene transfer events we conclude that evolution of these L. paracasei strains is complex and not always related to niche adaptation.
The results of this genome content comparison were used, together with high-throughput growth experiments on various carbohydrates, to perform gene-trait matching analysis, in order to link the distribution pattern of a specific phenotype to the presence/absence of specific sets of genes.
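
The abstract does not state which algorithm was used for gene-trait matching. As a rough illustration of the idea only, the sketch below scores each ortholog group (OG) by how often its presence/absence agrees with a binary growth phenotype across strains. The function names, the toy strains and the concordance score are all assumptions made for this example; they are not the authors' method or data.

```python
def concordance(presence, phenotype):
    """Fraction of strains where OG presence agrees with the phenotype.

    presence:  dict strain -> bool (OG present in that strain)
    phenotype: dict strain -> bool (strain grows on the carbohydrate)
    Strains missing from `presence` are treated as lacking the OG.
    """
    agree = sum(presence.get(s, False) == grows for s, grows in phenotype.items())
    return agree / len(phenotype)


def rank_ogs(og_presence, phenotype):
    """Rank OGs from most to least concordant with the phenotype."""
    scores = {og: concordance(p, phenotype) for og, p in og_presence.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: four strains, two OGs, one growth phenotype.
phenotype = {"A": True, "B": False, "C": True, "D": False}
og_presence = {
    "og_other": {"A": True, "B": True, "C": False, "D": False},
    "og_sugar": {"A": True, "B": False, "C": True, "D": False},
}
ranking = rank_ogs(og_presence, phenotype)
print(ranking)  # [('og_sugar', 1.0), ('og_other', 0.5)]
```

A simple agreement score like this ignores population structure and multiple testing, so it is only a starting point for shortlisting candidate gene sets.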

Figure 1. Pan-genome prediction. The number of pan-genome OGs (blue) and core genome OGs (red) is shown as a function of genomes added to the pan-genome. OGs present in only one annotated genome were not included if they appeared to represent gene fragments or overpredicted small genes.

Mentions: The microbial pan-genome is defined as the full complement of genes in a species, that is, the total set of genes found across all of its strains. The concept is typically applied to bacteria and archaea, which can have large variations in gene content among closely related strains [21], [23]. A first estimate of the L. paracasei pan-genome was calculated using only the 10 RAST-annotated and 3 reference genomes, which have manually curated ORF calling and annotation. We identified a total of about 4200 OGs present in at least two L. paracasei genomes, of which ∼230 OGs are presumably plasmid-encoded, the “plasmid pan-genome” (see below). Figure 1 shows the predicted pan-genome size as a function of the number of genomes sequenced. It appears that the pan-genome size is levelling off (at about 4300–4500 genes), as every extra genome adds fewer new genes. This upper limit may be an overestimate, since some of the draft genomes added have lower coverage and hence poorer ORF prediction, usually with overprediction of ORFs due to gene fragments.
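
A curve of this kind can be sketched in a few lines: the pan-genome is the running union of OG sets as genomes are added, and the core genome is the running intersection. The code below is a minimal illustration under stated assumptions, not the procedure used in the paper. The function names and toy genomes are hypothetical, averaging over random genome orders is a common convention assumed here, and the filtering of single-genome OGs described in the figure caption is not implemented.

```python
import random
from statistics import mean


def accumulation_curve(genomes):
    """Pan- and core-genome sizes after adding each genome in the given order.

    genomes: list of sets of OG identifiers, one set per genome.
    """
    pan, core = set(), None
    pan_sizes, core_sizes = [], []
    for ogs in genomes:
        pan |= ogs                                        # union: any OG seen so far
        core = set(ogs) if core is None else core & ogs   # intersection: OGs in all
        pan_sizes.append(len(pan))
        core_sizes.append(len(core))
    return pan_sizes, core_sizes


def mean_curves(genomes, n_orders=100, seed=0):
    """Average the curves over random genome orders (assumed convention)."""
    rng = random.Random(seed)
    pan_runs, core_runs = [], []
    for _ in range(n_orders):
        order = list(genomes)
        rng.shuffle(order)
        pan, core = accumulation_curve(order)
        pan_runs.append(pan)
        core_runs.append(core)
    pan_mean = [mean(col) for col in zip(*pan_runs)]
    core_mean = [mean(col) for col in zip(*core_runs)]
    return pan_mean, core_mean


# Hypothetical toy input: three genomes described by their OG sets.
toy = [{"og1", "og2", "og3"}, {"og1", "og2", "og4"}, {"og1", "og3", "og5"}]
print(accumulation_curve(toy))  # ([3, 4, 5], [3, 2, 1])
```

A pan-genome curve that flattens as genomes are added, as described above, corresponds to successive differences in `pan_sizes` shrinking towards zero.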

