Comparative whole-genome analysis of clinical isolates reveals characteristic architecture of Mycobacterium tuberculosis pangenome.

Periwal V, Patowary A, Vellarikkal SK, Gupta A, Singh M, Mittal A, Jeyapaul S, Chauhan RK, Singh AV, Singh PK, Garg P, Katoch VM, Katoch K, Chauhan DS, Sivasubbu S, Scaria V - PLoS ONE (2015)

Bottom Line: We identified 74 HGCs that were absent from reference strains H37Rv and H37Ra but were present in most clinical isolates. The pangenome approach is a promising tool for studying strain-specific genetic differences within a species. We also suggest that, because selecting target genes for typing requires the target gene to be present in all isolates being typed, estimating the core component of the species is of prime importance.


Affiliation: GN Ramachandran Knowledge Center for Genome Informatics, CSIR Institute of Genomics and Integrative Biology (CSIR-IGIB), Mall Road, Delhi-110007, India; Academy of Scientific & Innovative Research (AcSIR), 2, Rafi Marg, Anusandhan Bhawan, New Delhi 110001, India.

ABSTRACT
The tubercle complex consists of closely related mycobacterial species that appear to be variants of a single species. Comparative genome analysis of different strains could provide useful insights into the genetic diversity of the species. We integrated genome assemblies of 96 strains from the Mycobacterium tuberculosis complex (MTBC), including 8 Indian clinical isolates sequenced and assembled in this study, to understand its pangenome architecture. We predicted genes for all 96 strains and clustered their respective CDSs into homologous gene clusters (HGCs) to reveal the hard-core, soft-core and accessory genome components of MTBC. The hard-core (HGCs shared amongst 100% of the strains) comprised 2,066 gene clusters, whereas the soft-core (HGCs shared amongst at least 95% of the strains) comprised 3,374 gene clusters. The change in the core and accessory genome components, observed as a function of the number of genomes sampled, revealed that MTBC has an open pangenome. We identified 74 HGCs that were absent from reference strains H37Rv and H37Ra but were present in most clinical isolates. We report PCR validation of 9 candidate genes: 7 genes were completely absent from H37Rv and H37Ra, whereas 2 genes shared partial homology with them, attributable to probable insertion and deletion events. The pangenome approach is a promising tool for studying strain-specific genetic differences within a species. We also suggest that, because selecting target genes for typing requires the target gene to be present in all isolates being typed, estimating the core component of the species is of prime importance.
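The 100% and 95% thresholds described in the abstract can be illustrated with a minimal sketch. It assumes a hypothetical `presence` mapping from each HGC identifier to the set of strains carrying it; the function name, the input layout and the mutually exclusive labels are illustrative choices, not the authors' pipeline. Because the abstract defines the soft-core as HGCs present in at least 95% of strains, the abstract's soft-core count corresponds here to the "hard-core" and "soft-core" labels taken together.

```python
from typing import Dict, Set


def classify_hgcs(
    presence: Dict[str, Set[str]],
    n_strains: int,
    soft_threshold: float = 0.95,
) -> Dict[str, str]:
    """Label each HGC by the fraction of strains that carry it.

    presence: hypothetical mapping of HGC id -> set of strain names.
    n_strains: total number of strains compared (96 in the study).
    """
    labels: Dict[str, str] = {}
    for hgc, strains in presence.items():
        if len(strains) == n_strains:
            # Present in every strain.
            labels[hgc] = "hard-core"
        elif len(strains) / n_strains >= soft_threshold:
            # Present in at least 95% of strains, but not all of them.
            labels[hgc] = "soft-core"
        else:
            labels[hgc] = "accessory"
    return labels


# Toy example with 20 made-up strains (not data from the paper).
strains = [f"strain{i}" for i in range(20)]
toy = {
    "hgc_A": set(strains),        # 20 of 20 strains
    "hgc_B": set(strains[:19]),   # 19 of 20 strains
    "hgc_C": set(strains[:10]),   # 10 of 20 strains
}
toy_labels = classify_hgcs(toy, n_strains=len(strains))
```

Keeping the labels exclusive makes it easy to count each tier separately; a cumulative soft-core count is then the sum of the first two tiers.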

pone.0122979.g002: Core and accessory genome size evolution. (A) Each point indicates the number of HGCs conserved in a genome. The red line indicates an exponential decay function based on the median number of core HGCs each time a new genome is added to the analysis. (B) Accessory genome of MTBC. The MTBC has an open pangenome model.

Mentions: In defining the size of the MTBC pangenome, the primary question is whether a sufficient number of genomes has been sequenced to describe the core and accessory gene content of the species. To address this, we observed how the sizes of the core and accessory gene components changed as the number of sampled genomes increased, across all 96 genomes (Fig 2). The total genome component of the 96 MTBC strains was analysed to study core and accessory genome size evolution in terms of exponential decay and growth models. The models are based on the median numbers of conserved and accessory HGCs, which in turn were obtained from random permutations of genome comparisons, limiting the number of possible combinations to 100 for each new genome added. The exponential decay model in Fig 2A suggests that the number of core HGCs tends to approach a plateau near 2,000 HGCs, whereas the number of accessory HGCs tends to approach a plateau near 6,000 HGCs (Fig 2B) for the 96 strains under comparison. Since neither curve forms a distinctly sharp plateau, we estimate that the MTBC has an open pangenome, i.e. the number of distinct genes found in MTBC strains is unbounded and keeps growing as genomes are added, as opposed to the finite number of genes in a closed pangenome.
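The permutation procedure described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `genomes` is a hypothetical mapping from strain name to the set of HGC identifiers it carries, the cap of 100 is interpreted as 100 random genome orderings, and the accessory count is taken as the pangenome size minus the core size. Fitting the exponential decay and growth curves to the resulting medians is left out.

```python
import random
from statistics import median
from typing import Dict, List, Set, Tuple


def accumulation_curves(
    genomes: Dict[str, Set[str]],
    n_permutations: int = 100,
    seed: int = 0,
) -> Tuple[List[float], List[float]]:
    """Median core and accessory HGC counts as genomes are added one by one.

    genomes: hypothetical mapping of strain name -> set of HGC ids.
    Returns two lists; index k holds the median after k + 1 genomes.
    """
    rng = random.Random(seed)
    names = sorted(genomes)
    n = len(names)
    core_sizes: List[List[int]] = [[] for _ in range(n)]
    accessory_sizes: List[List[int]] = [[] for _ in range(n)]

    for _ in range(n_permutations):
        order = names[:]
        rng.shuffle(order)  # one random order of genome addition
        core = set(genomes[order[0]])
        pan = set(genomes[order[0]])
        for k, name in enumerate(order):
            core &= genomes[name]  # HGCs shared by all genomes so far
            pan |= genomes[name]   # HGCs seen in any genome so far
            core_sizes[k].append(len(core))
            accessory_sizes[k].append(len(pan) - len(core))

    core_median = [median(values) for values in core_sizes]
    accessory_median = [median(values) for values in accessory_sizes]
    return core_median, accessory_median


# Toy input with three made-up strains (not data from the paper).
toy_genomes = {
    "s1": {"a", "b", "c"},
    "s2": {"a", "b", "d"},
    "s3": {"a", "e"},
}
core_curve, accessory_curve = accumulation_curves(toy_genomes)
```

A core curve that flattens while the accessory curve keeps rising as genomes are added is the pattern usually read as an open pangenome; with real data the medians would be passed to a curve-fitting routine to estimate where, if anywhere, each curve levels off.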

