Comparative whole-genome analysis of clinical isolates reveals characteristic architecture of Mycobacterium tuberculosis pangenome.

Periwal V, Patowary A, Vellarikkal SK, Gupta A, Singh M, Mittal A, Jeyapaul S, Chauhan RK, Singh AV, Singh PK, Garg P, Katoch VM, Katoch K, Chauhan DS, Sivasubbu S, Scaria V - PLoS ONE (2015)

Bottom Line: We identified 74 HGCs that were absent from the reference strains H37Rv and H37Ra but present in most of the clinical isolates. The pangenome approach is a promising tool for studying strain-specific genetic differences within a species. Because selecting target genes for typing requires that the target gene be present in all isolates being typed, estimating the core component of the species is of prime importance.


Affiliation: GN Ramachandran Knowledge Center for Genome Informatics, CSIR Institute of Genomics and Integrative Biology (CSIR-IGIB), Mall Road, Delhi-110007, India; Academy of Scientific & Innovative Research (AcSIR), 2, Rafi Marg, Anusandhan Bhawan, New Delhi 110001, India.

ABSTRACT
The tubercle complex consists of closely related Mycobacterium species that appear to be variants of a single species. Comparative genome analysis of different strains could provide useful insights into the genetic diversity of the species. We integrated genome assemblies of 96 strains from the Mycobacterium tuberculosis complex (MTBC), including 8 Indian clinical isolates sequenced and assembled in this study, to understand its pangenome architecture. We predicted genes for all 96 strains and clustered their CDSs into homologous gene clusters (HGCs) to reveal the hard-core, soft-core and accessory genome components of MTBC. The hard-core (HGCs shared amongst 100% of the strains) comprised 2,066 gene clusters, whereas the soft-core (HGCs shared amongst at least 95% of the strains) comprised 3,374 gene clusters. The change in the core and accessory genome components, when observed as a function of their size, revealed that MTBC has an open pangenome. We identified 74 HGCs that were absent from the reference strains H37Rv and H37Ra but present in most of the clinical isolates. We report PCR validation of 9 candidate genes, which showed that 7 genes were completely absent from H37Rv and H37Ra, whereas 2 genes shared partial homology with them, probably owing to insertion and deletion events. The pangenome approach is a promising tool for studying strain-specific genetic differences within a species. Because selecting target genes for typing requires that the target gene be present in all isolates being typed, estimating the core component of the species is of prime importance.
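As an illustration of the hard-core and soft-core definitions above, the following is a minimal sketch of how HGCs could be partitioned once a presence table (HGC to set of strains) is available. The function name `partition_hgcs`, the input layout, and the treatment of everything below the 95% threshold as accessory are assumptions made for this example; the abstract does not describe the authors' actual implementation.

```python
from typing import Dict, Set, Tuple


def partition_hgcs(
    presence: Dict[str, Set[str]],
    n_strains: int,
    soft_core_fraction: float = 0.95,
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Split HGCs into hard-core, soft-core and accessory sets.

    presence maps a (hypothetical) HGC identifier to the set of strains
    that carry at least one member of that cluster.
    """
    hard_core: Set[str] = set()
    soft_core: Set[str] = set()
    accessory: Set[str] = set()
    for hgc, strains in presence.items():
        share = len(strains) / n_strains
        # Hard-core: shared amongst 100% of the strains.
        if len(strains) == n_strains:
            hard_core.add(hgc)
        # Soft-core: shared amongst at least 95% of the strains, so it
        # contains the hard-core as a subset.
        if share >= soft_core_fraction:
            soft_core.add(hgc)
        else:
            # Assumption: everything below the threshold is accessory.
            accessory.add(hgc)
    return hard_core, soft_core, accessory
```

With 96 strains, a cluster needs to occur in at least 92 strains to pass the 95% threshold in this sketch, since 91/96 falls just below it.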


Fig 5 (pone.0122979.g005): The accessory genome of MTBC. The flower plots depict the distribution of accessory genome HGCs across different species of MTBC. (A) Flower plot showing number of accessory HGCs present in Mtb (in center) and number of species-specific genes in the leaves. (B) Number of species-specific genes of M. bovis in leaves and total accessory HGCs in center. (C) M. canettii has accessory HGCs in center and species-specific genes in leaves. (D and E) The genomes of M. africanum and M. orygis have four and five species-specific genes respectively (outer circle) and total accessory HGCs in the center.


Mentions: The soft-accessory genome of MTBC comprises 4,725 HGCs. These 4,725 clusters were investigated to identify clusters shared among any given strain pair, clusters unique to each strain, and clusters with proteins present in more than one third of the clinical isolates but absent from the two laboratory strains, i.e. Mtb H37Rv and Mtb H37Ra. The distribution of the accessory genome component across the species of MTBC is depicted in Fig 5. The single strains of M. africanum and M. orygis are present in 420 and 429 HGCs respectively, the 70 isolates of M. tuberculosis are variably spread over 3,556 HGCs, the 15 strains of M. bovis are present in 1,632 HGCs and the 9 isolates of M. canettii are spread over 1,641 HGCs. The numbers in the center of the flower plots and circular plots (Fig 5) are overlaps amongst species and are not unique to a species. The number of genes unique to a particular strain is shown in the leaves of the flower plots, and in the outer circles in the case of M. africanum and M. orygis. Most of the genomes with a status of ‘complete sequence’ had a very low number of unique genes, whereas draft-quality genomes had a higher number of unique genes.
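The filter described above (clusters present in more than one third of the clinical isolates but absent from both laboratory strains) can be sketched as follows. This is only an illustrative example: the function name `clinical_only_hgcs`, the presence-table layout, and the strain labels are hypothetical, and the authors' own procedure is not given in this excerpt.

```python
from fractions import Fraction
from typing import Dict, Iterable, Set


def clinical_only_hgcs(
    presence: Dict[str, Set[str]],
    clinical_isolates: Iterable[str],
    lab_strains: Iterable[str] = ("H37Rv", "H37Ra"),
    min_fraction: Fraction = Fraction(1, 3),
) -> Set[str]:
    """Return HGCs found in more than min_fraction of the clinical
    isolates and in none of the laboratory strains."""
    clinical = set(clinical_isolates)
    lab = set(lab_strains)
    selected: Set[str] = set()
    for hgc, strains in presence.items():
        # Skip any cluster that has a member in a laboratory strain.
        if strains & lab:
            continue
        # Strictly greater than one third; Fraction keeps the
        # comparison exact instead of relying on floating point.
        if len(strains & clinical) > min_fraction * len(clinical):
            selected.add(hgc)
    return selected
```

Using an exact fraction avoids borderline rounding when the number of clinical isolates is a multiple of three.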

