Comparative whole-genome analysis of clinical isolates reveals characteristic architecture of Mycobacterium tuberculosis pangenome.

Periwal V, Patowary A, Vellarikkal SK, Gupta A, Singh M, Mittal A, Jeyapaul S, Chauhan RK, Singh AV, Singh PK, Garg P, Katoch VM, Katoch K, Chauhan DS, Sivasubbu S, Scaria V - PLoS ONE (2015)

Bottom Line: We identified 74 HGCs that were absent from reference strains H37Rv and H37Ra but were present in most clinical isolates. The pangenome approach is a promising tool for studying strain-specific genetic differences occurring within species. We also suggest that, because selecting appropriate target genes for typing requires the target gene to be present in all isolates being typed, estimating the core component of the species is of prime importance.


Affiliation: GN Ramachandran Knowledge Center for Genome Informatics, CSIR Institute of Genomics and Integrative Biology (CSIR-IGIB), Mall Road, Delhi-110007, India; Academy of Scientific & Innovative Research (AcSIR), 2, Rafi Marg, Anusandhan Bhawan, New Delhi 110001, India.

ABSTRACT
The tubercle complex consists of closely related mycobacterial species which appear to be variants of a single species. Comparative genome analysis of different strains could provide useful insights into the genetic diversity of the species. We integrated genome assemblies of 96 strains from the Mycobacterium tuberculosis complex (MTBC), including 8 Indian clinical isolates sequenced and assembled in this study, to understand its pangenome architecture. We predicted genes for all 96 strains and clustered their respective CDSs into homologous gene clusters (HGCs) to reveal the hard-core, soft-core and accessory genome components of MTBC. The hard-core (HGCs shared amongst 100% of the strains) comprised 2,066 gene clusters, whereas the soft-core (HGCs shared amongst at least 95% of the strains) comprised 3,374 gene clusters. The change in the core and accessory genome components, when observed as a function of their size, revealed that MTBC has an open pangenome. We identified 74 HGCs that were absent from reference strains H37Rv and H37Ra but were present in most clinical isolates. We report PCR validation of 9 candidate genes: 7 genes were completely absent from H37Rv and H37Ra, whereas 2 genes shared partial homology with them, attributable to probable insertion and deletion events. The pangenome approach is a promising tool for studying strain-specific genetic differences occurring within species. We also suggest that, because selecting appropriate target genes for typing requires the target gene to be present in all isolates being typed, estimating the core component of the species is of prime importance.
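
The hard-core, soft-core and accessory definitions given in the abstract can be illustrated with a minimal Python sketch. It assumes each HGC is summarised by the number of strains in which it occurs; the function name, cluster names and toy counts are hypothetical and are not taken from the authors' pipeline. Under the abstract's definition the soft-core (at least 95% of strains) includes the hard-core, so the sketch labels each cluster with its most stringent category.

```python
def classify_hgc(n_present: int, n_strains: int, soft_core_fraction: float = 0.95) -> str:
    """Label one homologous gene cluster by how many strains carry it."""
    if n_present == n_strains:
        # Present in 100% of strains.
        return "hard-core"
    if n_present >= soft_core_fraction * n_strains:
        # Present in at least 95% of strains, but not all of them.
        return "soft-core"
    return "accessory"


N_STRAINS = 96  # number of MTBC strains stated in the abstract

# Hypothetical toy data: cluster name -> number of strains carrying it.
toy_presence = {"hgc_a": 96, "hgc_b": 92, "hgc_c": 91, "hgc_d": 3}

labels = {name: classify_hgc(count, N_STRAINS) for name, count in toy_presence.items()}
print(labels)
```

With 96 strains, 95% corresponds to 91.2 strains, so in this sketch a cluster must occur in at least 92 strains to be labelled soft-core.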


pone.0122979.g003: Summary of sequence annotation statistics from BLAST2GO. Representative sequences from all 8,099 HGCs were subjected to annotation, of which 47.77% (3,869) were annotated with GO slim terms, 26.63% (2,157) had no BLAST hits, 23.43% (1,898) had BLAST results but no annotation, and 1.65% (134) retrieved mapping results but no GO slim terms. A small fraction, 0.5% (41), failed to fetch BLAST results.

Mentions: Each HGC has a representative sequence, which is the parent sequence of the cluster. All 8,099 HGCs were annotated by querying their representative sequences with BLAST2GO, as described in the Methods section. The overall annotation distribution (Fig 3) obtained from BLAST2GO showed that, of the 8,099 protein sequences, 47.77% (3,869) were fully annotated with GO slim terms, while 26.63% (2,157) had no BLAST hits (i.e. no detectable homology to any sequence in the NCBI databases). For sequences with BLAST hits, the gene ontology mapping step retrieved the GO terms associated with those hits; 23.43% (1,898) of the sequences failed to retrieve any associated GO terms.
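
As a quick consistency check on the figures quoted for Fig 3, the following Python sketch recomputes each percentage from the reported counts. The counts are those stated in the figure caption; the dictionary keys and variable names are illustrative only.

```python
# Counts per annotation outcome, as reported in the Fig 3 caption.
annotation_counts = {
    "annotated with GO slim terms": 3869,
    "no BLAST hits": 2157,
    "BLAST results but no annotation": 1898,
    "mapping results but no GO slim terms": 134,
    "failed to fetch BLAST results": 41,
}

# The five categories should account for every representative sequence.
total = sum(annotation_counts.values())

# Percentage of all representative sequences in each category.
percentages = {
    outcome: round(100 * count / total, 2)
    for outcome, count in annotation_counts.items()
}

print(f"total sequences: {total}")
for outcome, pct in percentages.items():
    print(f"{outcome}: {pct}%")
```

The five counts sum to 8,099, the number of HGCs stated in the text, and the recomputed percentages agree with the reported values; the last category comes to 0.51% at two decimal places, which the caption gives as 0.5%.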

