High depth, whole-genome sequencing of cholera isolates from Haiti and the Dominican Republic.

Sealfon R, Gire S, Ellis C, Calderwood S, Qadri F, Hensley L, Kellis M, Ryan ET, LaRocque RC, Harris JB, Sabeti PC - BMC Genomics (2012)

Bottom Line: Using these sequence data, we examined the effect of depth of coverage and sequencing platform on genome assembly and identification of sequence variants. We found that 50x coverage is sufficient to construct a whole-genome assembly and to accurately call most variants from 100 base pair paired-end sequencing reads. Sequence variant analyses of V. cholerae isolates, including multiple isolates from the Haitian outbreak, identify coverage-specific and technology-specific effects on variant detection, and provide insight into genomic change and functional evolution during an epidemic.


Affiliation: Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA. rsealfon@mit.edu

ABSTRACT

Background: Whole-genome sequencing is an important tool for understanding microbial evolution and identifying the emergence of functionally important variants over the course of epidemics. In October 2010, a severe cholera epidemic began in Haiti, with additional cases identified in the neighboring Dominican Republic. We used whole-genome approaches to sequence four Vibrio cholerae isolates from Haiti and the Dominican Republic and three additional V. cholerae isolates to a high depth of coverage (>2000x); four of the seven isolates were previously sequenced.

Results: Using these sequence data, we examined the effect of depth of coverage and sequencing platform on genome assembly and identification of sequence variants. We found that 50x coverage is sufficient to construct a whole-genome assembly and to accurately call most variants from 100 base pair paired-end sequencing reads. Phylogenetic analysis between the newly sequenced and thirty-three previously sequenced V. cholerae isolates indicates that the Haitian and Dominican Republic isolates are closest to strains from South Asia. The Haitian and Dominican Republic isolates form a tight cluster, with only four variants unique to individual isolates. These variants are located in the CTX region, the SXT region, and the core genome. Of the 126 mutations identified that separate the Haiti-Dominican Republic cluster from the V. cholerae reference strain (N16961), 73 are non-synonymous changes, and a number of these changes cluster in specific genes and pathways.

Conclusions: Sequence variant analyses of V. cholerae isolates, including multiple isolates from the Haitian outbreak, identify coverage-specific and technology-specific effects on variant detection, and provide insight into genomic change and functional evolution during an epidemic.


Figure 4: Variation in depth of coverage of the sequenced isolates, based on read alignments of the seven sequenced strains against the N16961 reference genome. Chromosome 1 (A) and chromosome 2 (B) are shown. The depth of coverage of 1000 base pair windows of 150x average coverage subsamples of the DR1 (outermost circle), H1*, H2*, H3, N16961*, O395*, and DB_2002 (innermost circle) isolates is displayed. Regions at low depth of coverage (<12x) are shown in red, while regions at high depth of coverage (>240x) are shown in blue. The depth of coverage in each window is displayed using the Circos tool [34]. Genomic islands as defined in [15] and the superintegron region as defined in [8] are shown.
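The windowing and colour thresholds described in the caption can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes a hypothetical input, a list of per-base depths for one chromosome, and it assumes that the depth of a window is the mean of its per-base depths, which the caption does not state. The function names are illustrative only.

```python
from typing import List, Tuple

WINDOW = 1000     # window size in base pairs, as in the caption
LOW_DEPTH = 12    # windows below this depth are drawn in red
HIGH_DEPTH = 240  # windows above this depth are drawn in blue


def window_means(depths: List[int], window: int = WINDOW) -> List[float]:
    """Mean depth of each consecutive, non-overlapping window.

    Assumption: window depth is the mean per-base depth. The last window
    may be shorter than `window` if the length is not a multiple of it.
    """
    means = []
    for start in range(0, len(depths), window):
        chunk = depths[start:start + window]
        means.append(sum(chunk) / len(chunk))
    return means


def classify(mean_depth: float) -> str:
    """Label a window using the caption's thresholds (<12x, >240x)."""
    if mean_depth < LOW_DEPTH:
        return "low"
    if mean_depth > HIGH_DEPTH:
        return "high"
    return "normal"


def classify_windows(
    depths: List[int], window: int = WINDOW
) -> List[Tuple[int, float, str]]:
    """Return (window start, mean depth, label) for every window."""
    return [
        (i * window, mean, classify(mean))
        for i, mean in enumerate(window_means(depths, window))
    ]


if __name__ == "__main__":
    # Toy data: one window near 150x, one near-deleted, one amplified.
    toy = [150] * 1000 + [5] * 1000 + [300] * 500
    for start, mean, label in classify_windows(toy):
        print(start, mean, label)
```

In this sketch a window at roughly 150x, the subsample average, is labelled "normal", while a deleted region falls under the 12x threshold and a high-copy region exceeds 240x.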

Mentions: To further characterize the Haitian and Dominican Republic isolates, we identified deletions and copy number variation relative to reference sequences (Figure 4). In all Haitian and Dominican Republic isolates, deletions were observed in the VSP-2 and superintegron regions. There are also deletions in the SXT region of the Haitian and Dominican Republic isolates relative to the MJ-1236 reference strain from Bangladesh (Additional file 6: Figure S6). To identify novel insertions, we aligned a 150x-coverage sample of N16961* reads to the de novo assembly of each Dominican Republic and Haitian isolate. All 1000-base pair windows in the de novo assemblies of the Haitian and Dominican Republic isolates to which N16961* reads did not map matched SXT integrating conjugative element sequences in GenBank, suggesting that no additional large insertions are present in the genomes of these isolates.
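The insertion screen described above, which looks for 1000-base pair windows of a de novo assembly to which no N16961* reads map, can be sketched as follows. This is an illustration under stated assumptions, not the authors' method. Alignments are given as hypothetical 0-based, half-open (start, end) intervals on a single contig, and a window counts as unmapped only if no alignment overlaps it at all. The function name and input format are assumptions.

```python
from typing import Iterable, List, Tuple


def unmapped_windows(
    contig_length: int,
    alignments: Iterable[Tuple[int, int]],
    window: int = 1000,
) -> List[Tuple[int, int]]:
    """Return (start, end) of windows that no alignment overlaps.

    Assumptions: coordinates are 0-based and half-open, and a single
    overlapping base is enough to mark a window as mapped.
    """
    n_windows = (contig_length + window - 1) // window
    covered = [False] * n_windows
    for start, end in alignments:
        if end <= start:
            continue  # skip empty or malformed intervals
        first = max(start, 0) // window
        last = (min(end, contig_length) - 1) // window
        for w in range(first, last + 1):
            covered[w] = True
    return [
        (w * window, min((w + 1) * window, contig_length))
        for w in range(n_windows)
        if not covered[w]
    ]


if __name__ == "__main__":
    # Toy contig of 3500 bp with reads mapping everywhere except 2000-3000.
    reads = [(0, 100), (950, 1050), (3000, 3100)]
    print(unmapped_windows(3500, reads))
```

Windows returned by such a screen would then be compared against known sequences, as the text describes for the SXT integrating conjugative element matches in GenBank.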

