High depth, whole-genome sequencing of cholera isolates from Haiti and the Dominican Republic.

Sealfon R, Gire S, Ellis C, Calderwood S, Qadri F, Hensley L, Kellis M, Ryan ET, LaRocque RC, Harris JB, Sabeti PC - BMC Genomics (2012)

Bottom Line: Using these sequence data, we examined the effect of depth of coverage and sequencing platform on genome assembly and identification of sequence variants. We found that 50x coverage is sufficient to construct a whole-genome assembly and to accurately call most variants from 100 base pair paired-end sequencing reads. Sequence variant analyses of V. cholerae isolates, including multiple isolates from the Haitian outbreak, identify coverage-specific and technology-specific effects on variant detection, and provide insight into genomic change and functional evolution during an epidemic.

Affiliation: Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA. rsealfon@mit.edu

ABSTRACT

Background: Whole-genome sequencing is an important tool for understanding microbial evolution and identifying the emergence of functionally important variants over the course of epidemics. In October 2010, a severe cholera epidemic began in Haiti, with additional cases identified in the neighboring Dominican Republic. We sequenced the whole genomes of four Vibrio cholerae isolates from Haiti and the Dominican Republic and three additional V. cholerae isolates to a high depth of coverage (>2000x); four of the seven isolates had been sequenced previously.
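Depth of coverage is the average number of reads overlapping each position of the genome. A common way to ask how results change at lower depth is to randomly subsample reads from a high-depth run. The sketch below shows the arithmetic and a simple subsampler; it is an illustration only, not the authors' pipeline. The function names are hypothetical, read pairs are assumed to be kept or dropped together, and the 4,000,000 bp genome size in the usage note is an approximate, assumed figure for V. cholerae.

```python
import math
import random
from typing import Iterable, List, Tuple

# Hypothetical helpers for illustration; not taken from the paper.


def mean_depth(n_bases_sequenced: int, genome_size: int) -> float:
    """Expected mean depth of coverage: total sequenced bases divided by genome length."""
    return n_bases_sequenced / genome_size


def pairs_for_depth(target_depth: float, read_length: int, genome_size: int) -> int:
    """Read pairs (two reads of read_length each) needed to reach target_depth on average."""
    bases_needed = target_depth * genome_size
    return math.ceil(bases_needed / (2 * read_length))


def subsample_fraction(current_depth: float, target_depth: float) -> float:
    """Fraction of read pairs to keep to go from current_depth down to target_depth."""
    if not 0 < target_depth <= current_depth:
        raise ValueError("target depth must be positive and no greater than current depth")
    return target_depth / current_depth


def subsample_pairs(
    pairs: Iterable[Tuple[str, str]], fraction: float, seed: int = 0
) -> List[Tuple[str, str]]:
    """Keep each read pair independently with probability `fraction`; mates stay together."""
    rng = random.Random(seed)  # seeded so the subsample is reproducible
    return [pair for pair in pairs if rng.random() < fraction]
```

For example, with 100 bp paired-end reads and the assumed 4,000,000 bp genome, `pairs_for_depth(50, 100, 4_000_000)` gives 1,000,000 pairs, and going from 2000x to 50x means keeping `subsample_fraction(2000, 50)`, i.e. 0.025, of the pairs. Random subsampling reaches the target depth only on average; depth at individual positions still varies.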

Results: Using these sequence data, we examined the effect of depth of coverage and sequencing platform on genome assembly and identification of sequence variants. We found that 50x coverage is sufficient to construct a whole-genome assembly and to accurately call most variants from 100 base pair paired-end sequencing reads. Phylogenetic analysis of the newly sequenced isolates together with thirty-three previously sequenced V. cholerae isolates indicates that the Haitian and Dominican Republic isolates are closest to strains from South Asia. The Haitian and Dominican Republic isolates form a tight cluster, with only four variants unique to individual isolates. These variants are located in the CTX region, the SXT region, and the core genome. Of the 126 mutations identified that separate the Haiti-Dominican Republic cluster from the V. cholerae reference strain (N16961), 73 are non-synonymous changes, and a number of these changes cluster in specific genes and pathways.
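Whether a single-base change is synonymous or non-synonymous comes down to translating the affected codon before and after the change. The sketch below does this with the standard genetic code (NCBI translation table 1) for a substitution in a coding sequence written 5' to 3' on its coding strand; genes on the reverse strand would need to be reverse-complemented first. The function name and the toy sequence are hypothetical, and changes to or from a stop codon are counted as non-synonymous here, which is one convention among several.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order,
# with the third base varying fastest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snv(cds: str, pos: int, alt: str) -> str:
    """Classify a single-base substitution at 0-based position `pos` of a coding sequence.

    Assumes `cds` starts at the first base of the start codon, on the coding strand.
    Returns "synonymous" or "non-synonymous" (stop gains/losses count as non-synonymous).
    """
    cds = cds.upper()
    alt = alt.upper()
    if not 0 <= pos < len(cds) - len(cds) % 3:
        raise ValueError("position is outside a complete codon")
    codon_start = pos - pos % 3  # first base of the codon containing pos
    offset = pos - codon_start
    ref_codon = cds[codon_start:codon_start + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    return "synonymous" if ref_aa == alt_aa else "non-synonymous"
```

With the toy sequence `"ATGGCTAAA"` (Met-Ala-Lys), `classify_snv(cds, 5, "C")` turns GCT into GCC (Ala to Ala) and returns `"synonymous"`, while `classify_snv(cds, 3, "C")` turns GCT into CCT (Ala to Pro) and returns `"non-synonymous"`.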

Conclusions: Sequence variant analyses of V. cholerae isolates, including multiple isolates from the Haitian outbreak, identify coverage-specific and technology-specific effects on variant detection, and provide insight into genomic change and functional evolution during an epidemic.

Figure 2: Comparison of SNPs, insertions, and deletions called across sequencing technologies. (A) List of published sequences for the four previously sequenced isolates (N16961, O395, H1, and H2) examined in this study. (B) Comparison of the new Illumina sequences to the GenBank references. The table gives the number of differences identified in each new sequence relative to its GenBank reference, with the number of differences confirmed by alignment to additional strains in parentheses. (C) Comparison of Illumina-based and PacBio-based SNP, insertion, and deletion calls relative to the Sanger-sequenced N16961 reference [GenBank:AE003852, GenBank:AE003853]. Each diagram shows the number of variants called only in PacBio sequencing (red circle), only in Illumina sequencing (blue circle), or in both (intersection). For the N16961 sequences, the number of differences confirmed by alignment to additional strains is shown in parentheses. For H1 and H2, only variants that do not correspond to likely errors in the N16961 reference sequence are counted.
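The three regions of each diagram in panel C correspond to two set differences and one intersection of the PacBio and Illumina call sets. A minimal sketch, assuming each call is reduced to a (sequence ID, position, REF, ALT) key and that both call sets were normalized the same way beforehand: different callers can place the same indel in repetitive sequence at different positions, so an exact-match comparison without left-normalization may undercount shared indels. The `Variant` type, the function name, and the positions below are hypothetical; AE003852 and AE003853 are the GenBank accessions of the N16961 reference cited in the caption.

```python
from typing import Dict, Iterable, NamedTuple


class Variant(NamedTuple):
    """A variant call keyed by sequence ID, 1-based position, and REF/ALT alleles."""
    chrom: str
    pos: int
    ref: str
    alt: str


def venn_counts(pacbio: Iterable[Variant], illumina: Iterable[Variant]) -> Dict[str, int]:
    """Count calls unique to each platform and shared by both (exact-match comparison)."""
    p, i = set(pacbio), set(illumina)
    return {
        "pacbio_only": len(p - i),
        "illumina_only": len(i - p),
        "both": len(p & i),
    }


# Hypothetical calls against the two N16961 chromosomes (positions are made up).
pacbio_calls = [
    Variant("AE003852", 1200, "A", "G"),
    Variant("AE003852", 5300, "C", "CA"),
    Variant("AE003853", 870, "GT", "G"),
]
illumina_calls = [
    Variant("AE003852", 1200, "A", "G"),
    Variant("AE003853", 870, "GT", "G"),
    Variant("AE003853", 9100, "T", "C"),
]
print(venn_counts(pacbio_calls, illumina_calls))
# {'pacbio_only': 1, 'illumina_only': 1, 'both': 2}
```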

Mentions: The original reference genome for V. cholerae was the Sanger-sequenced N16961 genome [12]. Feng et al. subsequently identified a number of corrections to this reference by comparing ambiguous positions against additional strains and against open reading frame clone sequence data [13]. Their corrections included 58 single base pair differences and 63 insertions and deletions. Similarly, we identified 59 single base pair differences and 95 insertions and deletions between N16961* and the N16961 reference [12] (Figure 2B).
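Counts such as "59 single base pair differences" and "95 insertions and deletions" come from classifying each difference by type, and the parenthesized numbers in Figure 2B additionally record which differences are supported by other strains. A minimal sketch, assuming differences are stored with VCF-style alleles (an indel carries one shared anchor base) and that the set of confirmed differences was already determined elsewhere, for example by alignment to additional strains. The type names, functions, and positions are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple, Set, Tuple


class Difference(NamedTuple):
    """A difference from the reference with VCF-style alleles (indels include an anchor base)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def difference_type(ref: str, alt: str) -> str:
    """Classify a difference by comparing the lengths of its REF and ALT alleles."""
    if len(ref) == 1 and len(alt) == 1:
        return "single base"
    if len(ref) < len(alt):
        return "insertion"
    if len(ref) > len(alt):
        return "deletion"
    return "multi-base substitution"


def tally(
    differences: Iterable[Difference], confirmed: Set[Difference]
) -> Dict[str, Tuple[int, int]]:
    """Return {type: (total, confirmed)}, mirroring a 'count (confirmed count)' table cell."""
    total: Counter = Counter()
    supported: Counter = Counter()
    for d in differences:
        kind = difference_type(d.ref, d.alt)
        total[kind] += 1
        if d in confirmed:
            supported[kind] += 1
    return {kind: (total[kind], supported[kind]) for kind in total}


# Hypothetical differences; positions are made up.
diffs = [
    Difference("AE003852", 100, "A", "G"),
    Difference("AE003852", 200, "A", "AT"),
    Difference("AE003853", 50, "CT", "C"),
]
print(tally(diffs, confirmed={diffs[0], diffs[2]}))
# {'single base': (1, 1), 'insertion': (1, 0), 'deletion': (1, 1)}
```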

