A platform-independent method for detecting errors in metagenomic sequencing data: DRISEE.

Keegan KP, Trimble WL, Wilkening J, Wilke A, Harrison T, D'Souza M, Meyer F - PLoS Comput. Biol. (2012)

Bottom Line: DRISEE provides positional error estimates that can be used to inform read trimming within a sample. It also provides global (whole sample) error estimates that can be used to identify samples with high or varying levels of sequencing error that may confound downstream analyses, particularly in the case of studies that utilize data from multiple sequencing samples. Here, DRISEE is applied to (non amplicon) data sets from both the 454 and Illumina platforms.

Affiliation: Argonne National Laboratory, Argonne, Illinois, United States of America. kkeegan@anl.gov

ABSTRACT
We provide a novel method, DRISEE (duplicate read inferred sequencing error estimation), to assess sequencing quality (alternatively referred to as "noise" or "error") within and/or between sequencing samples. DRISEE provides positional error estimates that can be used to inform read trimming within a sample. It also provides global (whole sample) error estimates that can be used to identify samples with high or varying levels of sequencing error that may confound downstream analyses, particularly in the case of studies that utilize data from multiple sequencing samples. For shotgun metagenomic data, we believe that DRISEE provides estimates of sequencing error that are more accurate and less constrained by technical limitations than existing methods that rely on reference genomes or the use of scores (e.g. Phred). Here, DRISEE is applied to (non amplicon) data sets from both the 454 and Illumina platforms. The DRISEE error estimate is obtained by analyzing sets of artifactual duplicate reads (ADRs), a known by-product of both sequencing platforms. We present DRISEE as an open-source, platform-independent method to assess sequencing error in shotgun metagenomic data, and utilize it to discover previously uncharacterized error in de novo sequence data from the 454 and Illumina sequencing platforms.
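
The abstract states only that the estimate is obtained from sets of artifactual duplicate reads; it does not spell out the procedure. As a rough illustration of that idea, the Python sketch below bins reads that share an identical prefix, takes the majority base at each later position as the consensus, and reports the percentage of bases that disagree with it. The function names, `prefix_len`, `min_bin_size`, the trimming threshold, and the toy reads are all assumptions made for this example; this is not the published DRISEE implementation or its parameters.

```python
from collections import Counter, defaultdict


def positional_error(reads, prefix_len=4, min_bin_size=3):
    """Return {position: percent of bases disagreeing with the bin consensus}.

    Reads with an identical prefix are treated as candidate duplicates
    (an assumption of this sketch). Positions inside the prefix are
    skipped because they are identical by construction.
    """
    bins = defaultdict(list)
    for read in reads:
        if len(read) > prefix_len:
            bins[read[:prefix_len]].append(read)

    mismatches = Counter()  # position -> bases that differ from consensus
    totals = Counter()      # position -> bases examined
    for members in bins.values():
        if len(members) < min_bin_size:
            continue  # too few reads to form a meaningful consensus
        shortest = min(len(r) for r in members)
        for pos in range(prefix_len, shortest):
            column = [r[pos] for r in members]
            _, consensus_count = Counter(column).most_common(1)[0]
            mismatches[pos] += len(column) - consensus_count
            totals[pos] += len(column)

    return {pos: 100.0 * mismatches[pos] / totals[pos] for pos in sorted(totals)}


def suggest_trim_position(per_position, max_error_percent=10.0):
    """Return the first position whose error exceeds the threshold, or None."""
    for pos in sorted(per_position):
        if per_position[pos] > max_error_percent:
            return pos
    return None


# Toy reads (made up): four share the prefix "ACGT", one does not.
toy_reads = ["ACGTAAAA", "ACGTAAAA", "ACGTAAAT", "ACGTAAAA", "TTTTCCCC"]
toy_profile = positional_error(toy_reads)
toy_trim = suggest_trim_position(toy_profile)
```

In the toy data only the last position shows disagreement (one of four reads), so `toy_profile` is zero everywhere except index 7, and a 10% threshold would suggest trimming there. A positional profile of this kind is what would inform read trimming within a sample, while a single summary over all positions would play the role of the global estimate.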

Figure 3 (pcbi-1002541-g003): Total DRISEE errors of genomic and metagenomic data produced by 454 and Illumina technologies. A boxplot (conventional five-number summary) presents the distribution of averaged total DRISEE errors observed among 476 sequencing samples. The average total DRISEE error is plotted on the Y-axis. X-axis labels indicate the technology (454 or Illumina), the type of sample (shotgun genomic or shotgun metagenomic), and, in parentheses, the number of samples represented by each individual boxplot. Gray highlight indicates the range of values that have been previously reported for error on 454 and Illumina sequencing platforms (0.25–4%).

Mentions: In further trials, DRISEE was applied to genomic and metagenomic shotgun data produced by two widely used sequencing technologies, 454 and Illumina (n = 242 genomic 454, n = 65 metagenomic 454, n = 10 genomic Illumina, and n = 159 metagenomic Illumina samples), 476 samples in all. Fewer than half of the individual samples (n = 169) exhibit DRISEE-based errors consistent with the reported range of second-generation sequencing errors (0.25–4%) [4], [11], [12], [13], [19], [31]. The majority of samples (n = 307) exhibit DRISEE-based errors that fall outside the range of reported sequencing errors (error < 0.25%, n = 73; error > 4%, n = 234; avg ± stdev = 12.63 ± 15.12) (Figure 3). The Supplemental Methods (Text S1) describe how data sets were selected.
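
The tallies above can be checked with simple arithmetic, and the same three-way split can be applied to any list of per-sample errors. The Python sketch below is illustrative only: `classify_errors` and the example error values are hypothetical, while the counts in the dictionaries are those reported in the text.

```python
# Counts as reported in the text.
SAMPLES_BY_GROUP = {
    "454 genomic": 242,
    "454 metagenomic": 65,
    "Illumina genomic": 10,
    "Illumina metagenomic": 159,
}
REPORTED = {"below": 73, "within": 169, "above": 234}


def classify_errors(errors_percent, low=0.25, high=4.0):
    """Tally errors (in percent) below, within, or above [low, high].

    The default bounds are the previously reported 0.25-4% range; values
    equal to a bound are counted as within it (an assumption of this sketch).
    """
    counts = {"below": 0, "within": 0, "above": 0}
    for err in errors_percent:
        if err < low:
            counts["below"] += 1
        elif err > high:
            counts["above"] += 1
        else:
            counts["within"] += 1
    return counts


total_samples = sum(SAMPLES_BY_GROUP.values())
outside_range = REPORTED["below"] + REPORTED["above"]

# Hypothetical per-sample errors, for illustration only.
example_counts = classify_errors([0.1, 0.25, 2.0, 4.0, 12.6])
```

Here `total_samples` comes to 476 and `outside_range` to 307, matching the totals quoted in the text, and the 169 samples within the range plus the 307 outside it account for all 476.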

