Analysis of quality raw data of second generation sequencers with Quality Assessment Software.

Ramos RT, Carneiro AR, Baumbach J, Azevedo V, Schneider MP, Silva A - BMC Res Notes (2011)

Bottom Line: Second generation technologies have advantages over Sanger; however, they have resulted in new challenges for the genome construction process, especially because of the small size of the reads, despite the high degree of coverage. Quality filtering is a fundamental step in the process of constructing genomes, as it reduces the frequency of incorrect alignments that are caused by measuring errors, which can occur during the construction process due to the size of the reads, provoking misassemblies. Application of quality filters to sequence data, using the software Quality Assessment, along with graphing analyses, provided greater precision in the definition of cutoff parameters, which increased the accuracy of genome construction.


Affiliation: Instituto de Ciências Biológicas, Universidade Federal do Pará, Belém-PA, Brazil. asilva@ufpa.br.

ABSTRACT

Background: Second generation technologies have advantages over Sanger; however, they have resulted in new challenges for the genome construction process, especially because of the small size of the reads, despite the high degree of coverage. Independent of the program chosen for the construction process, DNA sequences are superimposed, based on identity, to extend the reads, generating contigs; mismatches indicate a lack of homology and are not included. This process improves our confidence in the sequences that are generated.

Findings: We developed Quality Assessment Software, with which one can review graphs showing the distribution of quality values from the sequencing reads. This software allows us to adopt more stringent quality standards for sequence data, based on quality-graph analysis and estimated coverage after applying the quality filter, providing acceptable sequence coverage for genome construction from short reads.

Conclusions: Quality filtering is a fundamental step in the process of constructing genomes, as it reduces the frequency of incorrect alignments that are caused by measuring errors, which can occur during the construction process due to the size of the reads, provoking misassemblies. Application of quality filters to sequence data, using the software Quality Assessment, along with graphing analyses, provided greater precision in the definition of cutoff parameters, which increased the accuracy of genome construction.




Figure 3: Frequencies of quality values for the last base of the reads. Distribution of the Phred quality of the last base of raw data reads of Cp162, B7 (F3) and B7 (R3) as a plot of the observed Phred quality value (X-axis) against frequency of occurrence. A: Cp162; B: B7 (F3); C: B7 (R3).
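
A distribution like the one in Figure 3 can be tallied directly from per-read quality values. The sketch below is only an illustration and is not the Quality Assessment Software itself. It assumes each read's qualities have already been parsed into a list of integer Phred scores; the function names `last_base_quality_counts` and `mean_quality_per_position` and the toy values are hypothetical.

```python
from collections import Counter


def last_base_quality_counts(reads_quals):
    """Count how often each Phred value occurs at the last base of the reads."""
    return Counter(quals[-1] for quals in reads_quals if quals)


def mean_quality_per_position(reads_quals):
    """Mean Phred value at each position; assumes all reads share one length."""
    n_reads = len(reads_quals)
    return [sum(column) / n_reads for column in zip(*reads_quals)]


# Toy input (invented values): three reads of four bases each.
toy_reads = [
    [30, 28, 22, 5],
    [31, 25, 20, 5],
    [29, 27, 18, 12],
]

counts = last_base_quality_counts(toy_reads)
means = mean_quality_per_position(toy_reads)

# The most frequent last-base value and how often it occurs.
most_common_value, occurrences = counts.most_common(1)[0]
print(most_common_value, occurrences)
print(means)
```

Plotting `counts` as a bar chart (Phred value on the X-axis, frequency on the Y-axis) would give a graph of the same kind as the figure, and `means` corresponds to a per-base mean quality profile.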

Mentions: The mean quality of each of the 35 sequence bases from the Cp162 data can be observed in Figure 2a; 17 of these gave a mean quality equal to or greater than Phred 20, while the terminal bases of the reads had a mean quality of less than 20 [16]. Figure 3a shows the frequency of the quality values of the 35th base of Cp162, with Phred 5 being the most common value, which contributes to the reduction in mean quality observed for this base in Figure 2a. When a cutoff filter of Phred 20 was applied, the number of reads was reduced by about 43% (Table 1), resulting in a sequence coverage of 172×. Based on the data in Table 2, application of a Phred 23 filter would give sequence coverage above 100× and a high degree of accuracy of the reads, which would reduce the possibility of misassemblies [17].
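
The relationship between a Phred cutoff, the number of reads retained, and the resulting coverage can be sketched as follows. This is a minimal illustration under stated assumptions: a read is kept when its mean Phred value meets the cutoff (the text does not say which criterion the software applies), coverage is estimated as retained bases divided by genome size, and all function names and toy numbers are hypothetical. Only the Phred definition, Q = -10 log10(P), is standard.

```python
def phred_to_error(q):
    """Convert a Phred score Q to a per-base error probability, P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def filter_reads(reads_quals, cutoff):
    """Keep reads whose mean Phred value is >= cutoff (assumed criterion)."""
    return [q for q in reads_quals if q and sum(q) / len(q) >= cutoff]


def estimated_coverage(n_reads, read_length, genome_size):
    """Estimate coverage as total sequenced bases divided by genome size."""
    return n_reads * read_length / genome_size


# Toy input (invented values): three reads of three bases each.
toy_reads = [
    [30, 30, 5],   # mean about 21.7, kept at cutoff 20
    [20, 20, 20],  # mean 20.0, kept at cutoff 20
    [10, 10, 10],  # mean 10.0, discarded
]

kept = filter_reads(toy_reads, cutoff=20)
fraction_removed = 1 - len(kept) / len(toy_reads)

# Hypothetical numbers: 1,000,000 reads of 35 bases on a 2.5 Mb genome.
example_coverage = estimated_coverage(1_000_000, 35, 2_500_000)

print(len(kept), fraction_removed)
print(phred_to_error(20), phred_to_error(23))
print(example_coverage)
```

Under this definition, Phred 20 corresponds to an expected error of 1 in 100 bases and Phred 23 to roughly 1 in 200, which is why a stricter cutoff trades coverage for read accuracy.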

