Analysis of quality raw data of second generation sequencers with Quality Assessment Software.

Ramos RT, Carneiro AR, Baumbach J, Azevedo V, Schneider MP, Silva A - BMC Res Notes (2011)

Bottom Line: Second generation technologies have advantages over Sanger; however, they have resulted in new challenges for the genome construction process, especially because of the small size of the reads, despite the high degree of coverage. Quality filtering is a fundamental step in the process of constructing genomes, as it reduces the frequency of incorrect alignments that are caused by measuring errors, which can occur during the construction process due to the size of the reads, provoking misassemblies. Application of quality filters to sequence data, using the software Quality Assessment, along with graphing analyses, provided greater precision in the definition of cutoff parameters, which increased the accuracy of genome construction.


Affiliation: Instituto de Ciências Biológicas, Universidade Federal do Pará, Belém-PA, Brazil. asilva@ufpa.br.

ABSTRACT

Background: Second generation technologies have advantages over Sanger; however, they have resulted in new challenges for the genome construction process, especially because of the small size of the reads, despite the high degree of coverage. Independent of the program chosen for the construction process, DNA sequences are superimposed, based on identity, to extend the reads, generating contigs; mismatches indicate a lack of homology and are not included. This process improves our confidence in the sequences that are generated.

Findings: We developed Quality Assessment Software, with which one can review graphs showing the distribution of quality values from the sequencing reads. This software allows us to adopt more stringent quality standards for sequence data, based on quality-graph analysis and estimated coverage after applying the quality filter, providing acceptable sequence coverage for genome construction from short reads.

Conclusions: Quality filtering is a fundamental step in the process of constructing genomes, as it reduces the frequency of incorrect alignments that are caused by measuring errors, which can occur during the construction process due to the size of the reads, provoking misassemblies. Application of quality filters to sequence data, using the software Quality Assessment, along with graphing analyses, provided greater precision in the definition of cutoff parameters, which increased the accuracy of genome construction.

Figure 4: FASTA sequences with Phred quality values. Example of two FASTA-format sequences containing Phred quality values of 25 bp-long reads. A - FASTA sequence with median and mean Phred qualities of 23 and 19, respectively. B - FASTA sequence with median and mean Phred qualities of 19 and 22, respectively.

Mentions: In Figure 4a, we can see a read with a median value of 23 and a mean of 19; consequently, if we apply a quality filter with Phred 20 based on the median, the read would be accepted even though it has six bases with quality below 10. In Figure 4b, if the same filter were applied, the read would be discarded, even though it has a mean quality value of 22.
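
The contrast between median-based and mean-based filtering can be illustrated with a minimal Python sketch. The per-base values below are hypothetical: they are not taken from Figure 4, only constructed so that each 25-base read reproduces the median, mean, and low-quality base count quoted above. The function name `passes_filter` and its parameters are likewise assumptions for illustration, not part of Quality Assessment Software.

```python
from statistics import mean, median

def passes_filter(quals, cutoff=20, stat=median):
    """Return True if the read's summary Phred quality reaches the cutoff."""
    return stat(quals) >= cutoff

# Hypothetical 25-base reads (NOT the actual values shown in Figure 4).
# Read A: median 23, mean 19, six bases with quality below 10.
read_a = [5] * 6 + [20] * 6 + [23] + [25] * 10 + [26] * 2
# Read B: median 19, mean 22.
read_b = [18] * 12 + [19] + [26] * 9 + [27] * 3

for name, quals in (("A", read_a), ("B", read_b)):
    low = sum(q < 10 for q in quals)  # bases with Phred quality below 10
    print(
        f"read {name}: median={median(quals)} mean={mean(quals)} "
        f"bases<10={low} "
        f"median filter={'keep' if passes_filter(quals) else 'discard'} "
        f"mean filter={'keep' if passes_filter(quals, stat=mean) else 'discard'}"
    )
```

With a cutoff of Phred 20, the two criteria give opposite decisions for both reads, which is why inspecting the quality distribution before fixing a cutoff parameter is useful.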

