Analysis of quality raw data of second generation sequencers with Quality Assessment Software.

Ramos RT, Carneiro AR, Baumbach J, Azevedo V, Schneider MP, Silva A - BMC Res Notes (2011)

Bottom Line: Second generation technologies have advantages over Sanger; however, they have resulted in new challenges for the genome construction process, especially because of the small size of the reads, despite the high degree of coverage. Quality filtering is a fundamental step in the process of constructing genomes, as it reduces the frequency of incorrect alignments that are caused by measurement errors, which can occur during the construction process due to the size of the reads, causing misassemblies. Application of quality filters to sequence data, using the Quality Assessment software, along with graphing analyses, provided greater precision in the definition of cutoff parameters, which increased the accuracy of genome construction.


Affiliation: Instituto de Ciências Biológicas, Universidade Federal do Pará, Belém-PA, Brazil. asilva@ufpa.br.

ABSTRACT

Background: Second generation technologies have advantages over Sanger; however, they have resulted in new challenges for the genome construction process, especially because of the small size of the reads, despite the high degree of coverage. Independent of the program chosen for the construction process, DNA sequences are superimposed, based on identity, to extend the reads, generating contigs; mismatches indicate a lack of homology and are not included. This process improves our confidence in the sequences that are generated.

Findings: We developed Quality Assessment Software, with which one can review graphs showing the distribution of quality values from the sequencing reads. This software allows us to adopt more stringent quality standards for sequence data, based on quality-graph analysis and on the coverage estimated after applying the quality filter, while still providing acceptable sequence coverage for genome construction from short reads.
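The trade-off described above, between a stricter quality cutoff and the coverage that remains, can be illustrated with a minimal Java sketch. It is not the Quality Assessment software's code: the class and method names are hypothetical, and filtering on the mean Phred value of each read is an assumption made for illustration, since the abstract does not state the exact criterion. Coverage is estimated here as the number of bases kept divided by the genome size, and a Phred value Q is converted to an error probability with the standard relation P = 10^(-Q/10).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; names and the mean-quality criterion are assumptions,
// not taken from the Quality Assessment software.
class QualityFilterSketch {

    // Mean Phred quality of one read (0.0 for an empty read).
    static double meanQuality(int[] phred) {
        if (phred.length == 0) {
            return 0.0;
        }
        long sum = 0;
        for (int q : phred) {
            sum += q;
        }
        return (double) sum / phred.length;
    }

    // Keep only the reads whose mean quality is at least the cutoff.
    static List<int[]> filter(List<int[]> reads, double cutoff) {
        List<int[]> kept = new ArrayList<>();
        for (int[] read : reads) {
            if (meanQuality(read) >= cutoff) {
                kept.add(read);
            }
        }
        return kept;
    }

    // Estimated coverage: total bases in the reads divided by genome size.
    static double estimatedCoverage(List<int[]> reads, long genomeSize) {
        long bases = 0;
        for (int[] read : reads) {
            bases += read.length;
        }
        return (double) bases / genomeSize;
    }

    // Standard Phred relation: P(error) = 10^(-Q/10).
    static double errorProbability(int q) {
        return Math.pow(10.0, -q / 10.0);
    }

    public static void main(String[] args) {
        List<int[]> reads = List.of(
                new int[] {30, 30, 30, 30},
                new int[] {5, 10, 5, 10},
                new int[] {20, 25, 20, 15});
        List<int[]> kept = filter(reads, 20.0);
        System.out.println("reads kept: " + kept.size());
        System.out.println("coverage on a 4-base toy genome: "
                + estimatedCoverage(kept, 4L));
    }
}
```

Raising the cutoff and re-reading the estimated coverage is one way to check that a stricter filter still leaves enough data for assembly.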

Conclusions: Quality filtering is a fundamental step in the process of constructing genomes, as it reduces the frequency of incorrect alignments that are caused by measurement errors, which can occur during the construction process due to the size of the reads, causing misassemblies. Application of quality filters to sequence data, using the Quality Assessment software, along with graphing analyses, provided greater precision in the definition of cutoff parameters, which increased the accuracy of genome construction.



Figure 1: Main screen of the Quality Assessment software. The quality input file and the expected read size must be defined before data processing can start. After that, the quality graphs can be generated using the corresponding buttons.

Mentions: The software was developed in the Java programming language (http://java.sun.com/), following the object-oriented paradigm and using the Swing graphical library (http://java.sun.com/docs/books/tutorial/uiswing). The input consists of raw files from the sequencing machine in multi-FASTA format: (i) files containing the Phred quality values for the reads [14] and (ii) sequences in color space [15] or nucleotide format; the latter are requested only when the quality filter is applied to the data. The software (Figure 1) offers an option for specifying the expected read size, and when processing finishes it generates a log listing the formatting problems found in the multi-FASTA file: invalid characters, blank lines, and reads that are not of the expected size; these are eliminated during processing. Optionally, the software can be run without the graphical interface; in this mode, however, it is not possible to estimate genome coverage, and the Phred quality values must be defined beforehand.
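The three formatting checks mentioned above (invalid characters, blank lines, and reads that are not of the expected size) could be sketched as follows. This is an illustration under stated assumptions, not the software's actual implementation: the class name, the log messages, and the simplified file layout (a header line starting with ">" followed by a single line of whitespace-separated integer quality values) are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the format checks; the layout assumed here is one
// ">" header line followed by one line of space-separated quality values.
class QualFileCheckSketch {

    final List<String> log = new ArrayList<>();      // formatting problems found
    final List<int[]> validReads = new ArrayList<>(); // reads that passed all checks

    // Parse one quality line; returns null if any token is not an integer.
    static int[] parse(String line) {
        String[] tokens = line.split("\\s+");
        int[] values = new int[tokens.length];
        try {
            for (int i = 0; i < tokens.length; i++) {
                values[i] = Integer.parseInt(tokens[i]);
            }
        } catch (NumberFormatException e) {
            return null;
        }
        return values;
    }

    // Check every line, log each problem, and keep only well-formed reads.
    void process(List<String> lines, int expectedSize) {
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i).trim();
            int lineNo = i + 1;
            if (line.isEmpty()) {
                log.add("line " + lineNo + ": blank line");
                continue;
            }
            if (line.startsWith(">")) {
                continue; // header line, nothing to validate in this sketch
            }
            int[] values = parse(line);
            if (values == null) {
                log.add("line " + lineNo + ": invalid characters");
            } else if (values.length != expectedSize) {
                log.add("line " + lineNo + ": read size " + values.length
                        + ", expected " + expectedSize);
            } else {
                validReads.add(values);
            }
        }
    }

    public static void main(String[] args) {
        QualFileCheckSketch check = new QualFileCheckSketch();
        check.process(List.of(">r1", "10 20 30", "", ">r2", "10 x 30", ">r3", "10 20"), 3);
        check.log.forEach(System.out::println);
        System.out.println("valid reads: " + check.validReads.size());
    }
}
```

Collecting problems in a log rather than stopping at the first one mirrors the behaviour described in the text, where malformed entries are reported and dropped while processing continues.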

