Re-alignment of the unmapped reads with base quality score.

Peng X, Wang J, Zhang Z, Xiao Q, Li M, Pan Y - BMC Bioinformatics (2015)


ABSTRACT

Motivation: Next-generation sequencing technologies underpin a wide variety of biological applications, and alignment is the first step once the sequencing reads are obtained. In recent years, many software tools have been developed to align short reads to a reference genome efficiently and accurately. However, many reads still cannot be mapped to the reference genome because they exceed the allowed number of mismatches. Moreover, besides the unmapped reads, reads with low mapping quality are also excluded from downstream analyses such as variant calling. If the confident segments of these reads can be exploited, not only can alignment rates be improved, but more information will also be available for downstream analysis.
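The two groups of reads described above (unmapped reads and reads with low mapping quality) can be picked out of an aligner's SAM output using standard SAM fields: bit 0x4 of FLAG marks an unmapped read, and the fifth column holds MAPQ. The sketch below is only an illustration of that selection step; the function name `reads_to_rescue` and the cutoff `min_mapq = 20` are hypothetical choices, not names or values taken from RAUR.

```python
from typing import Iterable, Iterator

SAM_FLAG_UNMAPPED = 0x4  # FLAG bit meaning "segment unmapped" in the SAM spec


def reads_to_rescue(sam_lines: Iterable[str], min_mapq: int = 20) -> Iterator[str]:
    """Yield the names of reads that are unmapped or have MAPQ below min_mapq.

    min_mapq = 20 is an illustrative (hypothetical) cutoff, not a RAUR parameter.
    """
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header lines carry no alignments
        fields = line.rstrip("\n").split("\t")
        qname, flag, mapq = fields[0], int(fields[1]), int(fields[4])
        if flag & SAM_FLAG_UNMAPPED or mapq < min_mapq:
            yield qname
```

Streaming over lines keeps memory constant, which matters for whole-genome SAM files; a real pipeline would more likely use a BAM library than hand-split text.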

Results: This paper proposes a method called RAUR (Re-align the Unmapped Reads) to re-align reads that cannot be mapped by alignment tools. First, it uses the base quality scores reported by the sequencer to identify the most confident and informative segments of the unmapped reads, controlling the number of possible mismatches in the alignment. Then, in combination with an alignment tool, RAUR re-aligns these segments. We ran RAUR on both simulated and real data with different read lengths. The results show that many reads that fail to be aligned by the most popular alignment tools (BWA and Bowtie2) can be correctly re-aligned by RAUR, with similar precision. Even compared with BWA-MEM and the local mode of Bowtie2, which perform local alignment of long reads to improve the alignment rate, RAUR shows advantages in alignment rate and precision in some cases. Therefore, the trimming strategy used in RAUR is useful for improving the alignment rate of alignment tools for next-generation genome sequencing.
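Base quality scores are usually stored in FASTQ files as ASCII characters with a Phred offset of 33. A minimal sketch of turning a quality string into the positions of low-quality bases (the bases treated as possible mismatches) is shown below. The Phred+33 encoding and the Q20 cutoff are assumptions for illustration; this section does not state the threshold RAUR uses.

```python
from typing import List


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 by default)."""
    return [ord(ch) - offset for ch in quality]


def low_quality_positions(quality: str, threshold: int = 20, offset: int = 33) -> List[int]:
    """Return 0-based positions whose Phred score is below threshold.

    These positions play the role of possible mismatches; threshold = 20
    is an assumed cutoff, not a value taken from RAUR.
    """
    return [i for i, q in enumerate(phred_scores(quality, offset)) if q < threshold]
```

For example, `low_quality_positions("II#II+I")` returns `[2, 5]`, since `#` and `+` decode to Phred scores 2 and 10.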

Availability: All source code is available at http://netlab.csu.edu.cn/bioinformatics/RAUR.html.


Figure 3: An example of trimming. The consecutive squares represent the bases of a 45-bp read; black squares denote bases with low quality scores, and white squares denote bases with high quality scores. The read contains eight low-quality bases. When K = 4, the read is trimmed to the longest segment containing four low-quality bases; when K = 3, it is trimmed to the longest segment containing three low-quality bases.
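Finding the longest segment that contains at most K low-quality bases is a standard sliding-window (two-pointer) problem that can be solved in one pass over the read. The sketch below is one way to do it, not necessarily how RAUR implements it; `longest_segment` is a hypothetical name, and the input is assumed to be a per-base list of low-quality flags. When the read holds at least K low-quality bases, the longest such window contains exactly K of them, as in the caption.

```python
from typing import List, Tuple


def longest_segment(is_low: List[bool], k: int) -> Tuple[int, int]:
    """Return (start, end), 0-based and half-open, of the longest window of the
    read that contains at most k low-quality bases. Ties go to the leftmost window.
    """
    best_start, best_end = 0, 0
    left = 0
    low_in_window = 0
    for right, low in enumerate(is_low):
        if low:
            low_in_window += 1
        # Shrink from the left until the window holds at most k low-quality bases.
        while low_in_window > k:
            if is_low[left]:
                low_in_window -= 1
            left += 1
        if right + 1 - left > best_end - best_start:
            best_start, best_end = left, right + 1
    return best_start, best_end
```

With `is_low` built from `"..x...x.x....x.."` (where `x` marks a low-quality base), K = 2 gives `(3, 13)` and K = 1 gives `(9, 16)`. A half-open result `(s, e)` covers 1-based positions s + 1 through e.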

Mentions: Figure 3 shows an example of trimming a read. The consecutive squares represent the bases of a 45-bp read; black squares denote bases with low quality scores, and white squares denote bases with high quality scores. The read contains eight low-quality bases. When K = 4, the longest segment starts at position 14 of the original read and ends at position 42, containing four low-quality bases. If the trimmed read cannot be aligned, K is decreased by 1 and the trimming algorithm searches for the longest segment containing three low-quality bases, which starts at position 7 and ends at position 29. Trimming stops when the read can be confidently mapped or K reaches 0. In our experiments, the initial value of K is set to 8.
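The decreasing-K loop described above can be sketched as follows, starting from K = 8 as in the experiments. Here `trim(k)` stands in for a longest-segment search like the one illustrated in Figure 3, and `try_align(segment)` is a hypothetical hook that would run BWA or Bowtie2 on the segment and report whether it maps confidently. Neither is RAUR's actual interface. The sketch also attempts K = 0 before giving up, which is one reading of the stopping rule.

```python
from typing import Callable, Optional, Tuple

Segment = Tuple[int, int]  # 0-based, half-open (start, end)


def trim_until_aligned(
    read: str,
    trim: Callable[[int], Segment],    # hypothetical: longest segment with <= K low-quality bases
    try_align: Callable[[str], bool],  # hypothetical: True if the aligner maps the segment confidently
    k_init: int = 8,                   # initial K reported for the experiments
) -> Optional[Tuple[int, Segment]]:
    """Return (K, segment) for the first K, counting down, whose segment aligns; else None."""
    for k in range(k_init, -1, -1):    # K = k_init, k_init - 1, ..., 0
        start, end = trim(k)
        if end > start and try_align(read[start:end]):
            return k, (start, end)
    return None                        # read stays unmapped once K = 0 has failed
```

Passing `trim` and `try_align` as callables keeps the control flow separate from the aligner, so the same loop could wrap either BWA or Bowtie2.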
