Lighter: fast and memory-efficient sequencing error correction without counting.

Song L, Florea L, Langmead B - Genome Biol. (2014)

ABSTRACT
Lighter is a fast, memory-efficient tool for correcting sequencing errors. Lighter avoids counting k-mers. Instead, it uses a pair of Bloom filters, one holding a sample of the input k-mers and the other holding k-mers likely to be correct. As long as the sampling fraction is adjusted in inverse proportion to the depth of sequencing, Bloom filter size can be held constant while maintaining near-constant accuracy. Lighter is parallelized, uses no secondary storage, and is both faster and more memory-efficient than competing approaches while achieving comparable accuracy.
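The abstract claims that a fixed-size Bloom filter is enough if the sampling fraction scales inversely with depth. A small simulation can illustrate this. It is a minimal sketch, not Lighter's implementation, and rests on these assumptions:

- Each k-mer occurrence is added to filter A independently with probability α.
- α = C/depth, where the constant C = 7 is hypothetical.
- The data are a toy random genome with 50-bp reads carrying 1% substitution errors.
- The filter is a SHA-256 double-hashing Bloom filter.

All of these are illustrative choices. The sketch reports the fraction of bits set (occupancy) at 35×, 70× and 140× using the same filter size each time.

```python
import hashlib
import random


class BloomFilter:
    """Fixed-size Bloom filter; bit positions come from double hashing of SHA-256 (illustrative)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1  # odd step size
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[p >> 3] >> (p & 7) & 1 for p in self._positions(item))

    def occupancy(self) -> float:
        """Fraction of bits set to 1."""
        return sum(bin(b).count("1") for b in self.bits) / self.num_bits


def simulate_reads(genome: str, depth: int, read_len: int, error_rate: float,
                   rng: random.Random) -> list:
    """Uniformly placed reads with independent substitution errors (toy model)."""
    n_reads = depth * len(genome) // read_len
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(genome) - read_len + 1)
        read = list(genome[start:start + read_len])
        for i, base in enumerate(read):
            if rng.random() < error_rate:
                read[i] = rng.choice([b for b in "ACGT" if b != base])
        reads.append("".join(read))
    return reads


def sample_into(bloom: BloomFilter, reads: list, k: int, alpha: float,
                rng: random.Random) -> None:
    """Add each k-mer occurrence to the filter independently with probability alpha."""
    for read in reads:
        for i in range(len(read) - k + 1):
            if rng.random() < alpha:
                bloom.add(read[i:i + k])


rng = random.Random(42)
genome = "".join(rng.choice("ACGT") for _ in range(1000))
K, READ_LEN, ERROR_RATE = 15, 50, 0.01
C = 7.0  # hypothetical constant: alpha = C / depth
results = {}
for depth in (35, 70, 140):
    alpha = C / depth
    reads = simulate_reads(genome, depth, READ_LEN, ERROR_RATE, rng)
    bloom = BloomFilter(num_bits=1 << 16, num_hashes=3)  # same size at every depth
    sample_into(bloom, reads, K, alpha, rng)
    results[depth] = bloom.occupancy()
    print(f"{depth}x  alpha={alpha:.3f}  occupancy={results[depth]:.4f}")
```

Holding α·depth fixed keeps two quantities roughly constant at every depth: the expected number of sampled copies of each correct k-mer, and the expected number of sampled erroneous k-mers. The three printed occupancies should therefore be close to one another. The exact values depend on the seed and the toy parameters.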

Figure 4: The effect of α on the occupancy of Bloom filters A and B, using simulated 35×, 70× and 140× datasets. The error rate is 1%.

As shown in Figures 3 and 4, when α is very small only a fraction of the correct k-mers are added to A, so many correct read positions fail the threshold test. Lighter then attempts to ‘correct’ these error-free positions, decreasing accuracy. A small α also reduces the number of stretches of k consecutive trusted positions in the reads, so a smaller fraction of correct k-mers is added to B, which lowers accuracy further. When α grows too large, the y_x thresholds become greater than k. Because a position is covered by at most k k-mers, every position then fails the threshold test, as seen on the right-hand side of Figure 4. This also causes the dramatic drop in accuracy seen in Figure 3. Between the two extremes, we find a fairly broad range of α values (roughly 0.15 to 0.3) that yields high accuracy when the error rate is 1% or 3%. The range is wider when the error rate is lower.
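The threshold test, and the link between trusted stretches and filter B, can be sketched as follows. This is an illustration under stated assumptions, not Lighter's code. A position is treated as trusted when at least y_x of the x k-mers covering it are found in A. A k-mer goes to B only if all k of its positions are trusted. The threshold tables (`lenient`, `too_high`) and the function names are hypothetical; they do not reproduce Lighter's actual formula for y_x.

```python
from typing import Callable, List, Sequence


def trusted_positions(read: str, k: int, in_a: Callable[[str], bool],
                      threshold: Sequence[int]) -> List[bool]:
    """Mark a position trusted if at least threshold[x] of the x k-mers
    covering it are reported present in filter A (hypothetical rule)."""
    if len(read) < k:
        return [False] * len(read)
    n_kmers = len(read) - k + 1
    hits = [in_a(read[i:i + k]) for i in range(n_kmers)]
    trusted = []
    for pos in range(len(read)):
        first = max(0, pos - k + 1)    # leftmost k-mer covering pos
        last = min(pos, n_kmers - 1)   # rightmost k-mer covering pos
        x = last - first + 1           # number of covering k-mers, at most k
        y = sum(hits[first:last + 1])  # how many of them are in A
        trusted.append(y >= threshold[x])
    return trusted


def kmers_for_b(read: str, k: int, trusted: Sequence[bool]) -> List[str]:
    """Return every k-mer lying entirely inside a run of trusted positions."""
    out = []
    run = 0
    for pos, ok in enumerate(trusted):
        run = run + 1 if ok else 0
        if run >= k:
            out.append(read[pos - k + 1:pos + 1])
    return out


read, k = "ACGTTGCA", 4
all_kmers = {read[i:i + k] for i in range(len(read) - k + 1)}
# Hypothetical threshold tables indexed by x (index 0 unused).
lenient = [0] + [(x + 1) // 2 for x in range(1, k + 1)]
too_high = [0] + [x + 1 for x in range(1, k + 1)]  # y_x > x: never reachable

dense = trusted_positions(read, k, all_kmers.__contains__, lenient)
sparse = trusted_positions(read, k, {"ACGT", "TGCA"}.__contains__, lenient)
strict = trusted_positions(read, k, all_kmers.__contains__, too_high)

print(kmers_for_b(read, k, dense))   # all five k-mers of the read
print(kmers_for_b(read, k, sparse))  # []: A too sparse, no run of k trusted positions
print(kmers_for_b(read, k, strict))  # []: every position fails the threshold
```

The `sparse` case mimics a very small α. The read is error-free, but too few of its k-mers were sampled into A, so no run of k trusted positions forms and nothing reaches B. The `too_high` table mimics a very large α: y_x exceeds the number of covering k-mers, so every position fails.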

