Error correction of high-throughput sequencing datasets with non-uniform coverage.

Medvedev P, Scott E, Kakaradov B, Pevzner P - Bioinformatics (2011)

Bottom Line: As a result, error correction of sequencing reads remains an important problem. In this article, we develop the method Hammer for error correction without any uniformity assumptions. Hammer is based on a combination of a Hamming graph and a simple probabilistic model for sequencing errors.

View Article: PubMed Central - PubMed

Affiliation: Department of Computer Science and Engineering, University of California, San Diego, CA, USA. pmedvedev@cs.ucsd.edu

ABSTRACT

Motivation: The continuing improvements to high-throughput sequencing (HTS) platforms have begun to unfold a myriad of new applications. As a result, error correction of sequencing reads remains an important problem. Though several tools do an excellent job of correcting datasets where the reads are sampled close to uniformly, the problem of correcting reads coming from drastically non-uniform datasets, such as those from single-cell sequencing, remains open.

Results: In this article, we develop the method Hammer for error correction without any uniformity assumptions. Hammer is based on a combination of a Hamming graph and a simple probabilistic model for sequencing errors. It is a simple and adaptable algorithm that improves on other tools on non-uniform single-cell data, while achieving comparable results on normal multi-cell data.
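To make the "Hamming graph" idea concrete, here is a minimal sketch (not the paper's implementation) of clustering k-mers by connecting any two within a small Hamming distance and taking connected components; the function names, the union-find structure, and the naive O(n²) pairwise comparison are illustrative choices of this sketch, not details from the paper.

```python
from itertools import combinations

def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length k-mers differ."""
    return sum(x != y for x, y in zip(a, b))

def hamming_graph_components(kmers, tau=1):
    """Group k-mers into connected components of the Hamming graph,
    where two k-mers are adjacent iff their Hamming distance is <= tau.
    Uses union-find; the all-pairs comparison is for illustration only."""
    parent = {k: k for k in kmers}

    def find(k):
        # Path-halving find.
        while parent[k] != k:
            parent[k] = parent[parent[k]]
            k = parent[k]
        return k

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    for a, b in combinations(kmers, 2):
        if hamming_distance(a, b) <= tau:
            union(a, b)

    components = {}
    for k in kmers:
        components.setdefault(find(k), []).append(k)
    return list(components.values())
```

For example, with tau=1 the k-mers "ACGT" and "ACGA" fall into one component (a likely sequencing error of one another), while "TTTT" remains alone.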

Availability: http://www.cs.toronto.edu/~pashadag.

Contact: pmedvedev@cs.ucsd.edu.

Figure 4: Histogram of the number of singletons with a given weighted multiplicity, on a vertical log scale.

Mentions: Out of all the correct distinct k-mers present in the data, 10% were in singletons. The distribution of their weighted multiplicities is shown in Figure 4. Most of the incorrect k-mers have weights <1; however, there are also ~53K correct k-mers with weights between 0.5 and 1. This difference became apparent when we ran Hammer with a singletonCutoff of 0.5 (Table 1), which increased sensitivity by 1.2%.
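A singleton cutoff of this kind can be sketched as follows. Here a k-mer occurrence's weight is taken to be its probability of being error-free under independent Phred-scaled base qualities; this per-base independence model and the function names are assumptions of this sketch, not necessarily Hammer's exact probabilistic model.

```python
def kmer_weight(quals):
    """Probability that one k-mer occurrence is error-free, assuming
    independent per-base errors with Phred error probability 10^(-q/10).
    (Assumed model for illustration; see the paper for Hammer's own.)"""
    p = 1.0
    for q in quals:
        p *= 1.0 - 10 ** (-q / 10.0)
    return p

def weighted_multiplicity(occurrence_quals):
    """Sum of weights over all occurrences of the same k-mer."""
    return sum(kmer_weight(q) for q in occurrence_quals)

def keep_singleton(wm, singleton_cutoff=0.5):
    """Retain a singleton cluster as correct iff its weighted
    multiplicity meets the cutoff (0.5 in the experiment above)."""
    return wm >= singleton_cutoff
```

With this model, a single high-quality occurrence (e.g. all bases at Q30) has weight near 1 and survives a 0.5 cutoff, while a low-quality occurrence is discarded; that is the behavior exploited by the singletonCutoff experiment.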

