Assembly complexity of prokaryotic genomes using short reads.

Kingsford C, Schatz MC, Pop M - BMC Bioinformatics (2010)

Bottom Line: We demonstrate that the majority of genes in prokaryotic genomes can be reconstructed uniquely using very short reads even if the genomes themselves cannot. The non-reconstructible genes are overwhelmingly related to mobile elements (transposons, IS elements, and prophages). Our results improve upon previous studies on the feasibility of assembly with short reads and provide a comprehensive benchmark against which to compare the performance of the short-read assemblers currently being developed.


Affiliation: Department of Computer Science, Institute for Advanced Computer Studies, University of Maryland, College Park, MD, USA. carlk@cs.umd.edu

ABSTRACT

Background: De Bruijn graphs are a theoretical framework underlying several modern genome assembly programs, especially those that deal with very short reads. We describe an application of de Bruijn graphs to analyze the global repeat structure of prokaryotic genomes.

Results: We provide the first survey of the repeat structure of a large number of genomes. The analysis gives an upper bound on the performance of genome assemblers for de novo reconstruction of genomes across a wide range of read lengths. Further, we demonstrate that the majority of genes in prokaryotic genomes can be reconstructed uniquely using very short reads even if the genomes themselves cannot. The non-reconstructible genes are overwhelmingly related to mobile elements (transposons, IS elements, and prophages).

Conclusions: Our results improve upon previous studies on the feasibility of assembly with short reads and provide a comprehensive benchmark against which to compare the performance of the short-read assemblers currently being developed.

© Copyright Policy - open-access

Figure 2: Number of words consistent with genome graphs. The size of the solution space for each chromosome using reads of length 50 nt. Only the 365 chromosomes that had fewer than 2^900 possible reconstructions are shown.

Mentions: In Figure 2, we plot, for each chromosome, the number of words consistent with the de Bruijn graph G constructed from length-50 reads. The number of possible reconstructions is often astronomical. Only the 365 chromosomes with fewer than 2^900 possible reconstructions are shown. As one would expect, longer genomes generally have more solution words. However, this is not always the case. For example, Yersinia pestis appears much more complicated to reconstruct than Escherichia coli, even though both genomes are about the same size (4.4 Mb). Hence, genome size cannot be used as the sole indicator of assembly complexity. In fact, the genome of Y. pestis contains a large number of insertion sequences [34], and these complicate its de Bruijn graph.
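
The idea of counting reconstructions can be illustrated on a toy example. The following is a minimal sketch in Python, not the authors' implementation: it assumes a circular, error-free genome from which every length-k substring is observed as a read, builds the de Bruijn graph on (k-1)-mers, and counts Eulerian circuits with the BEST theorem. The function names and the toy sequence are hypothetical. Because parallel edges are treated as distinguishable, the count is only an upper bound on the number of distinct words when some k-mer occurs more than once.

```python
from collections import Counter
from fractions import Fraction
from math import factorial


def de_bruijn_edges(genome: str, k: int) -> Counter:
    """Edge multiset of the de Bruijn graph of a circular genome.

    Each length-k substring (one idealized read) becomes an edge from its
    (k-1)-prefix to its (k-1)-suffix. Assumes len(genome) >= k - 1.
    """
    wrapped = genome + genome[: k - 1]  # wrap around the circular chromosome
    edges = Counter()
    for i in range(len(genome)):
        kmer = wrapped[i : i + k]
        edges[(kmer[:-1], kmer[1:])] += 1
    return edges


def determinant(matrix) -> int:
    """Exact determinant of an integer matrix via Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n = len(m)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return 0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return int(det)


def count_eulerian_circuits(edges: Counter) -> int:
    """BEST theorem: (#arborescences) * product over nodes of (outdeg - 1)!.

    Assumes the graph is connected and Eulerian, which holds for the
    de Bruijn graph of a single circular sequence.
    """
    nodes = sorted({v for edge in edges for v in edge})
    index = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)
    out_deg = [0] * n
    laplacian = [[0] * n for _ in range(n)]
    for (u, v), mult in edges.items():
        out_deg[index[u]] += mult
        laplacian[index[u]][index[u]] += mult
        laplacian[index[u]][index[v]] -= mult
    # Matrix-tree theorem: delete the root's row and column (root = node 0).
    minor = [row[1:] for row in laplacian[1:]]
    count = determinant(minor)
    for d in out_deg:
        count *= factorial(d - 1)
    return count


toy = "ACGTACGGACGC"  # hypothetical circular sequence; ACG occurs three times
print(count_eulerian_circuits(de_bruijn_edges(toy, 4)))  # prints 2
print(count_eulerian_circuits(de_bruijn_edges(toy, 5)))  # prints 1
```

In this toy sequence the 3-mer ACG occurs three times, so reads of length 4 leave two possible orderings of the intervening segments, while reads of length 5 span the repeat and leave a single reconstruction. The same mechanism, at far larger scale, is why repeat content rather than genome size alone drives the size of the solution space.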

