A whole-genome shotgun approach for assembling and anchoring the hexaploid bread wheat genome.

Chapman JA, Mascher M, Buluç A, Barry K, Georganas E, Session A, Strnadova V, Jenkins J, Sehgal S, Oliker L, Schmutz J, Yelick KA, Scholz U, Waugh R, Poland JA, Muehlbauer GJ, Stein N, Rokhsar DS - Genome Biol. (2015)

Bottom Line: Polyploid species have long been thought to be recalcitrant to whole-genome assembly. The genome representation and accuracy of our assembly are comparable to, or even exceed, those of a chromosome-by-chromosome shotgun assembly. Our assembly and mapping strategy uses only short-read sequencing technology and is applicable to any species for which a mapping population can be constructed.


Affiliation: Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, 94598, USA. jarrodc@gmail.com.

ABSTRACT
Polyploid species have long been thought to be recalcitrant to whole-genome assembly. By combining high-throughput sequencing, recent developments in parallel computing, and genetic mapping, we derive, de novo, a sequence assembly representing 9.1 Gbp of the highly repetitive 16 Gbp genome of hexaploid wheat, Triticum aestivum, and assign 7.1 Gbp of this assembly to chromosomal locations. The genome representation and accuracy of our assembly are comparable to, or even exceed, those of a chromosome-by-chromosome shotgun assembly. Our assembly and mapping strategy uses only short-read sequencing technology and is applicable to any species for which a mapping population can be constructed.

Figure 1: 51-mer depth distribution for homozygous parental lines. (A) 51-mer frequency distribution for W7984 (red), compared with Opata (black). W7984 was sequenced more deeply to enable de novo WGS assembly. Uptick at low depth (below 51-mer frequency of approximately 5) corresponds to sequencing error. Peak frequency (approximately 18 for W7984, approximately 11 for Opata) represents the typical number of 51-mers covering nucleotides in the non-repetitive regions of the genome. (B) Cumulative frequency distribution for W7984 and Opata as a function of estimated genomic copy count (51-mer frequency divided by peak 51-mer frequency from panel (A)). Note logarithmic scale on the horizontal axis. The two curves lie on top of each other, as expected for two accessions from the same species. Approximately 45% of the hexaploid wheat genome is found in regions that are single copy as measured by 51-mers (estimated genomic copy count ≤2), and the remainder is typically at high 51-mer copy number (approximately 40% of the genome is found in 10 or more copies). The distribution rises smoothly through estimated genomic copy counts of two and three, indicating the three subgenomes of hexaploid wheat are largely differentiated at the scale of a 51-mer.
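The quantities in panel (B) can be illustrated with a short sketch. It assumes the k-mer spectrum is available as a mapping from k-mer frequency to the number of distinct k-mers seen at that frequency. The function names `find_peak` and `copy_count_cdf`, the `min_freq` cutoff of 5, and the weighting of each frequency bin by frequency times distinct k-mers are illustrative assumptions, not the authors' code.

```python
def find_peak(hist, min_freq=5):
    """Return the most common k-mer frequency at or above min_freq.

    hist maps k-mer frequency -> number of distinct k-mers with that frequency.
    Frequencies below min_freq are skipped as presumed sequencing errors
    (assumed cutoff, following the 'approximately 5' in the caption).
    """
    return max((f for f in hist if f >= min_freq), key=lambda f: hist[f])


def copy_count_cdf(hist, peak, min_freq=5):
    """Cumulative genome fraction as a function of estimated copy count.

    Estimated genomic copy count = k-mer frequency / peak frequency.
    Each frequency bin is weighted by f * n, which is proportional to the
    number of genomic positions the bin covers (an assumption of this sketch).
    """
    kept = {f: n for f, n in hist.items() if f >= min_freq}
    total = sum(f * n for f, n in kept.items())
    cdf = []
    running = 0
    for f in sorted(kept):
        running += f * kept[f]
        cdf.append((f / peak, running / total))
    return cdf


# Toy spectrum (made-up numbers, not data from the paper)
toy_hist = {1: 1000, 2: 300, 10: 50, 20: 100, 40: 25, 200: 5}
toy_peak = find_peak(toy_hist)
toy_cdf = copy_count_cdf(toy_hist, toy_peak)
```

Reading `toy_cdf` at a copy count of 2 gives the fraction of the toy genome treated as single copy under the caption's threshold (estimated genomic copy count ≤2).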

Mentions: The total estimated genome size of W7984 is 16 Gbp, consistent with prior measurements and estimates for T. aestivum [30]. We produced approximately 30× total sequence coverage in fragment libraries, which corresponds to approximately 18× coverage in 51-mers (Figure 1A). The very low-depth uptick (51-mer frequency below approximately 5 counts) represents sequencing errors, which are easily distinguished from the error-free portion of the distribution without error correction [29].
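To make the link between sequence coverage and k-mer coverage concrete, here is a minimal sketch. It uses the standard idealized relation for error-free reads of uniform length L, where k-mer coverage equals base coverage times (L − k + 1)/L. The read length of 150 in the usage line is an illustrative value, not taken from the text, and the helper names are hypothetical. Sequencing errors and trimmed reads lower the observed k-mer depth further, so the idealized value is an upper bound rather than a reproduction of the reported 18×.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurrence across a collection of reads.

    Simplification: reverse complements are not merged into canonical k-mers.
    """
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def kmer_coverage(base_coverage, read_length, k):
    """Idealized k-mer coverage for error-free reads of uniform length."""
    return base_coverage * (read_length - k + 1) / read_length


# Toy reads (made up) and an illustrative read length of 150
toy_counts = count_kmers(["ACGTAC", "CGTACG"], 3)
ideal_depth = kmer_coverage(30, 150, 51)
```

Each read of length L contributes only L − k + 1 k-mers, which is why the depth of the 51-mer peak is lower than the nominal sequence coverage.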

