A whole-genome shotgun approach for assembling and anchoring the hexaploid bread wheat genome.

Chapman JA, Mascher M, Buluç A, Barry K, Georganas E, Session A, Strnadova V, Jenkins J, Sehgal S, Oliker L, Schmutz J, Yelick KA, Scholz U, Waugh R, Poland JA, Muehlbauer GJ, Stein N, Rokhsar DS - Genome Biol. (2015)

Bottom Line: Polyploid species have long been thought to be recalcitrant to whole-genome assembly. The genome representation and accuracy of our assembly are comparable to, or even exceed, those of a chromosome-by-chromosome shotgun assembly. Our assembly and mapping strategy uses only short-read sequencing technology and is applicable to any species for which a mapping population can be constructed.


Affiliation: Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, 94598, USA. jarrodc@gmail.com.

ABSTRACT
Polyploid species have long been thought to be recalcitrant to whole-genome assembly. By combining high-throughput sequencing, recent developments in parallel computing, and genetic mapping, we derive, de novo, a sequence assembly representing 9.1 Gbp of the highly repetitive 16 Gbp genome of hexaploid wheat, Triticum aestivum, and assign 7.1 Gbp of this assembly to chromosomal locations. The genome representation and accuracy of our assembly are comparable to, or even exceed, those of a chromosome-by-chromosome shotgun assembly. Our assembly and mapping strategy uses only short-read sequencing technology and is applicable to any species for which a mapping population can be constructed.



Figure 2: Cumulative distributions of assembled sequence as a function of scaffold and contig length. Each curve shows the total amount of assembled sequence in scaffolds or contigs longer than a given minimum length. As the available paired-end insert size increases, the W7984 WGS assembly becomes progressively longer: short inserts (<500 bp) only (red); with medium inserts added (700 bp to 1 kbp; dark blue); and finally with approximately 4 kbp insert mate pairs included (green). For comparison, the International Wheat Genome Sequencing Consortium chromosome-sorted assembly of ‘Chinese Spring’ (CSS) is also shown (black dashed line). Cumulative contig distributions for W7984 (light blue) and CSS (gray dashed line) are also depicted. As predicted by assembly theory, these quantities are exponentially distributed, with decay lengths proportional to the N50 length scale of the assembly. This demonstrates that the excess length of the CSS assembly is confined to an abundance of very short sequences (less than 1 kbp in length) that lie outside the body of the main exponential decay curves.
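
The proportionality between decay length and N50 mentioned in the caption can be made explicit under an idealized assumption. The symbols S(L), S_0 and \lambda are notation introduced here for illustration and do not come from the article: S(L) is the total sequence in scaffolds longer than L, S_0 is the total assembled sequence, and \lambda is the decay length. If the cumulative curve were exactly exponential, the N50 (the length at which half of the sequence lies in longer scaffolds) would follow directly:

```latex
\begin{align}
  S(L) &= S_0 \, e^{-L/\lambda}
    && \text{(assumed exact exponential form)} \\
  S(\mathrm{N50}) &= \tfrac{1}{2} S_0
    && \text{(definition of N50)} \\
  e^{-\mathrm{N50}/\lambda} &= \tfrac{1}{2}
  \;\Longrightarrow\;
  \mathrm{N50} = \lambda \ln 2 \approx 0.69\,\lambda
\end{align}
```

Real assemblies only approximate this form, so the relation should be read as a rough guide to how the slope of each curve relates to its N50, not as a fitted result.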

Mentions: In comparison, the chromosome-arm assemblies of ‘Chinese Spring’ [5] total 10.1 Gbp with a scaffold N50 length of 2.3 kbp; however, excluding scaffolds shorter than 1 kbp, the total ‘Chinese Spring’ scaffold length drops to 7.0 Gbp with an N50 length of 4.2 kbp, so a full 3.1 Gbp of this assembly is in very short scaffolds of less than 1 kbp. Thus, our whole-genome assembly using only short-insert data is comparable in quality to the chromosome-arm assemblies (also performed with only short-insert data, but typically with 30 to 200× shotgun depth compared with our uniform 28× short-insert coverage). When longer-range paired ends from a whole-genome library are included, our WGS assemblies become substantially longer, more than doubling the typical contig size and extending the scaffolding by a factor of 5 to 6 (Figure 2). As shown later in the article, these extended sequences allow more complete genes to be captured and enhance our ability to attach assembled scaffolds to the genetic map, and therefore to position them at specific chromosomal locations.
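
The effect of the 1 kbp cutoff on total length and N50 can be illustrated with a minimal Python sketch. The function names `n50` and `total_at_least` and the toy scaffold lengths are hypothetical and chosen only for illustration; they are not taken from the article or its software. Only the 10.1 Gbp and 7.0 Gbp totals in the last lines are quoted from the text above.

```python
from typing import Iterable, List


def n50(lengths: Iterable[int]) -> int:
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of the total bases."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("n50 requires at least one sequence length")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return ordered[-1]  # not reached for positive lengths


def total_at_least(lengths: Iterable[int], min_length: int) -> int:
    """Total bases in sequences of length >= min_length
    (one point on a cumulative length curve)."""
    return sum(length for length in lengths if length >= min_length)


# Toy scaffold lengths in bp (made up for illustration only).
toy: List[int] = [500, 800, 900, 1500, 3000, 6000]
filtered = [length for length in toy if length >= 1000]

print("all scaffolds:    total =", sum(toy), "N50 =", n50(toy))
print("scaffolds >= 1kb: total =", sum(filtered), "N50 =", n50(filtered))

# Totals quoted in the text: 10.1 Gbp overall, 7.0 Gbp at >= 1 kbp.
short_gbp = round(10.1 - 7.0, 1)
print("sequence in scaffolds < 1 kbp:", short_gbp, "Gbp")
```

In the toy data, dropping the scaffolds shorter than 1 kbp lowers the total length but raises the N50, the same qualitative pattern as the ‘Chinese Spring’ figures quoted above (10.1 Gbp at N50 2.3 kbp versus 7.0 Gbp at N50 4.2 kbp).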

