A whole-genome shotgun approach for assembling and anchoring the hexaploid bread wheat genome.

Chapman JA, Mascher M, Buluç A, Barry K, Georganas E, Session A, Strnadova V, Jenkins J, Sehgal S, Oliker L, Schmutz J, Yelick KA, Scholz U, Waugh R, Poland JA, Muehlbauer GJ, Stein N, Rokhsar DS - Genome Biol. (2015)

Bottom Line: Polyploid species have long been thought to be recalcitrant to whole-genome assembly. The genome representation and accuracy of our assembly is comparable to, or even exceeds, that of a chromosome-by-chromosome shotgun assembly. Our assembly and mapping strategy uses only short read sequencing technology and is applicable to any species where it is possible to construct a mapping population.


Affiliation: Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, 94598, USA. jarrodc@gmail.com.

ABSTRACT
Polyploid species have long been thought to be recalcitrant to whole-genome assembly. By combining high-throughput sequencing, recent developments in parallel computing, and genetic mapping, we derive, de novo, a sequence assembly representing 9.1 Gbp of the highly repetitive 16 Gbp genome of hexaploid wheat, Triticum aestivum, and assign 7.1 Gbp of this assembly to chromosomal locations. The genome representation and accuracy of our assembly is comparable to, or even exceeds, that of a chromosome-by-chromosome shotgun assembly. Our assembly and mapping strategy uses only short read sequencing technology and is applicable to any species where it is possible to construct a mapping population.



Figure 3: Distribution of percent identities of alignments of ‘Chinese Spring’ full-length cDNAs versus genome assemblies. (A) Frequency distribution of best percent identity of flcDNA alignments to IWGSC ‘Chinese Spring’ (blue bars) and W7984 WGS (red bars) assemblies. Results for both assemblies are superimposed; red and blue overlap is shown as purple. Included are all alignments longer than 50% of query flcDNA length. Note that while most ‘Chinese Spring’ cDNAs align at >99.75% identity to the IWGSC ‘Chinese Spring’ genome assembly, there is a long tail of lower identity best matches that could arise from errors in the genome assembly or in the flcDNA sequences. Matches to the W7984 assembly show most matches >99.50%, as expected given the intra-specific polymorphism between ‘Chinese Spring’ and W7984, but also show the long tail of lower identity. For W7984, these may arise from the absence in the genotype of the locus corresponding to the ‘Chinese Spring’ cDNA. (B) Frequency distribution of percent identity of flcDNA alignments longer than 50% of query flcDNA length, showing only those cDNAs with five or fewer such alignments. The secondary peak centered at approximately 97 to 97.5% corresponds to homeologous matches. As expected given the polymorphism between the two hexaploid wheat lines, the ‘Chinese Spring’ cDNAs align at slightly higher identity to their own genotype than to W7984.

Mentions: To assess the global gene-space completeness of our whole-genome assembly and the chromosome-sorted shotgun assemblies of the International Wheat Genome Sequencing Consortium (IWGSC), we compared them to a set of 6,000 (non-repetitive) full-length cDNA sequences from T. aestivum cv. ‘Chinese Spring’ [35] (Figure 3; Table S6 in Additional file 1). The majority of these cDNAs aligned over at least 50% of their length to single scaffolds in the two assemblies with the expected near-perfect identity (77.7% meraculous, 76.3% IWGSC; minimum 99% nucleotide identity). An additional approximately 20% are consistent with alignment to over 50% of the length of a homeologous locus with approximately 97% nucleotide identity (Figure 3B).
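
The classification described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `Alignment` record, the function names, and the 96% lower bound for the homeolog-like band are assumptions made for illustration. Only the 50% query-coverage filter and the 99% near-perfect identity cutoff come from the text; the text gives the homeologous peak only as "approximately 97%".

```python
from dataclasses import dataclass

@dataclass
class Alignment:
    """One cDNA-to-scaffold alignment (hypothetical record layout)."""
    cdna_id: str
    scaffold_id: str
    aligned_len: int     # query bases covered by this alignment
    query_len: int       # full length of the cDNA
    pct_identity: float  # nucleotide identity, in percent

def best_hits(alignments, min_cov=0.5):
    """Per cDNA, keep the highest-identity alignment to a single
    scaffold that covers at least min_cov of the query length."""
    best = {}
    for a in alignments:
        if a.aligned_len / a.query_len < min_cov:
            continue  # fails the 50% coverage filter from the text
        cur = best.get(a.cdna_id)
        if cur is None or a.pct_identity > cur.pct_identity:
            best[a.cdna_id] = a
    return best

def classify(cdna_ids, best, near_perfect=99.0, homeolog_floor=96.0):
    """Label each cDNA by the identity of its best qualifying hit.
    near_perfect (99%) is from the text; homeolog_floor is an assumed
    cutoff chosen to bracket the ~97% homeologous peak."""
    labels = {}
    for cid in cdna_ids:
        hit = best.get(cid)
        if hit is None:
            labels[cid] = "no_qualifying_hit"
        elif hit.pct_identity >= near_perfect:
            labels[cid] = "near_perfect"
        elif hit.pct_identity >= homeolog_floor:
            labels[cid] = "homeolog_like"
        else:
            labels[cid] = "low_identity"
    return labels

def fractions(labels):
    """Fraction of cDNAs in each class."""
    total = len(labels)
    counts = {}
    for label in labels.values():
        counts[label] = counts.get(label, 0) + 1
    return {label: n / total for label, n in counts.items()}
```

Applied to the best hits of all 6,000 cDNAs against one assembly, `fractions` would give per-class shares of the kind quoted in the text (for example, the share of cDNAs at 99% identity or higher). The exact values depend on the aligner and on the assumed cutoffs.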

