Whole genome sequencing of elite rice cultivars as a comprehensive information resource for marker assisted selection.

Duitama J, Silva A, Sanabria Y, Cruz DF, Quintero C, Ballen C, Lorieux M, Scheffler B, Farmer A, Torres E, Oard J, Tohme J - PLoS ONE (2015)

Bottom Line: We identified repetitive elements and recurrent copy number variation covering about 200 Mbp of the rice genome. Genotyping of over 18 million polymorphic locations within O. sativa allowed us to reconstruct the individual haplotype patterns shaping the genomic background of elite varieties used by farmers throughout the Americas. We expect that both the analysis methods and the genomic information described here would be of great use for the rice research community and for other groups carrying on similar sequencing efforts in other crops.


Affiliation: Agrobiodiversity research area, International Center for Tropical Agriculture, Cali, Colombia.

ABSTRACT
Current advances in sequencing technologies and bioinformatics revealed the genomic background of rice, a staple food for the poor people, and provided the basis to develop large genomic variation databases for thousands of cultivars. Proper analysis of this massive resource is expected to give novel insights into the structure, function, and evolution of the rice genome, and to aid the development of rice varieties through marker assisted selection or genomic selection. In this work we present sequencing and bioinformatics analyses of 104 rice varieties belonging to the major subspecies of Oryza sativa. We identified repetitive elements and recurrent copy number variation covering about 200 Mbp of the rice genome. Genotyping of over 18 million polymorphic locations within O. sativa allowed us to reconstruct the individual haplotype patterns shaping the genomic background of elite varieties used by farmers throughout the Americas. Based on a reconstruction of the alleles for the gene GBSSI, we could identify novel genetic markers for selection of varieties with high amylose content. We expect that both the analysis methods and the genomic information described here would be of great use for the rice research community and for other groups carrying on similar sequencing efforts in other crops.




pone.0124617.g002: Comparison of CNV calls in rice cultivars. Number of 100 bp bins with a) duplications and b) deletions, broken down by the percentage of each population in which the event is reported (red: indica, blue: japonica overall, yellow: tropical japonica, and light blue: temperate japonica). The lines indicate the percentage of bins in each category that fall within repetitive regions in Nipponbare. c) Number of bins not spanning Nipponbare repeats with predicted CNVs common to each subpopulation (indica, japonica, tropical japonica, and temperate japonica), broken down by predicted copy number, where two is the normal copy number for a diploid region. d) Example of a duplication that discriminates between indica and japonica. Reads from the two copies of this region present in indica samples align to the same genomic location, producing clusters of heterozygous SNPs. Colors in the left panel differentiate the following groups: O. rufipogon (RUF), aromatic (ARO), temperate japonica (TEJ), tropical japonica (TRJ), indica (IND), aus (AUS), O. nivara (NIV), and admixed (ADM). Homozygous genotype calls carrying the reference allele are colored blue. Homozygous genotype calls carrying an allele different from the reference are colored red. Heterozygous genotype calls are colored half blue and half red.

Mentions: We ran the read-depth analysis provided by NGSEP on each sample to identify regions with copy number variation (CNVs). For this analysis we discarded 29 accessions for which the read-depth distribution suggested that coverage was not evenly distributed along the genome (S2 Table). We compared the CNVs identified for 21 indica, 12 temperate japonica, and 18 tropical japonica varieties, which were chosen following the clusters observed in the distance trees. To facilitate comparisons among samples and among events of variable length, we retrieved and compared the copy number estimate for each sample on non-overlapping 100 bp bins across the genome. For each group we identified between 2.3 and 2.8 million bins with duplication events and between 475 and 725 thousand bins with deletion events. This represents over 10 times more variation than that observed using high-density array comparative genomic hybridization [40] or using the read-depth analysis carried out by [7] for 50 accessions. Figs 2a and 2b show the distribution of bins with duplication and deletion events as a function of the percentage of samples in which the variation was discovered. Between 55% and 65% of the bins with duplications and between 70% and 95% of the bins with deletions were reported in fewer than half of the samples within each subpopulation. We also found that most of the bins with duplications (over 97% for common duplications) overlap with repeats. In contrast, only 65% of the bins with predicted deletion events overlap with repeats. After removing bins within repeats and bins with events reported in fewer than half of the samples within each population, the number of bins with CNVs was reduced to 105,606 for indica, 58,896 for tropical japonica, and 30,158 for temperate japonica.
The lower number for temperate japonica is expected because most of the common duplications within the temperate japonica accessions in our study should already be identified as repeats in the Nipponbare reference sequence, which is also temperate japonica. Likewise, common deletions within temperate japonica should mostly correspond to DNA present in Nipponbare and absent in other temperate japonica cultivars. Consequently, after filtering out repetitive regions, recurrent duplications are more common than recurrent deletions within temperate japonica, whereas recurrent deletions are more common than recurrent duplications within tropical japonica and within indica. Fig 2c shows the distribution of bins with common CNVs in non-repetitive regions for different average numbers of copies. For every population, homozygous deletions were twice as common as heterozygous (copy number 1) deletions. Moreover, homozygous duplications (copy number 4) were twice as common as heterozygous duplications (copy number 3). Although we do not have a gold-standard set of CNVs to perform a systematic comparison with other methods, we performed an initial comparison of the CNVs identified using NGSEP with the CNVs identified using mrCaNaVaR [31]. On average, mrCaNaVaR called deletions on about 70 Mbp for each variety, which is close to 4 times more genomic sites for indica and close to 6 times more sites for japonica compared to NGSEP (S8 Fig). mrCaNaVaR also called 1.5 times more regions as duplications for both indica and japonica varieties compared to NGSEP. For most of the samples, over 80% of the deletions and over 70% of the duplications identified by NGSEP were also identified by mrCaNaVaR, which provides additional confidence in the events called by NGSEP.
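The bin-based comparison described above can be illustrated with a minimal Python sketch. It is not the NGSEP implementation. It assumes CNV calls are available as simple (chromosome, start, end) tuples with 1-based inclusive coordinates, and that a bin counts as affected if an event overlaps any part of it. The function names (`event_bins`, `bin_frequencies`, `recurrent_bins`, `concordance`) and the toy data are hypothetical and chosen only for illustration.

```python
from collections import Counter

BIN_SIZE = 100  # bp, matching the 100 bp bins described in the text


def event_bins(start, end, bin_size=BIN_SIZE):
    """Bin indices overlapped by a 1-based inclusive interval [start, end]."""
    return range((start - 1) // bin_size, (end - 1) // bin_size + 1)


def bin_frequencies(calls_by_sample, bin_size=BIN_SIZE):
    """Count, for each (chrom, bin), how many samples carry an event there.

    calls_by_sample maps a sample name to a list of (chrom, start, end)
    tuples (an assumed input layout, not an NGSEP file format).
    """
    counts = Counter()
    for calls in calls_by_sample.values():
        seen = set()  # each sample contributes at most once per bin
        for chrom, start, end in calls:
            for b in event_bins(start, end, bin_size):
                seen.add((chrom, b))
        counts.update(seen)
    return counts


def recurrent_bins(counts, n_samples, repeat_bins=frozenset(), min_fraction=0.5):
    """Keep bins seen in at least min_fraction of samples and outside repeats."""
    return {
        b for b, n in counts.items()
        if n / n_samples >= min_fraction and b not in repeat_bins
    }


def concordance(bins_a, bins_b):
    """Fraction of bins called by method A that method B also called."""
    return len(bins_a & bins_b) / len(bins_a) if bins_a else 0.0


# Toy duplication calls for three hypothetical samples.
dups = {
    "s1": [("chr1", 1, 250)],
    "s2": [("chr1", 101, 300)],
    "s3": [("chr1", 201, 400), ("chr2", 1, 100)],
}
counts = bin_frequencies(dups)
repeats = {("chr1", 2)}  # hypothetical bin annotated as repetitive
common = recurrent_bins(counts, n_samples=len(dups), repeat_bins=repeats)
shared = concordance({("chr1", 1), ("chr1", 2)}, {("chr1", 2), ("chr1", 3)})
```

Working on fixed-size bins rather than on the original event intervals makes calls of different lengths directly comparable across samples and across callers. The same `concordance` function can then express statements such as "over 80% of the deletions identified by one method were also identified by the other" as a simple set overlap.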

