Genomic prediction of biological shape: elliptic Fourier analysis and kernel partial least squares (PLS) regression applied to grain shape prediction in rice (Oryza sativa L.).

Iwata H, Ebana K, Uga Y, Hayashi T - PLoS ONE (2015)

Bottom Line: Among the four methods, KPLS showed the highest accuracy. Ordinary PLS, however, was less accurate than RR in all datasets, suggesting that the use of a non-linear kernel was necessary for accurate prediction using the PLS method. The proposed method is expected to be useful for genomic selection in biological shape.


Affiliation: Department of Agricultural and Environmental Biology, Graduate School of Agricultural and Life Sciences, University of Tokyo, Bunkyo, Tokyo, Japan.

ABSTRACT
Shape is an important morphological characteristic both in animals and plants. In the present study, we examined a method for predicting biological contour shapes based on genome-wide marker polymorphisms. The method is expected to contribute to the acceleration of genetic improvement of biological shape via genomic selection. Grain shape variation observed in rice (Oryza sativa L.) germplasms was delineated using elliptic Fourier descriptors (EFDs), and was predicted based on genome-wide single nucleotide polymorphism (SNP) genotypes. We applied four methods including kernel PLS (KPLS) regression for building a prediction model of grain shape, and compared the accuracy of the methods via cross-validation. We analyzed multiple datasets that differed in marker density and sample size. Datasets with larger sample size and higher marker density showed higher accuracy. Among the four methods, KPLS showed the highest accuracy. Although KPLS and ridge regression (RR) had equivalent accuracy in a single dataset, the result suggested the potential of KPLS for the prediction of high-dimensional EFDs. Ordinary PLS, however, was less accurate than RR in all datasets, suggesting that the use of a non-linear kernel was necessary for accurate prediction using the PLS method. Rice grain shape can be predicted accurately based on genome-wide SNP genotypes. The proposed method is expected to be useful for genomic selection in biological shape.

pone.0120610.g003: Prediction accuracy, Q², of rice grain shape in datasets A (a), B (b), and C (c). Each boxplot corresponds to a single method applied to a single dataset, and represents the range of Q² values obtained in the 10 replications of the ten-fold cross-validation. Red asterisks denote the Q² values obtained in the leave-one-out cross-validation.

Mentions: For each of datasets A, B, and C, we built a prediction model for rice grain shape using each of four methods (RR, KRR, PLS, and KPLS) and evaluated the prediction accuracy of the models with the Q² statistic (Fig 3). In both types of cross-validation (ten-fold and leave-one-out), Q² was lowest in dataset A for all four methods. Datasets B and C showed similar Q² values; within each method, Q² was larger in dataset C than in dataset B, except for KRR. The Q² calculated via leave-one-out cross-validation was larger than the median Q² of the 10 replications of ten-fold cross-validation, but it fell within the range of variation observed across those replications. This underestimation of Q² in ten-fold cross-validation (or overestimation in leave-one-out cross-validation) was more pronounced in dataset A than in dataset B or C.
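
The evaluation scheme described above (10 replications of ten-fold cross-validation plus one leave-one-out run, each summarized by Q²) can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes Q² is defined as 1 − PRESS/TSS summed over all response columns, which is a common definition but is not stated in this excerpt, and it uses plain ridge regression with an arbitrary penalty `lam` on simulated marker data. The function names, the penalty value, and the simulated data are all hypothetical.

```python
import numpy as np


def ridge_fit_predict(X_train, Y_train, X_test, lam):
    """Fit ridge regression on the training set and predict the test set.

    Uses the dual (n x n) form, which is cheaper when markers >> samples.
    """
    x_mean = X_train.mean(axis=0)
    y_mean = Y_train.mean(axis=0)
    Xc = X_train - x_mean
    Yc = Y_train - y_mean
    gram = Xc @ Xc.T
    alpha = np.linalg.solve(gram + lam * np.eye(gram.shape[0]), Yc)
    return (X_test - x_mean) @ Xc.T @ alpha + y_mean


def q2_kfold(X, Y, k, lam, rng):
    """Q^2 = 1 - PRESS/TSS from one k-fold cross-validation (assumed definition)."""
    n = X.shape[0]
    folds = np.array_split(rng.permutation(n), k)
    Y_hat = np.empty_like(Y, dtype=float)
    for test_idx in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[test_idx] = False
        Y_hat[test_idx] = ridge_fit_predict(
            X[train_mask], Y[train_mask], X[test_idx], lam
        )
    press = np.sum((Y - Y_hat) ** 2)           # prediction error sum of squares
    tss = np.sum((Y - Y.mean(axis=0)) ** 2)    # total sum of squares
    return 1.0 - press / tss


def demo(seed=0):
    """Simulated stand-in for SNP genotypes (X) and shape descriptors (Y)."""
    rng = np.random.default_rng(seed)
    n, p, q = 60, 200, 4                        # samples, markers, response columns
    X = rng.integers(0, 2, size=(n, p)).astype(float)
    effects = rng.normal(size=(p, q)) * (rng.random((p, 1)) < 0.1)
    Y = X @ effects + 0.5 * rng.normal(size=(n, q))
    ten_fold = [q2_kfold(X, Y, 10, 10.0, rng) for _ in range(10)]
    leave_one_out = q2_kfold(X, Y, n, 10.0, rng)  # k = n gives leave-one-out
    return ten_fold, leave_one_out


if __name__ == "__main__":
    ten_fold, leave_one_out = demo()
    print("ten-fold Q2 (10 replications):", np.round(ten_fold, 3))
    print("leave-one-out Q2:", round(float(leave_one_out), 3))
```

Ten-fold Q² varies between replications because the fold assignment is random, whereas leave-one-out has only one possible partition and therefore gives a single value, which is why the figure shows boxplots for the former and a single asterisk for the latter.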

