Application of LogitBoost Classifier for Traceability Using SNP Chip Data.

Kim K, Seo M, Kang H, Cho S, Kim H, Seo KS - PLoS ONE (2015)

Bottom Line: Consumer attention to food safety has increased rapidly due to animal-related diseases; therefore, it is important to identify their places of origin (POO) for safety purposes. Specifically, a greater level of accuracy was observed when a higher kinship-based cutoff was employed. These results demonstrated the applicability of a machine learning-based approach using SNP chip data for practical traceability.


Affiliation: Department of Agricultural Biotechnology and Research Institute for Agriculture and Life Sciences, Seoul National University, Seoul 151-921, Republic of Korea; C&K Genomics Inc., 514 Main Bldg., Seoul National University Research Park, San 4-2 Bongcheon-dong, Gwanak-gu, Seoul 151-919, Republic of Korea.

ABSTRACT
Consumer attention to food safety has increased rapidly due to animal-related diseases; therefore, it is important to identify their places of origin (POO) for safety purposes. However, only a few studies have addressed this issue and focused on machine learning-based approaches. In the present study, classification analyses were performed using a customized SNP chip for POO prediction. To accomplish this, 4,122 pigs originating from 104 farms were genotyped using the SNP chip. Several factors were considered to establish the best prediction model based on these data. We also assessed the applicability of the suggested model using a kinship coefficient-filtering approach. Our results showed that the LogitBoost-based prediction model outperformed other classifiers in terms of classification performance under most conditions. Specifically, a greater level of accuracy was observed when a higher kinship-based cutoff was employed. These results demonstrated the applicability of a machine learning-based approach using SNP chip data for practical traceability.


Fig 2 (pone.0139685.g002): Scatter plots for four subsets with different kinship coefficient criteria (X-axis: eigenvector 1; Y-axis: eigenvector 2). Scatter plots were generated by PCA using GCTA [24]. Each point represents an individual animal and is colored by farm. As the kinship cutoff increased, the farms became more clearly distinguishable.
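The scatter plots place each animal on the first two principal components of its genotypes. The following is a minimal sketch of that projection, not the GCTA procedure the authors used: it assumes genotypes coded as 0/1/2 allele counts, uses NumPy's SVD on the column-centered matrix, and runs on a hypothetical toy data set of two farms with three animals each. The function name `top_two_pcs` is illustrative only.

```python
import numpy as np

def top_two_pcs(genotypes):
    """Return each animal's coordinates on the first two principal components."""
    X = np.asarray(genotypes, dtype=float)
    X = X - X.mean(axis=0)  # center every SNP column
    # For a centered matrix X = U S Vt, the PC scores are the columns of U * S.
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :2] * S[:2]

# Hypothetical toy data: 6 animals x 4 SNPs, coded as 0/1/2 allele counts.
genotypes = [
    [0, 0, 2, 2],  # farm A
    [0, 1, 2, 2],  # farm A
    [0, 0, 2, 1],  # farm A
    [2, 2, 0, 0],  # farm B
    [2, 1, 0, 0],  # farm B
    [2, 2, 0, 1],  # farm B
]
farms = ["A", "A", "A", "B", "B", "B"]

scores = top_two_pcs(genotypes)
for farm, (pc1, pc2) in zip(farms, scores):
    print(f"farm {farm}: PC1={pc1:+.2f} PC2={pc2:+.2f}")
```

In this toy example the two farms fall on opposite sides of the first component. The sign of each component is arbitrary, so only the separation between farms is meaningful, not which farm gets positive values.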

Mentions: In the present study, we applied three representative multiclass classifiers to four subsets of SNP data defined by kinship-based filtering. In addition, 2 (top-down and bottom-up) × 3 (LogitBoost, SVM, and KNN) wrapper-based feature selection methods were used to generate the best prediction model for traceability. The entire data-processing pipeline, including classification, is presented as a schematic diagram in Fig 1. Three elements (classifier, feature subset, and kinship coefficient) were expected to be directly associated with prediction accuracy, and we investigated their influence by calculating the prediction accuracy from various points of view. First, we determined how well individual animals could be distinguished by farm of origin in the four subsets. As shown in Fig 2, the four subsets, defined by the cutoff criteria (mean kinship within a farm ≥ 0.00, 0.05, 0.10, and 0.15, respectively), were visualized by PCA. As the cutoff criterion increased, greater segregation among farms was observed. These findings imply, as expected, that traceability prediction is feasible when individuals on one farm have highly similar genetic information. The PCA also showed that the subsets contained different numbers of samples and farms depending on the cutoff criterion. Therefore, these figures should be interpreted with caution: a smaller number of classes, a larger sample size, and a larger number of features each generally improve classification accuracy and may bias the comparison.
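The kinship-based filtering described above keeps a farm only if its mean within-farm kinship meets the cutoff. A minimal sketch of that rule follows. It assumes pairwise kinship coefficients are already available (how they were estimated is not stated in this excerpt), and the function names, animal IDs, and kinship values are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def mean_within_farm_kinship(farm_of, kinship):
    """Average pairwise kinship among the animals of each farm.

    farm_of: dict mapping animal ID -> farm ID
    kinship: dict mapping frozenset({animal_a, animal_b}) -> kinship coefficient
    """
    members = defaultdict(list)
    for animal, farm in farm_of.items():
        members[farm].append(animal)
    means = {}
    for farm, animals in members.items():
        pairs = list(combinations(sorted(animals), 2))
        if pairs:  # a farm with a single animal has no pair to average
            means[farm] = sum(kinship[frozenset(p)] for p in pairs) / len(pairs)
    return means

def farms_passing(farm_of, kinship, cutoff):
    """Farms whose mean within-farm kinship is >= cutoff, in sorted order."""
    means = mean_within_farm_kinship(farm_of, kinship)
    return sorted(farm for farm, m in means.items() if m >= cutoff)

# Hypothetical toy data: three farms with made-up kinship coefficients.
farm_of = {"a1": "A", "a2": "A", "a3": "A",
           "b1": "B", "b2": "B",
           "c1": "C", "c2": "C"}
kinship = {
    frozenset({"a1", "a2"}): 0.20,
    frozenset({"a1", "a3"}): 0.18,
    frozenset({"a2", "a3"}): 0.22,
    frozenset({"b1", "b2"}): 0.06,
    frozenset({"c1", "c2"}): 0.01,
}

# The four cutoffs named in the text.
subsets = {c: farms_passing(farm_of, kinship, c) for c in (0.00, 0.05, 0.10, 0.15)}
for cutoff, farms in subsets.items():
    print(f"cutoff {cutoff:.2f}: farms {farms}")
```

Raising the cutoff shrinks the set of retained farms, which is why the four subsets differ in the number of samples and classes and why accuracy comparisons across them need care.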

