Machine learning approach for pooled DNA sample calibration.

Hellicar AD, Rahman A, Smith DV, Henshall JM - BMC Bioinformatics (2015)

Bottom Line: The approach is tested on SNPs genotyped with the Sequenom iPLEX platform and compared to existing state-of-the-art calibration methods. The new method is capable of reducing the mean square error in allele frequency to half that achievable with existing approaches. This paper demonstrates that improvements in pooled allele frequency estimates result if the genotyping platform is characterised at allele frequencies other than the homozygous and heterozygous cases.


Affiliation: CSIRO Computational Informatics, Castray Esplanade, Hobart, Australia. andrew.hellicar@csiro.au.

ABSTRACT

Background: Despite ongoing reductions in genotyping costs, genomic studies involving large numbers of individuals from species with low economic value (such as Black Tiger prawns) remain cost-prohibitive. In this scenario, DNA pooling is an attractive option to reduce genotyping costs. However, genotyping of pooled samples comprising DNA from many individuals is challenging because the measurement errors exceed the allele frequency quantisation size and therefore cannot simply be corrected by clustering techniques. The solution to the calibration problem is a correction to the allele frequency that mitigates errors incurred in the measurement process. We highlight the limitations of the existing calibration solutions, such as the fact that they impose assumptions on the variation between allele frequencies 0, 0.5, and 1.0, and address only a limited set of error types. We propose a novel machine learning method to address the limitations identified.
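The quantisation argument in the Background can be illustrated with a small sketch. It assumes a pool of N diploid individuals, so the true allele frequency can only take the values k/(2N); the function names and the error value of 0.03 are hypothetical and are not taken from the paper. Snapping a measurement to the nearest level only recovers the true value when the error is smaller than half the quantisation step.

```python
def quantisation_step(n_individuals: int, ploidy: int = 2) -> float:
    """Spacing between possible allele frequencies in a pool (assumes equal DNA contribution)."""
    return 1.0 / (ploidy * n_individuals)


def snap_to_level(freq: float, n_individuals: int, ploidy: int = 2) -> float:
    """Clustering-style correction: move a measured frequency to the nearest allowed level."""
    n_chromosomes = ploidy * n_individuals
    level = round(freq * n_chromosomes)
    level = min(max(level, 0), n_chromosomes)  # keep the result inside [0, 1]
    return level / n_chromosomes


if __name__ == "__main__":
    # Single diploid individual: levels are 0, 0.5 and 1.0, so a 0.08 error is harmless.
    print(snap_to_level(0.58, n_individuals=1))

    # Pool of 50 individuals: the step shrinks to 0.01.
    true_freq = 12 / 100
    measured = true_freq + 0.03  # hypothetical measurement error, three steps wide
    print(quantisation_step(50))
    print(snap_to_level(measured, n_individuals=50))  # lands on a wrong level, not on true_freq
```

For a single individual the three levels are far apart, so clustering is enough; for the pool the same kind of error spans several levels, which is why a separate calibration step is needed.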

Results: The approach is tested on SNPs genotyped with the Sequenom iPLEX platform and compared to existing state-of-the-art calibration methods. The new method is capable of reducing the mean square error in allele frequency to half that achievable with existing approaches. Furthermore, for the first time, we demonstrate the importance of carefully considering the choice of training data when using calibration approaches built from pooled data.
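The comparison metric named in the Results, mean square error in allele frequency, can be sketched as follows. This is a minimal illustration, assuming the true pool frequencies are known (for example from individually genotyped samples); the function name and the numbers are hypothetical and are not results from the paper.

```python
from typing import Sequence


def mean_square_error(estimated: Sequence[float], true: Sequence[float]) -> float:
    """Average squared difference between estimated and true allele frequencies."""
    if len(estimated) != len(true) or len(true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((e - t) ** 2 for e, t in zip(estimated, true)) / len(true)


if __name__ == "__main__":
    # Made-up values for three SNPs, used only to show how the metric is applied.
    true_freqs = [0.20, 0.50, 0.85]
    uncalibrated = [0.26, 0.44, 0.90]
    calibrated = [0.22, 0.48, 0.86]
    print(mean_square_error(uncalibrated, true_freqs))
    print(mean_square_error(calibrated, true_freqs))
```

A calibration method is preferred when it yields the lower value on SNPs and pools that were not used to fit it.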

Conclusion: This paper demonstrates that improvements in pooled allele frequency estimates result if the genotyping platform is characterised at allele frequencies other than the homozygous and heterozygous cases. Techniques capable of incorporating such information are described along with aspects of implementation.


Figure 1: Polynomial calibration functions. (a) Examples of calibration functions for the heterozygous case. (b) Distortion corrections for calibration functions corresponding to E=0.2.

Mentions: Examination of Eqs. (9), (10) and (11) shows that they satisfy the conditions in (8). Example plots of polynomials and distortions D are given in Figure 1.

