Linkage disequilibrium based genotype calling from low-coverage shotgun sequencing reads.

Duitama J, Kennedy J, Dinakar S, Hernández Y, Wu Y, Măndoiu II - BMC Bioinformatics (2011)

Bottom Line: Recent technology advances have enabled sequencing of individual genomes, promising to revolutionize biomedical research. However, deep sequencing remains more expensive than microarrays for performing whole-genome SNP genotyping. In this paper we introduce a new multi-locus statistical model and computationally efficient genotype calling algorithms that integrate shotgun sequencing data with linkage disequilibrium (LD) information extracted from reference population panels such as HapMap or the 1000 Genomes Project.


Affiliation: Department of Computer Science & Engineering, University of Connecticut, 371 Fairfield Rd, Unit 2155, Storrs, CT 06269-2155, USA. jduitama@engr.uconn.edu

ABSTRACT

Background: Recent technology advances have enabled sequencing of individual genomes, promising to revolutionize biomedical research. However, deep sequencing remains more expensive than microarrays for performing whole-genome SNP genotyping.

Results: In this paper we introduce a new multi-locus statistical model and computationally efficient genotype calling algorithms that integrate shotgun sequencing data with linkage disequilibrium (LD) information extracted from reference population panels such as HapMap or the 1000 Genomes Project. Experiments on publicly available 454, Illumina, and ABI SOLiD sequencing datasets suggest that integrating LD information yields genotype calling accuracy from low-coverage sequencing data comparable to that of microarray platforms. A software package implementing our algorithm, released under the GNU General Public License, is available at http://dna.engr.uconn.edu/software/GeneSeq/.

Conclusions: Integrating LD information leads to significant improvements in genotype calling accuracy compared to prior LD-oblivious methods, making low-coverage sequencing a viable alternative to microarrays for conducting large-scale genome-wide association studies.
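For context on what "LD-oblivious" calling means, here is a minimal single-site sketch of the kind of read-based genotype likelihood such methods rely on. It is not the GeneSeq implementation: it assumes independent reads, a single symmetric per-base error rate (the hypothetical parameter `error_rate`), and no genotype prior, and the function names are illustrative only.

```python
import math


def genotype_log_likelihoods(n_ref: int, n_alt: int, error_rate: float = 0.01) -> list[float]:
    """Log P(reads | g) for g = 0, 1, 2 alternate alleles at a single SNP.

    Assumes independent reads and a symmetric per-base error rate; the binomial
    coefficient is omitted because it is the same for every genotype.
    """
    lls = []
    for g in (0, 1, 2):
        # A read samples one of the two chromosomes uniformly (alt fraction g / 2),
        # then its base is miscalled with probability error_rate.
        p_alt = (g / 2) * (1 - error_rate) + (1 - g / 2) * error_rate
        lls.append(n_ref * math.log(1 - p_alt) + n_alt * math.log(p_alt))
    return lls


def call_genotype(n_ref: int, n_alt: int, error_rate: float = 0.01) -> int:
    """Maximum-likelihood genotype using only the reads at this one SNP."""
    lls = genotype_log_likelihoods(n_ref, n_alt, error_rate)
    return max(range(3), key=lambda g: lls[g])
```

With only one read covering a SNP, `call_genotype(0, 1)` returns 2 (homozygous alternate): a single read cannot separate a heterozygote from a homozygote, which is the low-coverage weakness that pooling information across linked SNPs is meant to address.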


Figure 1: HF-HMM model for multilocus genotype inference.

In this section we introduce a statistical model that integrates shotgun sequencing data and LD information in the inference of SNP genotypes. Our model, represented graphically in Fig. 1, can be thought of as a hierarchical factorial hidden Markov model (HF-HMM). We use a distributed state (characteristic of factorial HMMs [26]) to exploit the independence between maternal and paternal chromosomes (implied by the assumption of random mating), while also employing a multilevel state representation, as in hierarchical HMMs [27], to capture the structured nature of the data. This structure reduces the number of model parameters and enables highly scalable inference algorithms.
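The computational benefit of a distributed state can be illustrated with a simplified two-chain factorial HMM. This is a minimal sketch under assumptions that are not from the paper: each chain's hidden state is the index of the reference-panel haplotype it copies, adjacent SNPs switch haplotypes with a uniform probability (hypothetical `switch_prob`), reads are scored with a symmetric error model (hypothetical `error_rate`), and only one lower level (the genotype implied by the two copied alleles) is modelled. `panel`, `genotype_posteriors`, and `read_likelihoods` are illustrative names, not GeneSeq's interface. Because the maternal and paternal chains are independent, the joint transition factorizes, so each forward or backward step over the K² state pairs reduces to two K×K matrix products (O(K³)) rather than a dense K²×K² product (O(K⁴)).

```python
import numpy as np


def read_likelihoods(n_ref, n_alt, error_rate=0.01):
    """P(reads at SNP j | genotype g) for g = 0, 1, 2; returns an (L, 3) array.

    Assumes independent reads and a symmetric per-base error rate; the binomial
    coefficient is dropped because it does not depend on g.
    """
    g = np.arange(3)
    p_alt = (g / 2) * (1 - error_rate) + (1 - g / 2) * error_rate
    n_ref = np.asarray(n_ref)[:, None]
    n_alt = np.asarray(n_alt)[:, None]
    return (1 - p_alt) ** n_ref * p_alt ** n_alt


def genotype_posteriors(panel, n_ref, n_alt, switch_prob=0.05, error_rate=0.01):
    """Posterior P(g_j | all reads) under a simplified two-chain factorial HMM.

    panel: (K, L) array of 0/1 alleles for K reference haplotypes (hypothetical format).
    The joint hidden state at SNP j is a pair (a, b): the panel haplotypes copied
    by the maternal and paternal chromosomes, which evolve independently.
    """
    panel = np.asarray(panel, dtype=int)
    K, L = panel.shape
    # Per-chain transition: keep copying the same haplotype, or jump uniformly.
    T = (1 - switch_prob) * np.eye(K) + switch_prob / K
    read_lik = read_likelihoods(n_ref, n_alt, error_rate)  # (L, 3)

    def pair_genotypes(j):
        # Genotype implied by pair (a, b) at SNP j: number of alt alleles, in {0, 1, 2}.
        return panel[:, j][:, None] + panel[:, j][None, :]

    def emission(j):
        return read_lik[j][pair_genotypes(j)]  # (K, K)

    # Forward pass, normalized at every SNP to avoid underflow.
    alpha = np.empty((L, K, K))
    a = np.full((K, K), 1.0 / K**2) * emission(0)
    alpha[0] = a / a.sum()
    for j in range(1, L):
        # Factorized transition: sum_{a,b} alpha[a,b] T[a,c] T[b,d] = (T.T @ alpha @ T)[c,d]
        a = (T.T @ alpha[j - 1] @ T) * emission(j)
        alpha[j] = a / a.sum()

    # Backward pass fused with posterior decoding.
    post = np.empty((L, 3))
    beta = np.ones((K, K))
    for j in range(L - 1, -1, -1):
        if j < L - 1:
            b = T @ (emission(j + 1) * beta) @ T.T
            beta = b / b.sum()
        gamma = alpha[j] * beta
        gamma /= gamma.sum()
        # Marginalize pair posteriors onto the three genotypes.
        post[j] = np.bincount(pair_genotypes(j).ravel(), weights=gamma.ravel(), minlength=3)
    return post
```

On a toy panel of two haplotypes, `genotype_posteriors([[0, 0, 0, 0], [1, 1, 1, 1]], [3, 0, 0, 0], [3, 0, 0, 0])` calls SNP 0 heterozygous from its reads and, through the haplotype chains, also gives the heterozygous genotype the highest posterior at the uncovered SNPs, which a single-site caller could not do.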