Gene-based bin analysis of genome-wide association studies.

Omont N, Forner K, Lamarine M, Martin G, Képès F, Wojcik J - BMC Proc (2008)

Bottom Line: However, new analytical methods have to be developed to face the problems induced by this data scale-up, such as statistical multiple testing, data quality control and computational tractability. We present a novel method to analyze the results of genome-wide association studies. The method practically overcomes the scale-up problems and makes it possible to identify new putative regions statistically associated with the disease.


Affiliation: Merck Serono International S.A., 9 chemin des Mines, 1202 Geneva, Switzerland.

ABSTRACT

Background: With the improvement of genotyping technologies and the exponentially growing number of available markers, case-control genome-wide association studies promise to be a key tool for the investigation of complex diseases. However, new analytical methods have to be developed to face the problems induced by this data scale-up, such as statistical multiple testing, data quality control and computational tractability.

Results: We present a novel method to analyze the results of genome-wide association studies. The algorithm is based on a Bayesian model that integrates genotyping errors and genomic structure dependencies. P-values are assigned to genomic regions termed bins, which are defined by a gene-biased partitioning of the genome, and the false-discovery rate is estimated. We have applied this algorithm to data from three genome-wide association studies of Multiple Sclerosis.

Conclusion: The method practically overcomes the scale-up problems and makes it possible to identify new putative regions statistically associated with the disease.


Figure 1: Representation of a bin containing two genes and Jb markers.

Mentions: Bins are defined on the human DNA sequence from the protein-coding genes annotated in version 35.35 of EnsEMBL [9]. The basic region of a gene extends from the beginning of its first exon to the end of its last exon. Overlapping genes are clustered in the same bin. If two consecutive genes or clusters of overlapping genes are separated by less than 200 kbp, the bin limit is fixed in the middle of the interval. Otherwise, the limit of the upstream bin is set 50 kbp downstream of its last exon, the limit of the downstream bin is set 50 kbp upstream of its first exon, and a special bin corresponding to a desert is created between the two bins. Because a desert is only created when the gap is at least 200 kbp and 50 kbp is assigned to each flanking bin, desert bins have a minimum length of 100 kbp (Figure 1).
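
The partitioning rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: genes are given as hypothetical `(start, end)` pairs in base pairs on a single chromosome (first exon start, last exon end), the names `cluster_genes` and `define_bins` are invented here, and the outermost bin limits are simply set to the first cluster start and the last cluster end because the text does not say how chromosome ends are handled.

```python
from typing import List, Tuple

MERGE_GAP = 200_000  # bp; below this gap, two neighbours share a midpoint limit
FLANK = 50_000       # bp; flank kept on each side of a desert

Interval = Tuple[int, int]
Bin = Tuple[str, int, int]  # (kind, start, end), kind is "gene" or "desert"


def cluster_genes(genes: List[Interval]) -> List[Interval]:
    """Merge overlapping gene regions into clusters, sorted by position."""
    clusters: List[List[int]] = []
    for start, end in sorted(genes):
        if clusters and start <= clusters[-1][1]:
            # Overlaps the current cluster: extend it if needed.
            clusters[-1][1] = max(clusters[-1][1], end)
        else:
            clusters.append([start, end])
    return [(s, e) for s, e in clusters]


def define_bins(genes: List[Interval]) -> List[Bin]:
    """Partition one chromosome into gene bins and desert bins."""
    clusters = cluster_genes(genes)
    if not clusters:
        return []
    bins: List[Bin] = []
    left = clusters[0][0]  # assumption: no flank before the first cluster
    for (_, prev_end), (next_start, _) in zip(clusters, clusters[1:]):
        gap = next_start - prev_end
        if gap < MERGE_GAP:
            # Close neighbours: the limit is the middle of the interval.
            limit = prev_end + gap // 2
            bins.append(("gene", left, limit))
            left = limit
        else:
            # Distant neighbours: 50 kbp flanks and a desert bin in between.
            bins.append(("gene", left, prev_end + FLANK))
            bins.append(("desert", prev_end + FLANK, next_start - FLANK))
            left = next_start - FLANK
    bins.append(("gene", left, clusters[-1][1]))  # assumption: no trailing flank
    return bins


if __name__ == "__main__":
    example = [(100_000, 150_000), (140_000, 180_000),
               (250_000, 300_000), (600_000, 650_000)]
    for kind, start, end in define_bins(example):
        print(f"{kind:6s} {start:>9,d} {end:>9,d}")
```

In the example, the first two genes overlap and form one cluster, the 70 kbp gap to the third gene is split at its midpoint, and the 300 kbp gap before the last gene yields a 200 kbp desert bin, consistent with the 100 kbp minimum stated above.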

