Mapping transcription mechanisms from multimodal genomic data.

Chang HH, McGeachie M, Alterovitz G, Ramoni MF - BMC Bioinformatics (2010)

Bottom Line: We use information theory to simultaneously interrogate SNP and gene expression data, resulting in a Transcriptional Information Map (TIM) which captures the network of transcriptional information that links genetic variations, gene expression and regulatory mechanisms. The application on a dataset of leukemia patients identifies eQTLs in the regions of the GART, PCP4, DSCAM, and RIPK4 genes that regulate ADAMTS1, a known leukemia correlate. The application of our method to the leukemia study explains how genetic variants and gene expression are linked to leukemia.


Affiliation: Children's Hospital Informatics Program, Harvard-MIT Division of Health Sciences and Technology, Harvard Medical School, Boston, Massachusetts, USA. hsun-hsien.chang@childrens.harvard.edu

ABSTRACT

Background: Identification of expression quantitative trait loci (eQTLs) is an emerging area in genomic studies. The task requires an integrated analysis of genome-wide single nucleotide polymorphism (SNP) data and gene expression data, which raises a new computational challenge because of the tremendous size of the data.

Results: We develop a method to identify eQTLs. The method represents eQTLs as information flux between genetic variants and transcripts. We use information theory to simultaneously interrogate SNP and gene expression data, resulting in a Transcriptional Information Map (TIM) which captures the network of transcriptional information that links genetic variations, gene expression and regulatory mechanisms. These maps are able to identify both cis- and trans- regulating eQTLs. The application on a dataset of leukemia patients identifies eQTLs in the regions of the GART, PCP4, DSCAM, and RIPK4 genes that regulate ADAMTS1, a known leukemia correlate.

Conclusions: The information theory approach presented in this paper is able to infer the dependence networks between SNPs and transcripts, which in turn can identify cis- and trans-eQTLs. The application of our method to the leukemia study explains how genetic variants and gene expression are linked to leukemia.


Figure 3: Illustration of mutual information between discrete and continuous variables. (a) The expression level of gene Y is modulated by a SNP X. The distribution of Y alone is a Gaussian with entropy H(Y)=2.61. When conditional on SNP X, the gene Y is a bimodal Gaussian whose mutual information with SNP X is H(Y:X)=0.57. (b) The gene Y and SNP X are independent. Although gene Y follows a Gaussian distribution and its entropy is the same as the entropy in (a), its distribution conditional on SNP X remains a unimodal Gaussian and its mutual information with SNP X is H(Y:X)=0.

Mentions: We provide an example to illustrate mutual information between continuous and discrete variables. Figure 3(a) shows an example where the expression level of gene Y is modulated by a SNP X. The distribution of Y alone is a Gaussian with entropy H(Y)=2.61. When conditional on SNP X, the gene Y is a bimodal Gaussian whose mutual information with SNP X is H(Y:X)=0.57. In contrast, Figure 3(b) shows a second example where gene Y and SNP X are independent. Although gene Y follows a Gaussian distribution and its entropy is the same as in the preceding example, its distribution conditional on SNP X remains unimodal and its mutual information with SNP X is H(Y:X)=0.
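
The contrast between the two panels can be reproduced numerically. The following is a minimal sketch, not the authors' implementation: it assumes the mutual information written above as H(Y:X) is estimated as H(Y) - H(Y|X), with Y treated as Gaussian both marginally and within each genotype group. The function names, the simulated data, the group means, and the use of nats are all illustrative assumptions, so the values it produces are not expected to match the 2.61 and 0.57 reported for Figure 3.

```python
import math
import random
from collections import defaultdict


def gaussian_entropy(variance):
    """Differential entropy (in nats) of a Gaussian with the given variance."""
    return 0.5 * math.log(2.0 * math.pi * math.e * variance)


def population_variance(values):
    """Maximum-likelihood variance (divides by n, not n - 1)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def gaussian_mutual_information(snp, expr):
    """Estimate H(Y) - H(Y|X) for a discrete SNP X and continuous expression Y.

    Assumption (ours): Y is modeled as Gaussian overall and as Gaussian
    within each genotype group. Each group needs at least two distinct values.
    """
    groups = defaultdict(list)
    for x, y in zip(snp, expr):
        groups[x].append(y)
    n = len(expr)
    h_y = gaussian_entropy(population_variance(expr))
    # Conditional entropy: genotype-frequency-weighted average of group entropies.
    h_y_given_x = sum(
        len(ys) / n * gaussian_entropy(population_variance(ys))
        for ys in groups.values()
    )
    return h_y - h_y_given_x


# Hypothetical simulated data: a biallelic marker coded 0/1 for 2000 samples.
rng = random.Random(0)
snp = [rng.choice([0, 1]) for _ in range(2000)]

# Panel (a) analogue: the genotype shifts the mean expression of Y.
expr_dependent = [rng.gauss(3.0 if x == 1 else 0.0, 1.0) for x in snp]

# Panel (b) analogue: Y is drawn without reference to the genotype.
expr_independent = [rng.gauss(0.0, 1.0) for _ in snp]

mi_dep = gaussian_mutual_information(snp, expr_dependent)
mi_ind = gaussian_mutual_information(snp, expr_independent)
print(f"dependent:   {mi_dep:.3f} nats")
print(f"independent: {mi_ind:.3f} nats")
```

Under these assumptions the dependent case yields a clearly positive estimate, while the independent case yields a value near zero. Because the plug-in estimate uses maximum-likelihood variances, it cannot be negative, which also means it is slightly biased upward for finite samples.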

