Genotype-based test in mapping cis-regulatory variants from allele-specific expression data.

Lefebvre JF, Vello E, Ge B, Montgomery SB, Dermitzakis ET, Pastinen T, Labuda D - PLoS ONE (2012)

Bottom Line: In this study, we introduce a genotype-based mapping test that does not require haplotype-phase inference to locate regulatory regions. The genotype-based test performed equally well with the experimental AI datasets, either from genome-wide cDNA hybridization arrays or from RNA sequencing. By avoiding the need for haplotype inference, the genotype-based test will suit AI analyses in population samples of unknown haplotype structure and will additionally facilitate the identification of cis-regulatory variants that are located far away from the regulated transcript.


Affiliation: Centre de Recherche du CHU Sainte-Justine, Université de Montréal, Montréal, Québec, Canada.

ABSTRACT
Identifying and understanding the impact of gene regulatory variation is of considerable importance in evolutionary and medical genetics; such variants are thought to be responsible for human-specific adaptation and to have an important role in genetic disease. Regulatory variation in cis is readily detected in individuals showing uneven expression of a transcript from its two allelic copies, an observation referred to as allelic imbalance (AI). Identifying individuals exhibiting AI allows mapping of regulatory DNA regions and offers the potential to identify the underlying causal genetic variant(s). However, existing mapping methods require knowledge of the haplotypes, which makes them sensitive to phasing errors. In this study, we introduce a genotype-based mapping test that does not require haplotype-phase inference to locate regulatory regions. The test relies on partitioning the genotypes of individuals exhibiting AI and of those not exhibiting AI in a 2×3 contingency table. The performance of this test in detecting linkage disequilibrium (LD) between a potential regulatory site and a SNP located in this region was examined by analyzing simulated and empirical AI datasets. In simulation experiments, the genotype-based test outperforms the haplotype-based tests as the distance separating the regulatory region from its regulated transcript increases. The genotype-based test performed equally well with the experimental AI datasets, either from genome-wide cDNA hybridization arrays or from RNA sequencing. By avoiding the need for haplotype inference, the genotype-based test will suit AI analyses in population samples of unknown haplotype structure and will additionally facilitate the identification of cis-regulatory variants that are located far away from the regulated transcript.
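
The 2×3 partition described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not state which statistic is applied to the table, so a Pearson chi-square test is assumed here, and the function names, the 0/1/2 genotype coding (copies of one allele at the tested SNP) and the toy data are all hypothetical.

```python
import math
from collections import Counter

GENOTYPES = (0, 1, 2)  # assumed coding: copies of one allele at the tested SNP


def contingency_table(genotypes, has_ai):
    """Build the 2x3 table: rows = AI / no AI, columns = genotype 0 / 1 / 2."""
    counts = Counter(zip(has_ai, genotypes))
    return [[counts[(status, g)] for g in GENOTYPES] for status in (True, False)]


def chi_square_test(table):
    """Pearson chi-square on a 2x3 table (assumed statistic, not from the source).

    Empty genotype columns are dropped, so the degrees of freedom are 1 or 2;
    both cases have closed-form tail probabilities.
    """
    rows = [r for r in table if sum(r) > 0]
    cols = [c for c in zip(*rows) if sum(c) > 0] if rows else []
    if len(rows) < 2 or len(cols) < 2:
        return 0.0, 1.0  # no contrast between groups or genotypes
    row_totals = [sum(c[i] for c in cols) for i in range(2)]
    n = sum(row_totals)
    stat = 0.0
    for c in cols:
        col_total = sum(c)
        for i in range(2):
            expected = row_totals[i] * col_total / n
            stat += (c[i] - expected) ** 2 / expected
    df = len(cols) - 1
    # Chi-square survival function: exp(-x/2) for df=2, erfc(sqrt(x/2)) for df=1.
    p = math.exp(-stat / 2) if df == 2 else math.erfc(math.sqrt(stat / 2))
    return stat, p


# Toy data (invented): 10 individuals with AI, 10 without, genotyped at one SNP.
genotypes = [1] * 8 + [0, 2] + [0] * 6 + [2] * 3 + [1]
has_ai = [True] * 10 + [False] * 10

table = contingency_table(genotypes, has_ai)
stat, p_value = chi_square_test(table)
print(table, round(stat, 3), p_value)
```

In this toy example heterozygotes are concentrated among the individuals with AI, which is the pattern expected for a SNP in LD with a cis-regulatory site; no phase information is used at any point. With counts this small the chi-square approximation is rough, and an exact test may be preferable.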


pone-0038667-g008: Mapping regulatory sites for PTGER4. Plots of p-values for HapMap2 SNPs using the binomial test (A), linear regression test (B) and contingency test (C). Vertical black lines identify SNPs that were used as informative markers within the transcript, and the green horizontal line corresponds to the analyzed transcript. The linkage disequilibrium triangle and the recombination intensity profile of the population recombination rate (ρ/kb, estimated by InfRec [22]) are shown in (D), where black lines connect SNPs distributed according to sequence position (upper part) with their position in the LD triangle, and vertical green lines delimit the size of the analyzed transcript. The arrow at the top indicates the direction of transcription.

Mentions: Figure 7A shows the contingency test analysis of the LRRIQ3 AI data from Ge et al. [6], including SNPs from all autosomes. A similar analysis of the TAPBP transcript, based on AI data from Montgomery et al. [7], is shown in Figure 7B. It is repeated in Figure 7C using the full sequence information of chromosome 6 obtained from the 1000 Genomes Project [21]. At both loci the analysis revealed a single candidate regulatory region overlapping the examined transcript (Figures S8 and S9). Note that in Figure 7A, the second, minor peak on chromosome 15 is an artifact caused by a coincidental concentration of unlinked singleton SNPs. As expected, the contingency test becomes especially practical when looking for regulatory sites that are far from the affected gene. A classic example is PTGER4 [6], [14], with the regulatory region located about 200 kb upstream of its transcription start site. Here, the linear regression test (which performs best in terms of log(1/p)), the binomial test and the contingency test all point to four AI-associated sites (rs7720838 at map position 40522653 bp; rs7725052 at 40523027 bp; rs9283753 at 40526366 bp; rs10440635 at 40526547 bp; seen on the upper left in Figures 8A and 8B). The rs7720838 SNP, which was reported previously (Table 5), is in complete LD with the three others. In the case of TTC39b, in contrast, only the contingency test highlights a potential regulatory region, about 600 kb downstream from the gene (Figure 9C; see also Table 5). The failure of the binomial and linear regression tests (Figures 9A and 9B) to pinpoint the same region as a regulatory candidate is presumably related to the greater genetic distance separating it from the regulated TTC39b transcripts than in the case of PTGER4 (600 vs. 200 kb, with an even greater difference in genetic distance when comparing ρ, the population recombination rate intensities, in Figures 8D and 9D).
Note, however, that at the same time both haplotype-based tests reveal a number of significant SNPs (one highly significant, p = 2.5×10⁻⁷, in the case of linear regression) among those within the transcript itself that were used as informative markers for the detection of AI.

