On the identification of potential regulatory variants within genome wide association candidate SNP sets.

Chen CY, Chang IS, Hsiung CA, Wasserman WW - BMC Med Genomics (2014)

Bottom Line: Current methods focus on the discovery of single nucleotide polymorphisms (SNPs) associated with disease traits. Lastly, ChIP-seq data of transcription associated factors and topological domains were included as binding evidence and potential gene target inference. The calculated regulatory potential, allelic binding affinity scores and ChIP-seq binding evidence were the three criteria used to prioritize candidates.


Affiliation: Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, University of British Columbia, Vancouver, British Columbia, Canada. wyeth@cmmt.ubc.ca.

ABSTRACT

Background: Genome-wide association studies (GWAS) are a population-scale approach to identifying segments of the genome in which genetic variation may contribute to disease risk. Current methods focus on the discovery of single nucleotide polymorphisms (SNPs) associated with disease traits. Because identified risk loci contain many SNPs, most of them in non-coding regions, a key challenge is to identify and prioritize variants affecting regulatory sequences that are likely to contribute to the assessed phenotype.

Methods: We examined SNPs within lung and breast cancer GWAS loci that reached genome-wide significance for potential roles in gene regulation, with a specific focus on SNPs likely to disrupt transcription factor binding sites. Within risk loci, the regulatory potential of sub-regions was classified using relevant open chromatin and epigenetic high-throughput sequencing data sets from the ENCODE project in available cancer and normal cell lines. Transcription factor affinity-altering variants were then predicted by comparing position weight matrix scores between disease and reference alleles. Lastly, ChIP-seq data for transcription-associated factors were included as binding evidence, and topological domains were used to infer potential gene targets.
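The allele comparison described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the 4-bp count matrix is hypothetical, and the scoring choices (log2-odds weights with a 0.25 pseudocount, a uniform background, and the best window on either strand that covers the SNP) are assumptions. A negative difference suggests the disease allele weakens the predicted site; a positive one suggests it strengthens it.

```python
import math

BASES = "ACGT"

# Hypothetical 4-bp count matrix (rows = motif positions, columns = A, C, G, T).
COUNTS = [
    [8, 0, 1, 1],
    [0, 9, 1, 0],
    [1, 0, 9, 0],
    [0, 1, 0, 9],
]


def log_odds_pwm(counts, pseudocount=0.25, background=0.25):
    """Convert counts to log2-odds weights against an assumed uniform background."""
    pwm = []
    for row in counts:
        total = sum(row) + 4 * pseudocount
        pwm.append([math.log2((c + pseudocount) / total / background) for c in row])
    return pwm


def score(pwm, site):
    """Sum of per-position weights for an uppercase site as long as the motif."""
    return sum(pwm[i][BASES.index(b)] for i, b in enumerate(site))


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq[::-1].translate(str.maketrans("ACGT", "TGCA"))


def best_overlapping_score(pwm, seq, snp_pos):
    """Best score on either strand over all motif-length windows covering snp_pos."""
    width = len(pwm)
    best = -math.inf
    for start in range(max(0, snp_pos - width + 1), min(snp_pos, len(seq) - width) + 1):
        site = seq[start:start + width]
        best = max(best, score(pwm, site), score(pwm, revcomp(site)))
    return best


def allelic_delta(pwm, seq, snp_pos, ref, alt):
    """Disease-allele (alt) score minus reference-allele score at the SNP."""
    if seq[snp_pos] != ref:
        raise ValueError("reference allele does not match sequence")
    alt_seq = seq[:snp_pos] + alt + seq[snp_pos + 1:]
    return best_overlapping_score(pwm, alt_seq, snp_pos) - best_overlapping_score(pwm, seq, snp_pos)


pwm = log_odds_pwm(COUNTS)
# The C>A change breaks the consensus ACGT site, so the difference is negative.
print(allelic_delta(pwm, "GGACGTGG", 3, ref="C", alt="A"))
```

Taking the maximum over windows that cover the SNP is one common way to summarize a site; other choices (e.g. a fixed site position) would give different numbers.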

Results: The sets of SNPs, including both the disease-associated markers and those in high linkage disequilibrium with them, were significantly over-represented in regulatory sequences of cancer and/or normal cells; however, this over-representation was generally not restricted to disease-relevant, tissue-specific regions. Candidates were prioritized using three criteria: calculated regulatory potential, allelic binding affinity scores and ChIP-seq binding evidence. Among the SNPs meeting all three criteria, we highlighted breast cancer susceptibility SNPs and a borderline lung cancer-relevant SNP located in cancer-specific enhancers that overlap ChIP-seq binding sites of multiple distinct transcription-associated factors.
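The three-criteria prioritization can be illustrated as a simple filter followed by a ranking. The field names, the accepted regulatory classes, the thresholds and the SNP identifiers below are all hypothetical; the actual cut-offs used in the study are not stated in this summary.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    snp_id: str
    regulatory_class: str  # hypothetical label, e.g. "enhancer"
    delta_score: float     # disease-allele minus reference-allele PWM score
    chip_factors: int      # distinct factors with ChIP-seq peaks over the SNP


def prioritize(candidates, regulatory_classes, min_abs_delta, min_factors):
    """Keep SNPs meeting all three criteria, strongest ChIP-seq support first."""
    hits = [
        c for c in candidates
        if c.regulatory_class in regulatory_classes  # criterion 1: regulatory potential
        and abs(c.delta_score) >= min_abs_delta      # criterion 2: allelic affinity change
        and c.chip_factors >= min_factors            # criterion 3: ChIP-seq binding evidence
    ]
    return sorted(hits, key=lambda c: (-c.chip_factors, -abs(c.delta_score)))


candidates = [
    Candidate("snp_a", "enhancer", 2.5, 4),
    Candidate("snp_b", "enhancer", 0.2, 6),    # fails: affinity change too small
    Candidate("snp_c", "quiescent", 3.0, 5),   # fails: not in a regulatory class
    Candidate("snp_d", "promoter", -1.8, 2),
]
ranked = prioritize(candidates, {"enhancer", "promoter"}, min_abs_delta=1.0, min_factors=2)
print([c.snp_id for c in ranked])  # prints ['snp_a', 'snp_d']
```

Using the absolute score difference keeps both affinity-gaining and affinity-losing alleles; restricting to one direction would be an equally plausible design.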

Conclusion: Incorporating high-throughput epigenetic and transcription factor sequencing data sets from both cancer and normal cells into cancer genetic studies reveals potentially functional SNPs and informs subsequent characterization efforts.

Figure 7: Two-dimensional heatmap of chromatin interactions in the neighbourhood of the rs12087869 SNP. The figure shows Hi-C chromatin interaction data sets from H1 human ES cells (upper panel) and IMR90 fibroblast cells (lower panel), obtained from Dixon et al. [31]. Topological domains (TADs) from both cell types are shown to indicate genomic neighbourhoods with stronger within-domain interactions. Heatmap colors correspond to the number of times reads from two 20 kb bins were sequenced as a pair, with red indicating stronger interaction and white indicating little or no interaction. The 85th-percentile read counts (29 for H1 and 21 for IMR90 cells) were used as the upper limit of the color scale so that extremely interactive regions do not dominate it. The plot was generated using the ‘HiTC’ R package, and dotted lines were drawn to help visualize the interactive domain containing the SNP. The TAD (from H1 cells) containing the SNP is highlighted with a light pink box.
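The 20 kb binning and percentile capping described in the caption can be sketched in plain Python. The figure itself was produced with HiTC, and this sketch does not reproduce that package's API. How the 85th percentile was computed is not stated, so taking it with linear interpolation over all cells of the symmetric matrix (zeros included) is an assumption, as are the toy read-pair positions.

```python
BIN_SIZE = 20_000  # 20 kb bins, as in the figure


def contact_matrix(pairs, region_start, region_end, bin_size=BIN_SIZE):
    """Count read pairs linking each pair of bins in a region (symmetric matrix)."""
    n = (region_end - region_start + bin_size - 1) // bin_size
    mat = [[0] * n for _ in range(n)]
    for pos1, pos2 in pairs:
        if region_start <= pos1 < region_end and region_start <= pos2 < region_end:
            i = (pos1 - region_start) // bin_size
            j = (pos2 - region_start) // bin_size
            mat[i][j] += 1
            if i != j:
                mat[j][i] += 1  # mirror off-diagonal contacts
    return mat


def percentile(values, q):
    """Linear-interpolation percentile, q in [0, 100] (assumed definition)."""
    s = sorted(values)
    k = (len(s) - 1) * q / 100
    lo = int(k)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (k - lo)


def cap_matrix(mat, q=85):
    """Clip every cell at the q-th percentile so hotspots do not dominate the color scale."""
    cap = percentile([v for row in mat for v in row], q)
    return [[min(v, cap) for v in row] for row in mat], cap


# Toy read pairs in a 60 kb window (three 20 kb bins).
pairs = [(5_000, 15_000)] * 3 + [(5_000, 25_000)] + [(45_000, 55_000)] * 10
mat = contact_matrix(pairs, 0, 60_000)
capped, cap = cap_matrix(mat)
print(mat)
print(round(cap, 2))
```

Capping changes only the display range; the underlying counts in `mat` are left untouched.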

Mentions: To infer potential gene targets of the enhancer containing the SNP, we used Hi-C chromatin interaction data sets from the cells for which they were available, H1 and IMR90 [31]. Enhancers are known to target multiple TSSs, and a recent large-scale enhancer study across human cell types showed that 40% of inferred TSS-associated enhancers (computed from pairwise correlation of FANTOM5 CAGE data) target at least their nearest TSS [39]. Such enhancer-TSS interactions vary across cell types and can be revealed through chromosome conformation capture techniques. Recent studies have shown that the boundaries of highly interactive genomic neighbourhoods (topological associating domains; TADs) are highly consistent across cell types [31,40], whereas interactions between sub-TADs are cell type-specific [41]. Based on the topological domains and Hi-C chromatin interaction data generated by Dixon et al., genes that could potentially be affected by the increased TF binding affinity of the rs12087869 risk allele include PGM1, ROR1, Mir-544, BC040909, AK096291 and UBE2U (Figure 7). Potential targets of the breast cancer susceptibility SNPs highlighted in the previous section (rs1292011, rs1391720 and rs1391721) include multiple lincRNAs, Metazoa_SRP and TBX3 (Additional file 14).
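The TAD-based part of this target inference amounts to listing genes whose TSS lies in the same domain as the SNP. The sketch below is a minimal illustration under stated assumptions: half-open, 0-based intervals, and hypothetical chromosome names, TAD boundaries and TSS coordinates standing in for the Dixon et al. domain calls and a real gene annotation. It ignores the finer Hi-C contact signal that was also examined.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def containing_tad(tads, chrom, pos):
    """Return the TAD containing (chrom, pos), or None if the position lies between TADs."""
    for tad in tads:
        if tad.chrom == chrom and tad.start <= pos < tad.end:
            return tad
    return None


def candidate_targets(tads, tss_by_gene, chrom, snp_pos):
    """Names of genes whose TSS falls in the same TAD as the SNP."""
    tad = containing_tad(tads, chrom, snp_pos)
    if tad is None:
        return []
    return sorted(
        gene for gene, (g_chrom, tss) in tss_by_gene.items()
        if g_chrom == tad.chrom and tad.start <= tss < tad.end
    )


# Hypothetical domains and TSS coordinates, for illustration only.
tads = [Interval("chrA", 0, 1_000_000), Interval("chrA", 1_000_000, 2_500_000)]
tss_by_gene = {
    "GENE1": ("chrA", 1_200_000),
    "GENE2": ("chrA", 2_400_000),
    "GENE3": ("chrA", 900_000),    # neighbouring TAD, excluded
    "GENE4": ("chrB", 1_300_000),  # other chromosome, excluded
}
print(candidate_targets(tads, tss_by_gene, "chrA", 1_500_000))  # ['GENE1', 'GENE2']
```

Because TAD boundaries are largely shared across cell types, domain membership gives a cell-type-agnostic candidate list; cell type-specific sub-TAD contacts would be needed to narrow it further.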

