On the identification of potential regulatory variants within genome wide association candidate SNP sets.

Chen CY, Chang IS, Hsiung CA, Wasserman WW - BMC Med Genomics (2014)

Bottom Line: Current methods focus on the discovery of single nucleotide polymorphisms (SNPs) associated with disease traits. Lastly, ChIP-seq data of transcription associated factors and topological domains were included as binding evidence and potential gene target inference. The calculated regulatory potential, allelic binding affinity scores and ChIP-seq binding evidence were the three criteria used to prioritize candidates.


Affiliation: Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, University of British Columbia, Vancouver, British Columbia, Canada. wyeth@cmmt.ubc.ca.

ABSTRACT

Background: Genome wide association studies (GWAS) are a population-scale approach to the identification of segments of the genome in which genetic variations may contribute to disease risk. Current methods focus on the discovery of single nucleotide polymorphisms (SNPs) associated with disease traits. As there are many SNPs within identified risk loci, and the majority of these are situated within non-coding regions, a key challenge is to identify and prioritize variants affecting regulatory sequences that are likely to contribute to the phenotype assessed.

Methods: We investigated SNPs within lung and breast cancer GWAS loci that reached genome-wide significance for potential roles in gene regulation, with a specific focus on SNPs likely to disrupt transcription factor binding sites. Within risk loci, the regulatory potential of sub-regions was classified using relevant open chromatin and epigenetic high throughput sequencing data sets from the ENCODE project in available cancer and normal cell lines. Furthermore, transcription factor affinity altering variants were predicted by comparing position weight matrix scores between disease and reference alleles. Lastly, ChIP-seq data of transcription associated factors and topological domains were included as binding evidence and for inference of potential gene targets.

Results: The sets of SNPs, including both the disease-associated markers and those in high linkage disequilibrium with them, were significantly over-represented in regulatory sequences of cancer and/or normal cells; however, over-representation was generally not restricted to disease-relevant tissue specific regions. The calculated regulatory potential, allelic binding affinity scores and ChIP-seq binding evidence were the three criteria used to prioritize candidates. Fitting all three criteria, we highlighted breast cancer susceptibility SNPs and a borderline lung cancer relevant SNP located in cancer-specific enhancers overlapping multiple distinct transcription associated factor ChIP-seq binding sites.

Conclusion: Incorporating high throughput sequencing epigenetic and transcription factor data sets from both cancer and normal cells into cancer genetic studies reveals potential functional SNPs and informs subsequent characterization efforts.
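The allele comparison described in the Methods can be illustrated with a minimal sketch. The scoring details are not given in this excerpt, so everything below is an assumption made for illustration: a log-odds position weight matrix built from a hypothetical count matrix with a pseudocount, a relative score rescaled to 0-100, and a single-strand scan over the windows that overlap the SNP. The function names are hypothetical and do not come from the paper.

```python
import math

BASES = "ACGT"

def pwm_from_counts(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into a log-odds PWM (assumed uniform background)."""
    pwm = []
    for col in counts:
        total = sum(col[b] for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2(((col[b] + pseudocount) / total) / background)
                    for b in BASES})
    return pwm

def relative_score(pwm, site):
    """Rescale the raw log-odds score of `site` to 0-100 between the PWM's min and max."""
    raw = sum(col[b] for col, b in zip(pwm, site))
    lo = sum(min(col.values()) for col in pwm)
    hi = sum(max(col.values()) for col in pwm)
    return 100.0 * (raw - lo) / (hi - lo)

def best_allele_score(pwm, left_flank, allele, right_flank):
    """Best relative score among forward-strand windows that overlap the SNP position."""
    width = len(pwm)
    seq = left_flank + allele + right_flank
    snp = len(left_flank)
    best = float("-inf")  # stays -inf if the sequence is shorter than the motif
    first = max(0, snp - width + 1)
    last = min(snp, len(seq) - width)
    for start in range(first, last + 1):
        best = max(best, relative_score(pwm, seq[start:start + width]))
    return best

# Toy motif with consensus ACG (hypothetical counts, not real data).
toy_counts = [
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 10, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 10, "T": 0},
]
toy_pwm = pwm_from_counts(toy_counts)
score_c = best_allele_score(toy_pwm, "TA", "C", "GT")  # allele C completes ACG
score_t = best_allele_score(toy_pwm, "TA", "T", "GT")  # allele T disrupts it
```

A real analysis would also scan the reverse strand and assess the score difference against an empirical null, neither of which is shown here.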


Figure 4: Differences in regulatory potential and allelic TF binding affinity for Lung.cancer and Breast.cancer LD80 SNPs. The plots present potentially affected TFBS: the upper panels (A and C) display SNPs that confer stronger TFBS patterns in cancer patients with the minor allele, while the lower panels (B and D) display SNPs that confer a decrease in TF binding affinity. The x-axis represents the relative regulatory potential, defined as the log2 ratio of regulatory potential index between cancer and normal cells plus 1. The relative regulatory potential is positive for higher regulatory potential in cancer cells (A549 for A and B; MCF-7 for C and D) and negative for higher regulatory potential in the corresponding normal cells (NHLF normal lung fibroblasts for A and B; HMEC normal breast cells for C and D). The y-axis shows the -log2 transformation of empirical p-values for motif affinity score changes. The data shown are restricted to PWMs with p-values < 0.05 from the two-tailed test, and for visualization purposes, only PWMs with scores > 85 in at least one allele are shown. TFs with an increase or decrease of TF binding affinity where the SNP has non-zero regulatory potential in either cancer or normal cells are labeled along with the corresponding SNP. SNPs with zero regulatory potential index in both cell types are represented by gray dots, whereas those with regulatory potential indices > 0 in both cell types are colored blue. SNPs with a non-zero regulatory potential index in only one cell type are colored red (cancer cells only) or green (normal cells only). In plot C, a red arrow indicates SNP rs1391720, which is discussed in the text. The vertical bar illustrates the degree of difference in TF affinity.
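The axis transformations, display filter and color coding in the caption can be sketched as follows. The caption's wording for the x-axis is ambiguous; this sketch assumes it means log2((cancer index + 1) / (normal index + 1)), which stays defined when either index is zero. That reading, the function names and the assumption that indices are non-negative are illustrative and not confirmed by the source.

```python
import math

def relative_regulatory_potential(rpi_cancer, rpi_normal):
    """x-axis value, ASSUMING log2((cancer + 1) / (normal + 1)) with non-negative indices."""
    return math.log2((rpi_cancer + 1.0) / (rpi_normal + 1.0))

def neg_log2_p(p_value):
    """y-axis value: -log2 of an empirical p-value in (0, 1]."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p-value must be in (0, 1]")
    return -math.log2(p_value)

def passes_display_filter(p_value, score_major, score_minor,
                          p_cutoff=0.05, score_cutoff=85.0):
    """Keep PWM hits with p < 0.05 and a score > 85 in at least one allele."""
    return p_value < p_cutoff and max(score_major, score_minor) > score_cutoff

def point_color(rpi_cancer, rpi_normal):
    """Color coding from the caption, based on which cell types have non-zero potential."""
    if rpi_cancer == 0 and rpi_normal == 0:
        return "gray"
    if rpi_cancer > 0 and rpi_normal > 0:
        return "blue"
    return "red" if rpi_cancer > 0 else "green"
```

Under the assumed definition, a SNP with an index of 3 in cancer cells and 0 in normal cells has a relative regulatory potential of log2(4/1) = 2 and would be drawn in red.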

Mentions: To prioritize functional SNPs, we examined the relative regulatory potential by comparing cancer to normal cell lines where data were available (detailed in Methods) and differences in predicted TF binding affinity by comparing major and minor alleles. We did not restrict the TF affinity analysis to SNPs present in the cancer cell lines we worked with, and we assumed that the regulatory potential in the cancer cell lines shows whether regions are active and accessible in the corresponding cancer cells. Figure 4 shows the functional prioritization plot for Lung.cancer and Breast.cancer, chosen because of higher data availability; plots for other LD80 sets are provided in Additional file 7. In Figure 4, the SNP-impacted TFBS in quadrant I are consistent with the presence of a stronger TFBS in regulatory regions preferentially observed in the cancer samples, those in quadrant II with the presence of a stronger TFBS in regulatory regions preferentially observed in normal cells, those in quadrant III with the presence of a weaker TFBS in regions preferentially observed in normal cells, and those in quadrant IV with the presence of a weaker TFBS in regions preferentially observed in cancer cells (i.e. loss of a silencing TFBS). The magnitude of relative regulatory potential observed in the Breast.cancer set was higher than that of the Lung.cancer set. We found that an increase of TF binding affinity in the minor allele was not necessarily associated with a gain of regulatory potential in the cancer cell line, and vice versa.
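The quadrant interpretation above can be written as a small lookup. This is only a sketch: it assumes the direction of the affinity change is taken from the sign of the minor-allele score minus the major-allele score, and that points with no change on either axis are left unassigned. The function name and those conventions are hypothetical.

```python
def quadrant(rel_reg_potential, score_major, score_minor):
    """Assign a Figure 4 quadrant from relative regulatory potential and affinity change.

    rel_reg_potential > 0 means higher regulatory potential in cancer cells;
    score_minor > score_major means a stronger TFBS with the minor allele.
    Returns None when either quantity shows no difference (an assumption here).
    """
    delta = score_minor - score_major
    if rel_reg_potential == 0 or delta == 0:
        return None
    if delta > 0:
        # Stronger TFBS: cancer-preferential region (I) or normal-preferential (II).
        return "I" if rel_reg_potential > 0 else "II"
    # Weaker TFBS: cancer-preferential region (IV) or normal-preferential (III).
    return "IV" if rel_reg_potential > 0 else "III"
```

For example, a SNP in a cancer-preferential region whose minor allele lowers the predicted affinity falls in quadrant IV, the case the text describes as a possible loss of a silencing TFBS.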

