Allele-specific transcription factor binding to common and rare variants associated with disease and gene expression.

Cavalli M, Pan G, Nord H, Wallerman O, Wallén Arzt E, Berggren O, Elvers I, Eloranta ML, Rönnblom L, Lindblad Toh K, Wadelius C - Hum. Genet. (2016)

Bottom Line: We found 9962 candidate regulatory SNPs, of which 16 % were rare and showed evidence of larger functional effect than common ones. Functionally rare variants may explain divergent GWAS results between populations and are candidates for a partial explanation of the missing heritability. Furthermore, by examining GWAS loci we found >400 allele-specific candidate SNPs, 141 of which were highly relevant in our cell types.

Affiliation: Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden.

ABSTRACT
Genome-wide association studies (GWAS) have identified a large number of disease-associated SNPs, but the functional variant and the gene it controls have been identified in only a few cases. To systematically identify candidate regulatory variants, we sequenced ENCODE cell lines and used public ChIP-seq data to look for transcription factors binding preferentially to one allele. We found 9962 candidate regulatory SNPs, of which 16 % were rare and showed evidence of larger functional effect than common ones. Functionally rare variants may explain divergent GWAS results between populations and are candidates for a partial explanation of the missing heritability. The majority of allele-specific variants (96 %) were specific to a cell type. Furthermore, by examining GWAS loci we found >400 allele-specific candidate SNPs, 141 of which were highly relevant in our cell types. Functionally validated SNPs support the identification of an SNP in SYNGR1 that may confer risk of rheumatoid arthritis and primary biliary cirrhosis, as well as an SNP in the last intron of COG6 conferring risk of psoriasis. We propose that by repeating the ChIP-seq experiments for 20 selected transcription factors in three to ten people, the most common polymorphisms can be interrogated for allele-specific binding. Our strategy may help to remove the current bottleneck in functional annotation of the genome.
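
As an illustration of what "binding preferentially to one allele" can mean in practice, the sketch below applies a two-sided exact binomial test to ChIP-seq read counts at a single heterozygous SNP. This is an assumption made for illustration only: the abstract does not state which statistical test the authors used, and the function name, the 0.5 null proportion and the example counts are all hypothetical.

```python
from math import comb


def binomial_two_sided_p(ref_reads: int, alt_reads: int, p_ref: float = 0.5) -> float:
    """Two-sided exact binomial p-value for allelic imbalance at one SNP.

    Hypothetical helper, not the authors' method. Under the null
    hypothesis, each read covering a heterozygous SNP carries the
    reference allele with probability p_ref (0.5 = no allelic preference).
    Intended for modest read depths; very deep coverage would overflow
    the float conversion used here.
    """
    if ref_reads < 0 or alt_reads < 0:
        raise ValueError("read counts must be non-negative")
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0  # no reads, no evidence either way

    # Probability of every possible reference-read count 0..n under the null.
    pmf = [comb(n, k) * p_ref**k * (1.0 - p_ref) ** (n - k) for k in range(n + 1)]
    observed = pmf[ref_reads]

    # Sum the outcomes that are no more likely than the observed one;
    # the small tolerance guards against floating-point ties.
    p_value = sum(q for q in pmf if q <= observed * (1.0 + 1e-9))
    return min(1.0, p_value)


if __name__ == "__main__":
    # Hypothetical counts: 18 reads with the reference allele, 4 with the alternative.
    print(f"p = {binomial_two_sided_p(18, 4):.4g}")
```

A genome-wide screen would also need a multiple-testing correction and some control for reference-mapping bias, both of which are outside this sketch.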

Fig. 2: Coverage of transcription factors and alleles in the population. (a) Network representing the top 20 TFs, polymerases or coactivators whose ChIP-seq reads detect the most AS-SNPs in four different cell lines. The TFs detecting the most AS-SNPs in several cell lines are clustered at the center, with the more cell-specific ones in the outer layers. (b) The likelihood of finding a heterozygous SNP as a function of the allele frequency, considering one or more individuals. The AUC represents the proportion of heterozygous SNPs a population of n individuals

We investigated the overlap of the top 20 TFs, polymerases or coactivators between the four cell lines (Fig. 2a; Table S4 in Supplementary material 1). Three TFs (CTCF, RAD21, YY1) and POL2 detected many AS-SNPs in all cell lines; four TFs (NRSF, MAX, USF1 and TBP) were shared by three lines; and 11 TFs (CEBPB, ZNF143, GABP, JUND, RFX5, EGR1, ELF1, NFYB, CREB1, TEAD4 and E2F6) plus P300 were shared by two lines. The number of highly informative TFs unique to one cell line varied between five (H1-hESC) and nine (SK-N-SH). Some of these unique TFs are already known to be of central importance for the cell, e.g., pioneer TFs like NANOG in H1-hESC and factors important for B-cell maturation such as BCL3, EBF1 and PU1 in GM12878. Our data suggest that TFs shared by many cells, pioneer TFs and those important for cell development should be selected when designing a project to identify AS-SNPs in a cell or tissue not previously studied by ChIP-seq.
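
The relationship plotted in Fig. 2b, and the proposal to study three to ten people, can be made concrete with a short calculation. The sketch below assumes Hardy-Weinberg proportions and unrelated individuals; neither assumption is stated in this excerpt, and the function names and example allele frequencies are hypothetical. Under these assumptions one individual is heterozygous at a biallelic SNP with allele frequency p with probability 2p(1 - p), and at least one of n individuals is heterozygous with probability 1 - (1 - 2p(1 - p))^n.

```python
def prob_heterozygous(p: float) -> float:
    """Probability that one individual is heterozygous at a biallelic SNP.

    Assumes Hardy-Weinberg proportions; p is the frequency of one allele.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    return 2.0 * p * (1.0 - p)


def prob_at_least_one_het(p: float, n: int) -> float:
    """Probability that at least one of n unrelated individuals is heterozygous."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 - (1.0 - prob_heterozygous(p)) ** n


if __name__ == "__main__":
    # Hypothetical allele frequencies, evaluated for one, three and ten individuals.
    for n in (1, 3, 10):
        cells = ", ".join(
            f"p={p:.2f}: {prob_at_least_one_het(p, n):.3f}"
            for p in (0.01, 0.05, 0.20, 0.50)
        )
        print(f"n={n:>2}  {cells}")
```

Because allele-specific binding can only be assessed at a heterozygous site, this probability bounds how many SNPs a panel of n individuals can interrogate. It rises quickly with n for common variants and stays low for rare ones.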

