Association mapping across numerous traits reveals patterns of functional variation in maize.

Wallace JG, Bradbury PJ, Zhang N, Gibon Y, Stitt M, Buckler ES - PLoS Genet. (2014)

Bottom Line: Phenotypic variation in natural populations results from a combination of genetic effects, environmental effects, and gene-by-environment interactions. We also find that genes tagged by GWAS are enriched for regulatory functions and are ∼50% more likely to have a paralog than expected by chance, indicating that gene regulation and gene duplication are strong drivers of phenotypic variation. These results will likely apply to many other organisms, especially ones with large and complex genomes like maize.


Affiliation: Institute for Genomic Diversity, Cornell University, Ithaca, New York, United States of America.

ABSTRACT
Phenotypic variation in natural populations results from a combination of genetic effects, environmental effects, and gene-by-environment interactions. Despite the vast amount of genomic data becoming available, many pressing questions remain about the nature of genetic mutations that underlie functional variation. We present the results of combining genome-wide association analysis of 41 different phenotypes in ∼5,000 inbred maize lines to analyze patterns of high-resolution genetic association among 28.9 million single-nucleotide polymorphisms (SNPs) and ∼800,000 copy-number variants (CNVs). We show that genic and intergenic regions have opposite patterns of enrichment, minor allele frequencies, and effect sizes, implying tradeoffs among the probability that a given polymorphism will have an effect, the detectable size of that effect, and its frequency in the population. We also find that genes tagged by GWAS are enriched for regulatory functions and are ∼50% more likely to have a paralog than expected by chance, indicating that gene regulation and gene duplication are strong drivers of phenotypic variation. These results will likely apply to many other organisms, especially ones with large and complex genomes like maize.


Relative enrichment of polymorphism classes in GWAS hits. (A) The proportions of different polymorphism classes in the input dataset (left) and GWAS hits (right). The overall GWAS hit distribution is significantly different from the input at p = 8.74×10⁻³⁵ (Chi-square test). (B) The relative change in polymorphism classes in the GWAS dataset as compared to the input dataset, with the raw p-value of each class shown at right (two-sided exact binomial test). Only categories with Bonferroni-corrected p-values ≤0.01 are shown. The strong depletion of intergenic SNPs in the GWAS dataset drives almost all other categories to appear significantly enriched. Exact category counts and alternate p-values based on circular permutation are available in S1 Table. (C) The same analysis as in (B), but with intergenic regions excluded.
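
The per-class comparison described for panel (B) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: the class counts are made up, the function names (`binom_pmf`, `binom_test_two_sided`, `class_enrichment`) are hypothetical, the two-sided p-value uses one common convention (summing all outcomes no more likely than the observed one), and the Bonferroni factor is simply the number of classes tested.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success rate p."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def binom_test_two_sided(k, n, p):
    """Two-sided exact binomial p-value.

    Sums the probabilities of every outcome that is no more likely than
    the observed one (an assumed, commonly used two-sided convention).
    """
    cutoff = binom_pmf(k, n, p) * (1.0 + 1e-7)  # tolerance for float ties
    total = 0.0
    for i in range(n + 1):
        pm = binom_pmf(i, n, p)
        if pm <= cutoff:
            total += pm
    return min(1.0, total)


def class_enrichment(input_counts, hit_counts, alpha=0.01):
    """Relative change and Bonferroni-corrected p-value for each class."""
    n_input = sum(input_counts.values())
    n_hits = sum(hit_counts.values())
    n_tests = len(input_counts)
    results = {}
    for cls, count in input_counts.items():
        expected_frac = count / n_input          # share of the class in the input
        observed = hit_counts.get(cls, 0)        # hits falling in the class
        rel_change = (observed / n_hits) / expected_frac - 1.0
        p_raw = binom_test_two_sided(observed, n_hits, expected_frac)
        p_bonf = min(1.0, p_raw * n_tests)
        results[cls] = {
            "relative_change": rel_change,
            "p_raw": p_raw,
            "p_bonferroni": p_bonf,
            "significant": p_bonf <= alpha,
        }
    return results


# Made-up counts for illustration only; these are NOT the values in S1 Table.
input_counts = {"intergenic": 700, "intronic": 100,
                "synonymous": 100, "nonsynonymous": 100}
hit_counts = {"intergenic": 80, "intronic": 40,
              "synonymous": 50, "nonsynonymous": 30}

results = class_enrichment(input_counts, hit_counts)
for cls, r in results.items():
    print(f"{cls:14s} change={r['relative_change']:+.2f} "
          f"p_bonf={r['p_bonferroni']:.2e} significant={r['significant']}")
```

Because the class proportions must sum to one, a strong depletion in one large class (intergenic in the toy data) mechanically pushes the remaining classes toward apparent enrichment, which is the effect the caption describes and the reason panel (C) repeats the analysis with intergenic regions excluded.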

Mentions: After classification, we analyzed the distribution of VEP classes and copy-number variants (CNVs) for enrichment in GWAS hits relative to the input dataset (Fig. 2). Intergenic regions (>5 kb away from the nearest gene) are strongly depleted for GWAS hits, causing almost all other categories to show significant enrichment (Fig. 2B). Part of this depletion may be due to transposon activity in intergenic regions altering the physical location (and thus the projected genotype) of sequences in some founder lines. After controlling for intergenic regions, both genic SNPs and CNVs are still strongly enriched for GWAS hits (Fig. 2C). This agrees with the recent findings of Schork et al. [24], who found similar enrichment patterns of GWAS hits close to genes. Of the enriched classes, large CNVs show the most enrichment, while the most enriched SNP category is for synonymous mutations. Some of the enrichment for synonymous sites is probably due to synthetic associations [25], [26], where the signals from several low-frequency causal SNPs combine to make a nearby, higher-frequency SNP appear associated with the trait. (This is different from the normal situation in GWAS where the associated SNPs are assumed to be linked to causal loci that weren't sampled but that would show up if they had been.) Such associations are probably not the sole explanation for the enrichment of synonymous SNPs, however, because synonymous SNPs are also significantly enriched over intronic SNPs (p = 2.80×10⁻⁸ by Chi-square test) despite having similar site frequency spectra (S3 Figure) and being in similar LD structures (due to the small size of maize introns, which have a median size of only ∼150 base pairs in quality-filtered genes). This implies a legitimate enrichment for synonymous SNPs.
Some (and possibly most) of that enrichment is probably due to linkage with nearby causal SNPs; this may also result in the enrichment of synonymous over intronic SNPs, since synonymous ones will on average still be in tighter LD with nonsynonymous SNPs than will those in introns. The remainder of the enrichment is likely due to the (unknown) fraction that are causal themselves but act through mechanisms other than protein sequence (e.g., altering mRNA stability, protein binding sites, or local translation rates [27]).
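
The synonymous-versus-intronic comparison discussed above is reported only as a Chi-square test, so the exact setup is not given. One plausible form is a 2×2 table of hits and non-hits for the two classes. The sketch below assumes that layout, uses Pearson's statistic without a continuity correction, and runs on invented counts; the function name `chi_square_2x2` is hypothetical.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value). With one degree of freedom the upper
    tail of the chi-square distribution equals erfc(sqrt(x / 2)).
    No continuity correction is applied (an assumption of this sketch).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    return stat, erfc(sqrt(stat / 2.0))


# Invented counts for illustration only (not taken from the paper):
# rows are SNP classes, columns are GWAS hits and non-hits.
syn_hits, syn_other = 50, 950
intron_hits, intron_other = 20, 980

stat, p_value = chi_square_2x2(syn_hits, syn_other, intron_hits, intron_other)
print(f"chi-square = {stat:.3f}, p = {p_value:.2e}")
```

A small p-value here says only that the hit rate differs between the two classes. As the text notes, it cannot by itself separate linkage with nearby causal SNPs from synonymous sites that are causal themselves.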

