Association mapping across numerous traits reveals patterns of functional variation in maize.

Wallace JG, Bradbury PJ, Zhang N, Gibon Y, Stitt M, Buckler ES - PLoS Genet. (2014)

Bottom Line: Phenotypic variation in natural populations results from a combination of genetic effects, environmental effects, and gene-by-environment interactions. We also find that genes tagged by GWAS are enriched for regulatory functions and are ∼50% more likely to have a paralog than expected by chance, indicating that gene regulation and gene duplication are strong drivers of phenotypic variation. These results will likely apply to many other organisms, especially ones with large and complex genomes like maize.


Affiliation: Institute for Genomic Diversity, Cornell University, Ithaca, New York, United States of America.

ABSTRACT
Phenotypic variation in natural populations results from a combination of genetic effects, environmental effects, and gene-by-environment interactions. Despite the vast amount of genomic data becoming available, many pressing questions remain about the nature of genetic mutations that underlie functional variation. We present the results of combining genome-wide association analysis of 41 different phenotypes in ∼5,000 inbred maize lines to analyze patterns of high-resolution genetic association among 28.9 million single-nucleotide polymorphisms (SNPs) and ∼800,000 copy-number variants (CNVs). We show that genic and intergenic regions have opposite patterns of enrichment, minor allele frequencies, and effect sizes, implying tradeoffs among the probability that a given polymorphism will have an effect, the detectable size of that effect, and its frequency in the population. We also find that genes tagged by GWAS are enriched for regulatory functions and are ∼50% more likely to have a paralog than expected by chance, indicating that gene regulation and gene duplication are strong drivers of phenotypic variation. These results will likely apply to many other organisms, especially ones with large and complex genomes like maize.


Figure 3 (pgen-1004845-g003): Distribution of non-genic GWAS hits as a function of gene distance. The number of SNPs at increasing distances from the nearest gene is plotted; CNVs are excluded due to their large size and the difficulty of determining where many of them (especially insertions) actually occur. The input (whole genome) dataset shows a single peak at ∼25 kb away from a gene. The GWAS dataset, however, shows an additional peak at ∼1–5 kb (shaded), where one would expect to find promoters and short-range regulatory elements. Note that because of the log scale, each bin contains successively more nucleotides, which makes it appear that most SNPs are far from genes when the reverse is actually true.
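The caption's caveat about the log scale can be made concrete with a small sketch. It is not the authors' code: the bin range, the one-bin-per-decade choice, the function names, and the SNP counts are all hypothetical, chosen only to show how raw counts per log-spaced bin can rise with distance while the density of SNPs per kb falls.

```python
import math

def log_bin_edges(min_bp, max_bp, bins_per_decade):
    """Return bin edges (in bp) spaced evenly in log10(distance)."""
    lo, hi = math.log10(min_bp), math.log10(max_bp)
    n_bins = round((hi - lo) * bins_per_decade)
    return [10 ** (lo + i * (hi - lo) / n_bins) for i in range(n_bins + 1)]

def density_per_kb(counts, edges):
    """Divide each bin's SNP count by that bin's width in kb."""
    return [c / ((edges[i + 1] - edges[i]) / 1000) for i, c in enumerate(counts)]

# Assumed range: 100 bp to 1 Mb from the nearest gene, one bin per decade.
edges = log_bin_edges(100, 1_000_000, 1)
widths = [edges[i + 1] - edges[i] for i in range(len(edges) - 1)]

# Hypothetical SNP counts per bin, invented for illustration only.
counts = [900, 4500, 18000, 36000]
density = density_per_kb(counts, edges)

for i, (c, d) in enumerate(zip(counts, density)):
    print(f"{edges[i]:>9.0f}-{edges[i + 1]:<9.0f} bp  count={c:>6}  SNPs/kb={d:.1f}")
```

In this toy input each bin is ten times wider than the one before it, so the rising counts coexist with a falling per-kb density, which is the effect the caption warns about.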

Mentions: Although genic regions are the most strongly enriched in GWAS, the majority (∼70%) of our hits still fall outside of annotated genes, as defined by their transcriptional start and stop sites. Plotting the distances from non-genic SNPs to the nearest gene on a log scale reveals a bimodal distribution, with a peak at ∼1–5 kb away from genes that is not reflected in the input dataset (Fig. 3). This corresponds to the likely positions of promoters and other short-range regulatory elements. Finding enrichment at this scale provides evidence for the high resolution and biological relevance of the GWAS hits in this study. The second peak, which follows the input distribution, probably reflects elements that are not correlated with gene distance (e.g., long-range regulatory elements, unannotated transcripts, etc.). For example, using a list of 316 maize noncoding RNAs from Gramene (available at http://ftp.gramene.org/release39/data/fasta/zea_mays/ncrna/) that were not included in the Ensembl annotations reveals that intergenic hits are significantly enriched for polymorphisms within 5 kb of these RNAs (n = 13, expected = 1.07, p = 1.3×10⁻¹⁰ by two-sided exact binomial test). Alternatively, some of these “intergenic” hits may actually be tagging legitimate genes that are simply not present in the reference genome due to the high amount of presence-absence variation in maize [21]. Identifying the nature of these hits should be possible as more information about the maize pan-genome becomes available.
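The enrichment near noncoding RNAs is reported from a two-sided exact binomial test. A minimal sketch of such a test follows, using only the Python standard library. It is not the authors' pipeline: the function names are illustrative, and the total number of intergenic hits (`total_hits`) is a hypothetical value, since the excerpt gives only the observed count (13) and the expected count (1.07). The p-value it yields is therefore not expected to reproduce the published one.

```python
import math

def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials (0 < p < 1), via logs."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)

def binom_test_two_sided(k, n, p):
    """Two-sided exact test: sum the probabilities of all outcomes that are
    no more likely than the observed one (small relative tolerance for ties)."""
    threshold = binom_pmf(k, n, p) * (1 + 1e-7)
    total = 0.0
    for i in range(n + 1):
        pm = binom_pmf(i, n, p)
        if pm <= threshold:
            total += pm
    return min(1.0, total)

observed = 13                      # hits within 5 kb of an ncRNA (from the text)
expected = 1.07                    # expected count under the background (from the text)
total_hits = 2000                  # HYPOTHETICAL total; not given in this excerpt
background = expected / total_hits # implied per-hit probability under the null

p_value = binom_test_two_sided(observed, total_hits, background)
print(f"fold enrichment = {observed / expected:.1f}, p = {p_value:.2e}")
```

The fold enrichment (observed divided by expected) does not depend on the assumed total, whereas the exact p-value does.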

