Signatures of natural selection on genetic variants affecting complex human traits


ABSTRACT

It has recently been hypothesized that polygenic adaptation, resulting in modest allele frequency changes at many loci, could be a major mechanism behind the adaptation of complex phenotypes in human populations. Here we leverage the large number of variants identified through genome-wide association (GWA) studies to comprehensively study signatures of natural selection on genetic variants associated with complex traits. Using population-differentiation-based methods, such as FST and phylogenetic branch length analyses, we systematically examined nearly 1300 SNPs associated with 38 complex phenotypes. Instead of detecting selection signatures at individual variants, we aimed to identify combined evidence of natural selection by aggregating signals across many trait-associated SNPs. Our results reveal several general features of polygenic selection on variants associated with complex traits. First, natural selection acting on standing variants associated with complex traits is a common phenomenon. Second, the characteristics of selection for different polygenic traits vary both temporally and geographically. Third, some of the studied traits (e.g. height and urate level) could have been primary targets of selection, as indicated by a significant correlation between effect sizes and the estimated strength of selection in the trait-associated variants; for most traits, however, the allele frequency changes in trait-associated variants might have been driven by selection on other, correlated phenotypes. Fourth, the changes in allele frequencies resulting from selection can be highly stochastic, such that polygenic adaptation may accelerate differentiation in allele frequencies among populations but generally does not produce predictable directional changes. Fifth, multiple mechanisms (pleiotropy, hitchhiking, etc.) may act together to govern the changes in allele frequencies of genetic variants associated with complex traits.



Fig. 1. (A) Population tree (topology) of the nine 1000 Genomes populations. (B) Schematic graph of branch length estimation.
License: CC BY

To examine whether there are identifiable significant selection signals along specific human lineages, we estimated the branch lengths of the population tree of the nine studied populations (Fig. 1A). We first constructed the tree topology using the neighbor-joining (NJ) method (Saitou and Nei, 1987), based on average pairwise FST estimated from 10,000 randomly selected 1000 Genomes SNPs. This topology was consistent with the one based on classic blood group and protein loci (Nei and Roychoudhury, 1993). Given this fixed topology, two methods were used to estimate the branch lengths from the allele frequencies of each SNP: 1) the ordinary least squares (OLS) method (Chakraborty, 1977; Rzhetsky and Nei, 1992); and 2) a maximum likelihood (ML) method based on the diffusion approximation (Kimura, 1962) of allele frequency changes. The OLS method estimates branch lengths by minimizing the squared errors between the observed genetic distances (i.e. pairwise FST between leaf nodes) and the distances over the tree (i.e. the sum of the branch lengths along the path between two leaf nodes). The ML method was motivated by a hierarchical model of allele frequency changes among related populations (Nicholson et al., 2002), which assumes that descendant populations diverge and evolve independently from an ancestral population (parent node) and that the allele frequency in a descendant population is approximately normally distributed. As shown by a simple example (Fig. 1B), the allele frequency of PopB can be written as $p_B \sim N[p_A,\ c_B p_A(1 - p_A)]$, where $p_A$ is the allele frequency in the ancestral population (PopA) and $c_B$ is the branch length parameter that summarizes the demographic history of PopB. Under pure drift, $c_B = 1 - \prod_{i=1}^{t_B}\left(1 - \frac{1}{2N_i}\right)$, where $t_B$ is the number of generations since PopB split from PopA and $N_i$ is the effective population size in generation $i$.
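The OLS step can be sketched as a linear least-squares problem: each leaf pair contributes one equation stating that the observed distance should equal the sum of the branch lengths on the path between the two leaves. This is a minimal sketch, not the authors' code. It assumes a hypothetical four-leaf unrooted tree ((W,X),(Y,Z)) and made-up, perfectly additive distances standing in for pairwise FST; the names `PATHS`, `design_matrix` and `ols_branch_lengths` are illustrative. The tree is treated as unrooted because pairwise distances alone cannot separate the two branches that meet at a root.

```python
from itertools import combinations

import numpy as np

# Hypothetical unrooted four-leaf tree ((W,X),(Y,Z)). Each leaf maps to the set
# of branches on its path to a reference point, here the internal node joining
# W and X. Branch names are illustrative assumptions, not from the paper.
LEAVES = ["W", "X", "Y", "Z"]
PATHS = {
    "W": {"bW"},
    "X": {"bX"},
    "Y": {"bInt", "bY"},
    "Z": {"bInt", "bZ"},
}
BRANCHES = ["bW", "bX", "bY", "bZ", "bInt"]


def design_matrix(leaves, paths, branches):
    """One row per leaf pair; entry 1 if the branch lies on the path between them."""
    pairs = list(combinations(leaves, 2))
    A = np.zeros((len(pairs), len(branches)))
    for row, (u, v) in enumerate(pairs):
        # Branches shared by both paths to the reference point cancel out, so the
        # path between u and v is the symmetric difference of the two sets.
        for branch in paths[u] ^ paths[v]:
            A[row, branches.index(branch)] = 1.0
    return pairs, A


def ols_branch_lengths(dist, leaves=LEAVES, paths=PATHS, branches=BRANCHES):
    """Minimize the sum over leaf pairs of (observed distance - path length)^2."""
    pairs, A = design_matrix(leaves, paths, branches)
    d = np.array([dist[frozenset(pair)] for pair in pairs])
    lengths, *_ = np.linalg.lstsq(A, d, rcond=None)
    return dict(zip(branches, lengths))


# Made-up additive distances generated from bW=0.01, bX=0.02, bY=0.03,
# bZ=0.04 and bInt=0.05 (stand-ins for pairwise FST values).
TOY_DIST = {
    frozenset(pair): value
    for pair, value in [
        (("W", "X"), 0.03), (("W", "Y"), 0.09), (("W", "Z"), 0.10),
        (("X", "Y"), 0.10), (("X", "Z"), 0.11), (("Y", "Z"), 0.07),
    ]
}

print({name: round(float(v), 4) for name, v in ols_branch_lengths(TOY_DIST).items()})
```

Because the toy distances are exactly additive, the fit recovers the generating branch lengths; with real per-SNP FST values the residuals are nonzero, and unconstrained OLS can also return negative branch lengths.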
Conditional on the ancestral allele frequencies, the same model applies independently to every non-root population in the tree. Thus, the full likelihood of a population tree can be written as the product of the likelihoods of all non-root nodes:

$$L = \prod_i N\left(p_i \mid p_i^{*},\ c_i p_i^{*}(1 - p_i^{*})\right),$$

where $i$ indexes the non-root populations, $p_i$ is the allele frequency of population $i$, $p_i^{*}$ is the allele frequency in its immediate ancestral population (parent node), $c_i$ is the branch length, and $N(x \mid \mu, \sigma^2)$ denotes the normal density. The branch lengths ($c_i$) and the allele frequencies in the ancestral populations were then estimated by numerical maximization of the likelihood function. A detailed description of the method will be reported elsewhere. The branch lengths estimated by the two methods are similar, and both can be interpreted as analogous to FST between a population and its hypothetical ancestor (parent node). To reduce the number of comparisons, we focused on the major splits of continental populations (i.e. AFR, EUA [Eurasia], ASN and EUR in Fig. 1A) and aggregated the terminal branches into the average of the within-continent splits (indicated by EURs, ASNs and AFRs in Fig. 1A). The sum of the branch lengths over the entire tree was also calculated.
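A minimal sketch of the tree likelihood described above, under simplifying assumptions: a hypothetical rooted tree (root R with children A and D; A with children B and C), simulated frequencies, and ancestral frequencies treated as known. Under that assumption the log-likelihood separates by branch, and the maximizing branch length has a closed form, c_i = mean over SNPs of (p_i - p_i*)^2 / (p_i*(1 - p_i*)). The paper instead estimates the ancestral frequencies jointly by numerical maximization, which this sketch does not attempt. All node names, branch lengths and simulation settings are illustrative assumptions.

```python
import numpy as np

# Hypothetical rooted tree (not the paper's nine-population tree):
# root R -> internal node A and leaf D; A -> leaves B and C.
# PARENT maps every non-root node to its immediate ancestor (parent node).
PARENT = {"A": "R", "D": "R", "B": "A", "C": "A"}
ORDER = ["A", "D", "B", "C"]  # each node is listed after its parent


def normal_logpdf(x, mean, var):
    """Log density of N(mean, var) evaluated elementwise at x."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)


def tree_loglik(freqs, c, parent=PARENT):
    """log L = sum_i log N(p_i | p_i*, c_i p_i*(1 - p_i*)) over non-root nodes i.

    freqs: node -> array of allele frequencies, one entry per SNP
    c:     non-root node -> branch length c_i
    """
    total = 0.0
    for node, par in parent.items():
        p_star = freqs[par]
        var = c[node] * p_star * (1.0 - p_star)
        total += normal_logpdf(freqs[node], p_star, var).sum()
    return total


def mle_branch_lengths(freqs, parent=PARENT):
    """Closed-form maximizer of tree_loglik over c, holding all node
    frequencies (including ancestral ones) fixed."""
    return {
        node: float(np.mean((freqs[node] - freqs[par]) ** 2
                            / (freqs[par] * (1.0 - freqs[par]))))
        for node, par in parent.items()
    }


def simulate(c, n_snps, rng):
    """Draw frequencies down the tree under the normal approximation."""
    freqs = {"R": rng.uniform(0.3, 0.7, n_snps)}
    for node in ORDER:
        p_star = freqs[PARENT[node]]
        sd = np.sqrt(c[node] * p_star * (1.0 - p_star))
        # Clip into [0.01, 0.99] so p*(1 - p*) stays positive when this node
        # later acts as a parent (a simplification of the pure normal model).
        freqs[node] = np.clip(p_star + sd * rng.standard_normal(n_snps), 0.01, 0.99)
    return freqs


TRUE_C = {"A": 0.02, "D": 0.05, "B": 0.01, "C": 0.03}  # made-up values
freqs = simulate(TRUE_C, 5000, np.random.default_rng(0))
c_hat = mle_branch_lengths(freqs)
print({k: round(v, 4) for k, v in c_hat.items()})
print("total tree length:", round(sum(c_hat.values()), 4))
```

Holding the ancestral frequencies fixed keeps the example short and checkable; estimating them as well, as the text describes, turns this into a joint numerical optimization over branch lengths and one ancestral frequency per internal node and SNP.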

