Patterns of positive selection in six Mammalian genomes.

Kosiol C, Vinar T, da Fonseca RR, Hubisz MJ, Bustamante CD, Nielsen R, Siepel A - PLoS Genet. (2008)

Bottom Line: The increased phylogenetic depth of this dataset results in substantially improved statistical power, and permits several new lineage- and clade-specific tests to be applied. A detailed analysis of Affymetrix exon array data indicated that PSGs are expressed at significantly lower levels, and in a more tissue-specific manner, than non-PSGs. Genes that are specifically expressed in the spleen, testes, liver, and breast are significantly enriched for PSGs, but no evidence was found for an enrichment for PSGs among brain-specific genes.

Affiliation: Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, United States of America.

ABSTRACT
Genome-wide scans for positively selected genes (PSGs) in mammals have provided insight into the dynamics of genome evolution, the genetic basis of differences between species, and the functions of individual genes. However, previous scans have been limited in power and accuracy owing to small numbers of available genomes. Here we present the most comprehensive examination of mammalian PSGs to date, using the six high-coverage genome assemblies now available for eutherian mammals. The increased phylogenetic depth of this dataset results in substantially improved statistical power, and permits several new lineage- and clade-specific tests to be applied. Of approximately 16,500 human genes with high-confidence orthologs in at least two other species, 400 genes showed significant evidence of positive selection (FDR<0.05), according to a standard likelihood ratio test. An additional 144 genes showed evidence of positive selection on particular lineages or clades. As in previous studies, the identified PSGs were enriched for roles in defense/immunity, chemosensory perception, and reproduction, but enrichments were also evident for more specific functions, such as complement-mediated immunity and taste perception. Several pathways were strongly enriched for PSGs, suggesting possible co-evolution of interacting genes. A novel Bayesian analysis of the possible "selection histories" of each gene indicated that most PSGs have switched multiple times between positive selection and nonselection, suggesting that positive selection is often episodic. A detailed analysis of Affymetrix exon array data indicated that PSGs are expressed at significantly lower levels, and in a more tissue-specific manner, than non-PSGs. Genes that are specifically expressed in the spleen, testes, liver, and breast are significantly enriched for PSGs, but no evidence was found for an enrichment for PSGs among brain-specific genes. 
This study provides additional evidence for widespread positive selection in mammalian evolution and new genome-wide insights into the functional implications of positive selection.
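The significance calls above rest on "a standard likelihood ratio test" (LRT), whose statistic is twice the difference in maximized log-likelihoods between an alternative model that allows ω > 1 and a null model that does not. The specific model pair and the way P-values were computed are given in Text S1 and are not reproduced here, so the following is only a minimal sketch assuming the usual chi-square approximation. The function name `lrt_pvalue` and the restriction to 1 or 2 degrees of freedom (chosen because those survival functions have closed forms in Python's standard library) are illustrative assumptions, not the authors' code.

```python
import math


def lrt_pvalue(lnl_null: float, lnl_alt: float, df: int) -> float:
    """Hypothetical helper: chi-square P-value for an LRT of nested models.

    Assumes the statistic 2 * (lnL_alt - lnL_null) is approximately
    chi-square distributed with `df` degrees of freedom.
    """
    # LRT statistic; clamp tiny negative values caused by optimizer noise
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    if df == 1:
        # Chi-square(1) survival function: P(X > x) = erfc(sqrt(x / 2))
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        # Chi-square(2) survival function: P(X > x) = exp(-x / 2)
        return math.exp(-stat / 2.0)
    raise ValueError("this sketch only supports df = 1 or df = 2")


# Example with made-up log-likelihoods: statistic = 8, df = 2
print(lrt_pvalue(-1000.0, -996.0, df=2))  # exp(-4), about 0.0183
```

When the null model places a parameter on the boundary of its range, a mixture of chi-square distributions is sometimes used instead of a single chi-square; this sketch does not handle that case.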

Figure 6. Power of the LRT for selection on any branch of the phylogeny as a function of the nonsynonymous-synonymous rate ratio ω. Power is defined as the fraction of tests resulting in nominal P<0.05. (The effect of controlling for multiple comparisons is shown in Figure S3.) When ω≤1, these fractions are estimates of the false positive rate. Each data point is based on 1000 data sets simulated with evolver [84] under the assumption of a constant ω among lineages and among sites (model M0). All other parameters (including the transition-transversion ratio κ, the codon frequencies, and the branch lengths) were fixed at values estimated from the real data. Results are shown for short (200-codon) and long (500-codon) genes and three sets of species: hominids (human and chimpanzee), primates (human, chimpanzee, and macaque), and all six mammals. Details on the computation of P-values are given in Text S1. Note the logarithmic scale on the x-axis.
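The caption defines power as the fraction of 1000 simulated data sets whose test reaches nominal P<0.05. The sketch below computes that fraction from a list of already-computed P-values (the evolver simulation and model fitting are not reproduced) and adds a binomial standard error, which is an illustrative addition not reported in the caption. The function name `estimate_power` is hypothetical.

```python
import math


def estimate_power(pvalues, alpha=0.05):
    """Hypothetical helper: fraction of replicate tests with nominal P < alpha.

    Returns (power, standard_error), treating each replicate as an
    independent Bernoulli trial. When the simulated omega <= 1, the same
    fraction estimates the false positive rate.
    """
    n = len(pvalues)
    if n == 0:
        raise ValueError("need at least one replicate")
    power = sum(p < alpha for p in pvalues) / n
    # Binomial standard error of the estimated proportion
    se = math.sqrt(power * (1.0 - power) / n)
    return power, se


# Toy input with four made-up P-values: two fall below 0.05
print(estimate_power([0.01, 0.2, 0.03, 0.5]))  # (0.5, 0.25)
```

With 1000 replicates per data point, the standard error is at most sqrt(0.25 / 1000), about 0.016, which gives a rough sense of how much scatter to expect between neighbouring points in such a plot.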

Mentions: To compare the power of our LRTs with the power of previous tests based on hominid or primate genomes, we simulated data sets under a range of parameter values and measured the fraction of cases in which positive selection was predicted (Figure 6). These experiments show that power increases substantially as the set of species under consideration is expanded from the two hominids to the three primates and then to all six mammals. With hominid species only, power is poor even when selection is quite strong (e.g., ∼20% with a constant ω = 2 and ∼40% with ω = 4), suggesting that a genome-wide scan will tend to identify only the most extreme cases of positive selection. If a rigorous correction for multiple testing is applied, a test based on hominids only has essentially no power, even for fairly long genes under strong selection (Figure S3; see also [5]). The situation is considerably improved by the addition of the macaque genome, but power remains poor under multiple-testing control unless genes are long and selection is strong. When all six mammals are considered, however, power increases substantially. With the full data set, power is reasonably good (≥70%) even when genes are short and selection is moderate in strength, and it remains good when multiple comparisons are considered (Figure S3). The absolute estimates of power from these experiments depend on the simplifying assumptions used in the simulations (including the unrealistic assumption of constant ω among lineages and among sites) and must be interpreted cautiously. However, estimates of relative power, which will be less sensitive to these simplifying assumptions, indicate that adding the three non-primate mammals yields a substantial improvement.
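The loss of power under "a rigorous correction for multiple testing" can be illustrated with a false discovery rate procedure. The abstract reports results at FDR<0.05, but this excerpt does not name the procedure used; the sketch below assumes the Benjamini-Hochberg step-up rule, the function name `benjamini_hochberg` is hypothetical, and the ten P-values in the example are made up for illustration.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Hypothetical helper: indices rejected by the Benjamini-Hochberg step-up rule.

    Sort the m P-values, find the largest rank k (1-based) with
    p_(k) <= k * q / m, and reject the k hypotheses with the smallest P-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            k_max = rank  # keep the largest rank that passes its threshold
    return sorted(order[:k_max])


# Made-up P-values for ten tests
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(sum(p < 0.05 for p in pvals))  # 5 pass the nominal 0.05 cutoff
print(benjamini_hochberg(pvals))     # [0, 1]: only 2 survive FDR control
```

In this toy case, five tests pass the nominal cutoff but only two remain after FDR control, which mirrors, on a small scale, why a test that is marginal at nominal P<0.05 can have little power genome-wide.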

