Patterns of positive selection in six Mammalian genomes.

Kosiol C, Vinar T, da Fonseca RR, Hubisz MJ, Bustamante CD, Nielsen R, Siepel A - PLoS Genet. (2008)

Bottom Line: The increased phylogenetic depth of this dataset results in substantially improved statistical power, and permits several new lineage- and clade-specific tests to be applied. A detailed analysis of Affymetrix exon array data indicated that PSGs are expressed at significantly lower levels, and in a more tissue-specific manner, than non-PSGs. Genes that are specifically expressed in the spleen, testes, liver, and breast are significantly enriched for PSGs, but no evidence was found for an enrichment for PSGs among brain-specific genes.


Affiliation: Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, United States of America.

ABSTRACT
Genome-wide scans for positively selected genes (PSGs) in mammals have provided insight into the dynamics of genome evolution, the genetic basis of differences between species, and the functions of individual genes. However, previous scans have been limited in power and accuracy owing to small numbers of available genomes. Here we present the most comprehensive examination of mammalian PSGs to date, using the six high-coverage genome assemblies now available for eutherian mammals. The increased phylogenetic depth of this dataset results in substantially improved statistical power, and permits several new lineage- and clade-specific tests to be applied. Of approximately 16,500 human genes with high-confidence orthologs in at least two other species, 400 genes showed significant evidence of positive selection (FDR<0.05), according to a standard likelihood ratio test. An additional 144 genes showed evidence of positive selection on particular lineages or clades. As in previous studies, the identified PSGs were enriched for roles in defense/immunity, chemosensory perception, and reproduction, but enrichments were also evident for more specific functions, such as complement-mediated immunity and taste perception. Several pathways were strongly enriched for PSGs, suggesting possible co-evolution of interacting genes. A novel Bayesian analysis of the possible "selection histories" of each gene indicated that most PSGs have switched multiple times between positive selection and nonselection, suggesting that positive selection is often episodic. A detailed analysis of Affymetrix exon array data indicated that PSGs are expressed at significantly lower levels, and in a more tissue-specific manner, than non-PSGs. Genes that are specifically expressed in the spleen, testes, liver, and breast are significantly enriched for PSGs, but no evidence was found for an enrichment for PSGs among brain-specific genes. 
This study provides additional evidence for widespread positive selection in mammalian evolution and new genome-wide insights into the functional implications of positive selection.
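
The likelihood ratio test mentioned in the abstract can be illustrated with a minimal sketch. It assumes the usual form of the statistic, twice the difference in maximized log-likelihood between the alternative (selection allowed) and null models, and compares it to a chi-square distribution. The function names and the choice of degrees of freedom are assumptions made for illustration; the abstract does not specify the null distribution the authors used.

```python
import math


def lrt_statistic(lnl_null: float, lnl_alt: float) -> float:
    """Return 2 * (lnL_alt - lnL_null), floored at zero.

    The floor guards against small negative values caused by
    imperfect numerical optimization of the alternative model.
    """
    return max(0.0, 2.0 * (lnl_alt - lnl_null))


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable (df = 1 or 2 only)."""
    if x < 0:
        raise ValueError("chi-square statistic must be non-negative")
    if df == 1:
        # For one degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        # For two degrees of freedom, P(X > x) = exp(-x / 2).
        return math.exp(-x / 2.0)
    raise ValueError("this sketch supports only df = 1 or df = 2")


def lrt_p_value(lnl_null: float, lnl_alt: float, df: int = 1) -> float:
    """Nominal P-value for one gene under the assumed chi-square null."""
    return chi2_sf(lrt_statistic(lnl_null, lnl_alt), df)
```

Each ortholog set would yield one such nominal P-value, which is then subject to a multiple-testing correction before a gene is called a PSG.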


Figure 1 (pgen-1000144-g001): The LRTs used to detect positive selection in the six mammalian genomes. (A–I) Panel A shows the test for selection on any branch of the phylogeny, and panels B–I show the lineage- and clade-specific tests, with branches under positive selection highlighted. The numbers below each subfigure represent the number of positively selected genes identified by each LRT (FDR<0.05) and the total number of ortholog sets tested. In (A), branch lengths are drawn proportional to their estimates in substitutions per site, and each branch is labeled with the corresponding estimate of ω. All tests are based on an unrooted phylogeny; the trees are rooted for display purposes only. Nominal P-value thresholds for FDR<0.05 were: (A) 1.1×10^-3, (B) 9.1×10^-5, (C) 7.7×10^-5, (D) 2.9×10^-4, (E) 2.8×10^-4, (F) 2.5×10^-5, (G) 5.4×10^-5, (H) 1.8×10^-5, (I) 5.9×10^-5.
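
A nominal P-value threshold of the kind listed in the caption depends on the full set of P-values from each test, which is why it differs from panel to panel. The sketch below uses the Benjamini-Hochberg step-up procedure as one common way to obtain such a threshold. This is an assumption: the text here does not say which FDR method the authors applied, and the function names are hypothetical.

```python
from typing import List, Optional, Sequence


def bh_threshold(p_values: Sequence[float], fdr: float = 0.05) -> Optional[float]:
    """Largest P-value accepted by Benjamini-Hochberg, or None if none pass.

    With m tests and P-values sorted in ascending order, the procedure
    finds the largest rank k such that p_(k) <= fdr * k / m.
    """
    m = len(p_values)
    threshold = None
    for rank, p in enumerate(sorted(p_values), start=1):
        if p <= fdr * rank / m:
            threshold = p  # keep the largest passing P-value seen so far
    return threshold


def significant(p_values: Sequence[float], fdr: float = 0.05) -> List[bool]:
    """Flag each test whose P-value is at or below the BH threshold."""
    cutoff = bh_threshold(p_values, fdr)
    return [cutoff is not None and p <= cutoff for p in p_values]
```

Under this procedure, a test with many small P-values ends up with a more permissive nominal cutoff than a test with few.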

Mentions: For this study, we chose to avoid recently duplicated gene families and to focus on 1∶1 orthologs. This simplified the analysis, allowed for parameter sharing across genes (see Methods), and eliminated an important source of error by avoiding the need for a separate tree reconstruction for each gene family. (All ortholog sets were assumed to obey the species tree shown in Figure 1; because only an unrooted tree is needed, the topology is well accepted.) It was therefore necessary to discard any genes that showed evidence of recent duplication. This was accomplished in a pairwise fashion, by examining each human gene and orthologous non-human gene, and determining—based on BLAST matches to other genes and gene predictions in the same genome—whether either gene had a paralog that was more similar to it than the two orthologs were to each other (see Methods). Requiring that each human gene had a high-confidence 1∶1 ortholog in at least two other species reduced the total number of ortholog sets to 16,529. These sets contain a human gene and either five (42% of cases), four (28%), three (15%) or two (15%) non-human orthologs.
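
The pairwise duplication filter described above can be sketched as follows. The sketch assumes each human gene and candidate ortholog pair comes with three similarity scores (for example, derived from BLAST matches), and that a pair failing the check is dropped while the gene is kept only if at least two species still pass. The class, field names, scoring scale, and this handling of failed pairs are all hypothetical; the actual procedure is given in the paper's Methods.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PairScores:
    """Similarity scores for one human gene and one candidate ortholog."""
    ortholog: float             # human gene vs. its non-human ortholog
    best_human_paralog: float   # human gene vs. best other hit in human
    best_other_paralog: float   # non-human gene vs. best other hit in its genome


def is_one_to_one(pair: PairScores) -> bool:
    """True if neither gene has a paralog more similar than the ortholog."""
    return (pair.best_human_paralog <= pair.ortholog
            and pair.best_other_paralog <= pair.ortholog)


def passing_species(pairs: Dict[str, PairScores]) -> List[str]:
    """Species whose ortholog pair shows no evidence of recent duplication."""
    return sorted(sp for sp, scores in pairs.items() if is_one_to_one(scores))


def keep_gene(pairs: Dict[str, PairScores], min_species: int = 2) -> bool:
    """Keep a human gene only if enough species have a 1:1 ortholog."""
    return len(passing_species(pairs)) >= min_species
```

For example, a gene whose chimpanzee and mouse pairs pass but whose dog pair has a closer paralog would be retained with two non-human orthologs under these assumptions.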

