From mouse to human: evolutionary genomics analysis of human orthologs of essential genes.

Georgi B, Voight BF, Bućan M - PLoS Genet. (2013)

Bottom Line: Studies in model organisms identified a significant fraction of essential genes through the analysis of null mutations that lead to lethality. Consistent with the action of strong purifying selection, these genes exhibit comparatively reduced levels of sequence variation, a skew in allele frequency toward rare variants, and increased conservation across the primate and rodent lineages relative to the remainder of genes in the genome. While incomplete, our set of human orthologs shows characteristics fully consistent with essential function in humans and thus provides a resource to inform and facilitate interpretation of sequence data in studies of human disease.

Affiliation: Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America.

ABSTRACT
Understanding the core set of genes that are necessary for basic developmental functions is one of the central goals in biology. Studies in model organisms identified a significant fraction of essential genes through the analysis of null mutations that lead to lethality. Recent large-scale next-generation sequencing efforts have provided unprecedented data on genetic variation in humans. However, the evolutionary and genomic characteristics of human essential genes have never been directly studied on a genome-wide scale. Here we use detailed phenotypic resources available for the mouse and deep genomic sequencing data from human populations to characterize patterns of genetic variation and mutational burden in a set of 2,472 human orthologs of known essential genes in the mouse. Consistent with the action of strong purifying selection, these genes exhibit comparatively reduced levels of sequence variation, a skew in allele frequency toward rare variants, and increased conservation across the primate and rodent lineages relative to the remainder of genes in the genome. In individual genomes we observed ~12 rare mutations within essential genes predicted to be damaging. Consistent with the hypothesis that mutations in essential genes are risk factors for neurodevelopmental disease, we show that de novo variants in patients with Autism Spectrum Disorder are more likely to occur in this collection of genes. While incomplete, our set of human orthologs shows characteristics fully consistent with essential function in humans and thus provides a resource to inform and facilitate interpretation of sequence data in studies of human disease.

Figure 3 (pgen-1003484-g003): Analysis of individual mutational load in essential genes. The boxes span the lower and upper quartiles, with the median indicated by a red bar; whiskers extend to data points within 1.5 times the interquartile range. Values are transformed to Z-scores relative to the genome average of all protein-coding genes. The P-values given are for the comparison of EG versus NLG (top) and EG versus ALL (bottom). A) Ratio of non-synonymous to synonymous exonic variants. B) Gene-length-corrected average number of exonic missense variants. C) Fraction of loss-of-function variants among all exonic missense variants. D) Estimates of mutational load in essential genes in each human genome at different allele frequencies. The plots show all exonic missense variants (blue), putative damaging exonic variants (orange) and loss-of-function variants (red). Error bars depict the standard deviation.
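The caption mentions two transformations that are easy to misread: the Z-score scaling against the genome-wide average and the 1.5 × IQR whisker rule. The following is only a minimal sketch of one common way to compute both. The function names, the use of the population standard deviation, the quartile method, and the toy numbers are all assumptions made for illustration and are not taken from the paper.

```python
from statistics import mean, pstdev, quantiles


def z_scores(values, reference):
    """Scale `values` by the mean and standard deviation of `reference`.

    Assumption: `reference` stands in for the per-sample statistic computed
    over all protein-coding genes; the paper's exact normalization may differ.
    """
    mu = mean(reference)
    sigma = pstdev(reference)
    return [(v - mu) / sigma for v in values]


def whisker_bounds(values):
    """Return the lowest and highest data points that lie within
    1.5 x IQR of the lower and upper quartiles (the usual box-plot rule)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    return min(inside), max(inside)


# Toy numbers, purely illustrative.
reference = [1.0, 2.0, 3.0, 4.0, 5.0]
scaled = z_scores([3.0, 5.0], reference)
bounds = whisker_bounds([1, 2, 3, 4, 5, 100])  # 100 falls outside the fence
```

Under this rule a point such as 100 in the toy data would be drawn as an outlier rather than stretching the whisker.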

Mentions: Under the model that a subset of mutations in essential genes is subject to purifying selection at the population level, we hypothesized that, across the set of essential genes, individual genomes should also exhibit reduced mutational load. When comparing the mutational load in essential genes for each sample in the 1000 Genomes Phase 1 data, we observed a significant reduction in the ratio of non-synonymous to synonymous substitutions within EG compared to NLG (Wilcoxon P = 1.66 × 10^-180, Figure 3A), as well as an overall reduction in the number of missense variants (Wilcoxon P = 1.66 × 10^-180, Figure 3B). The observed constraint on polymorphisms in essential genes suggests a higher rate of deleterious mutations removed from the population by background selection, and thus a lower incidence of severe-effect variants in each individual genome. To test this hypothesis, we investigated the difference in the relative abundance of loss-of-function (LoF) variants, i.e. variants that introduce or disrupt a stop codon (nonsense or read-through) or disrupt splice sites. The comparison of the fraction of LoF variants among all exonic non-synonymous SNPs within each group showed a significantly lower fraction of LoF events in EG compared to NLG (P = 6.38 × 10^-161, paired Wilcoxon test, Figure 3C). In fact, only 11% (122) of samples had a LoF event in any EG gene, compared to 96% (1045) of samples for the NLG set. Similarly, we observed a striking, almost 5-fold increase in the ratio of heterozygous to homozygous LoF variants in EG (22.25) relative to NLG (4.53) (Wilcoxon P = 3.71 × 10^-111). These findings are consistent with data from a recent study of LoF variants in 185 human genomes [9]. Of the 1,102 genes reported to be hit by high-confidence LoF variants, 190 belonged to either the EG or NLG classes. We observed a depletion of high-confidence LoF variants within EG versus NLG (Fisher's exact test P = 0.012, OR = 0.67, 95% CI 0.48–0.92, Table S2D).
Thus, there is not only an overall reduction in the number of exonic missense variants in EG, but the variants that are present also tend to be less severe. To provide additional technical validation of our observations, we repeated the analysis using the whole-genome sequences of 54 HapMap individuals made available by Complete Genomics ([29], http://www.completegenomics.com/public-data/69-Genomes/) and observed comparable results (see Text S1, Figures S9 and S10, Tables S9 and S10).
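For readers unfamiliar with the depletion test reported above, a two-sided Fisher's exact test on a 2×2 table (gene class by presence of a high-confidence LoF variant) can be sketched as follows. This is an illustration only: the function name is hypothetical, the counts are invented toy values rather than the contents of Table S2D, and the two-sided p-value uses one common convention (summing all tables no more probable than the observed one), which may not match the authors' software.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns (sample odds ratio, p-value). The p-value sums the hypergeometric
    probabilities of every table with the same margins that is no more
    probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Probability of seeing x in the top-left cell given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    p_value = sum(p for p in map(prob, range(lo, hi + 1))
                  if p <= p_obs * (1 + 1e-9))
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, min(1.0, p_value)


# Hypothetical counts (NOT from the paper):
# rows = gene class (EG, NLG); columns = (hit by LoF, not hit by LoF).
odds, p = fisher_exact_two_sided(10, 90, 30, 70)
print(f"odds ratio = {odds:.3f}, p = {p:.2g}")
```

An odds ratio below 1 in this layout indicates that genes in the first row are hit by LoF variants less often than genes in the second row, which is the direction of the effect the text reports for EG versus NLG.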

