A simple method for analyzing exome sequencing data shows distinct levels of nonsynonymous variation for human immune and nervous system genes.

Freudenberg J, Gregersen PK, Freudenberg-Hua Y - PLoS ONE (2012)

Bottom Line: This parameter can be interpreted as the proportion of nonsynonymous sites where mutations are tolerated to segregate with an allele frequency notably greater than 0 in the population, given the performed normalization of the observed nsSNV to sSNV ratio. A smaller y-intercept is displayed by NSGs, indicating more nonsynonymous sites under strong negative selection. This predicts more monogenically inherited or de-novo mutation diseases that affect the nervous system.


Affiliation: Robert S. Boas Center for Human Genetics and Genomics, The Feinstein Institute for Medical Research, Northshore LIJ Healthsystem, Manhasset, New York, United States of America. jan.freudenberg@nshs.edu

ABSTRACT
To measure the strength of natural selection that acts upon single nucleotide variants (SNVs) in a set of human genes, we calculate the ratio between nonsynonymous SNVs (nsSNVs) per nonsynonymous site and synonymous SNVs (sSNVs) per synonymous site. We transform this ratio with a respective factor f that corrects for the bias of synonymous sites towards transitions in the genetic code and different mutation rates for transitions and transversions. This method approximates the relative density of nsSNVs (rdnsv) in comparison with the neutral expectation as inferred from the density of sSNVs. Using SNVs from a diploid genome and 200 exomes, we apply our method to immune system genes (ISGs), nervous system genes (NSGs), randomly sampled genes (RSGs), and gene ontology annotated genes. The estimate of rdnsv in an individual exome is around 20% for NSGs and 30-40% for ISGs and RSGs. This smaller rdnsv of NSGs indicates overall stronger purifying selection. To quantify the relative shift of nsSNVs towards rare variants, we next fit a linear regression model to the estimates of rdnsv over different SNV allele frequency bins. The obtained regression models show a negative slope for NSGs, ISGs and RSGs, supporting an influence of purifying selection on the frequency spectrum of segregating nsSNVs. The y-intercept of the model predicts rdnsv for an allele frequency close to 0. This parameter can be interpreted as the proportion of nonsynonymous sites where mutations are tolerated to segregate with an allele frequency notably greater than 0 in the population, given the performed normalization of the observed nsSNV to sSNV ratio. A smaller y-intercept is displayed by NSGs, indicating more nonsynonymous sites under strong negative selection. This predicts more monogenically inherited or de-novo mutation diseases that affect the nervous system.
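The calculation described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names `rdnsv` and `fit_line` are hypothetical, the abstract does not say how the factor f is derived or applied (multiplication is assumed here), and all counts and bin values are made-up toy numbers.

```python
def rdnsv(n_ns_snv, n_s_snv, n_ns_sites, n_s_sites, f):
    """Relative density of nsSNVs versus the neutral expectation from sSNVs.

    Assumption: the correction factor f is applied by multiplication;
    the abstract only states that the ratio is "transformed" with f.
    """
    if n_s_snv == 0 or n_ns_sites == 0 or n_s_sites == 0:
        raise ValueError("counts of sSNVs and sites must be non-zero")
    ns_density = n_ns_snv / n_ns_sites  # nsSNVs per nonsynonymous site
    s_density = n_s_snv / n_s_sites     # sSNVs per synonymous site
    return f * ns_density / s_density


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy counts for one gene set (hypothetical, not from the paper).
example = rdnsv(n_ns_snv=30, n_s_snv=50, n_ns_sites=3000, n_s_sites=1000, f=1.0)

# Toy rdnsv estimates over allele frequency bin midpoints (hypothetical).
bin_midpoints = [0.05, 0.15, 0.25, 0.35]
rdnsv_per_bin = [0.30, 0.28, 0.26, 0.24]
slope, intercept = fit_line(bin_midpoints, rdnsv_per_bin)
```

In this sketch a negative `slope` corresponds to nsSNVs being shifted towards rare variants, and `intercept` is the predicted rdnsv for an allele frequency close to 0, which is the parameter the abstract interprets.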


Figure 2. Distribution of rdnsv estimates over 200 individual exomes. A) Expression-based candidate genes and B) keyword-based candidate genes. The value of rdnsv is estimated separately for each of the 200 exomes and is consistently smaller for NSGs (light grey) than for ISGs (medium grey). In addition, estimates of rdnsv are smaller for expression-based ISGs than for keyword-based ISGs. No difference is seen between expression-based NSGs and keyword-based NSGs.

Mentions: To extend these observations to a larger SNV dataset, we use a published dataset of 200 human exomes [12]. We first calculate rdnsv separately for each of the individual exomes, which shows rdnsv to be roughly normally distributed. The estimates of rdnsv are consistently smaller for NSGs than for ISGs (Figure 2). The mean values of the distributions of rdnsv over the 200 individual exomes (20.1% and 29.0% for expression-based NSGs and ISGs, and 19.8% and 39.3% for keyword-based NSGs and ISGs) are close to the corresponding rdnsv values from the diploid genome above, despite the fact that the diploid genome was obtained under a rather different experimental protocol. Consistent with the diploid genome, we see greater heterogeneity between expression- and keyword-based ISGs than between the two types of NSGs.
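The per-exome comparison described above could be summarized along the following lines. This is only a sketch under stated assumptions: `summarize` and `fraction_smaller` are hypothetical helper names, and the four values per gene set are made-up placeholders standing in for the 200 per-exome estimates, which are not reproduced here.

```python
from statistics import mean, stdev


def summarize(estimates):
    """Mean and sample standard deviation of per-exome rdnsv estimates."""
    return {"mean": mean(estimates), "sd": stdev(estimates)}


def fraction_smaller(a, b):
    """Fraction of exomes in which the estimate in a is below that in b.

    Both lists must be ordered by the same exomes (paired comparison).
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty lists of equal length")
    return sum(x < y for x, y in zip(a, b)) / len(a)


# Hypothetical per-exome rdnsv values (placeholders, not the paper's data).
nsg_rdnsv = [0.19, 0.21, 0.20, 0.20]
isg_rdnsv = [0.28, 0.30, 0.29, 0.29]

nsg_summary = summarize(nsg_rdnsv)
isg_summary = summarize(isg_rdnsv)
consistency = fraction_smaller(nsg_rdnsv, isg_rdnsv)
```

A `consistency` of 1.0 would mean that rdnsv is smaller for NSGs than for ISGs in every exome, which is one way to read "consistently smaller" in the text.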


