Analysis of sequence conservation at nucleotide resolution.

Asthana S, Roytberg M, Stamatoyannopoulos J, Sunyaev S - PLoS Comput. Biol. (2007)

Bottom Line: Our approach estimates the rate at which each nucleotide position is evolving, computes the probability of neutrality given this rate estimate, and summarizes the result in a Sequence CONservation Evaluation (SCONE) score. These regions are markedly enriched in individually conserved positions and short (<15 bp) conserved "chunks." Our results collectively suggest that the majority of functionally important noncoding conserved positions are highly fragmented and reside outside of canonically defined long conserved noncoding sequences. A small subset of these fragmented positions may be identified with high confidence.


Affiliation: Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, United States of America.

ABSTRACT
One of the major goals of comparative genomics is to understand the evolutionary history of each nucleotide in the human genome sequence, and the degree to which it is under selective pressure. Ascertainment of selective constraint at nucleotide resolution is particularly important for predicting the functional significance of human genetic variation and for analyzing the sequence substructure of cis-regulatory sequences and other functional elements. Current methods for analysis of sequence conservation are focused on delineation of conserved regions comprising tens or even hundreds of consecutive nucleotides. We therefore developed a novel computational approach designed specifically for scoring evolutionary conservation at individual base-pair resolution. Our approach estimates the rate at which each nucleotide position is evolving, computes the probability of neutrality given this rate estimate, and summarizes the result in a Sequence CONservation Evaluation (SCONE) score. We computed SCONE scores in a continuous fashion across 1% of the human genome for which high-quality sequence information from up to 23 genomes is available. We show that SCONE scores are clearly correlated with the allele frequency of human polymorphisms in both coding and noncoding regions. We find that the majority of noncoding conserved nucleotides lie outside of longer conserved elements predicted by other conservation analyses, and are experiencing ongoing selection in modern humans, as is evident from the allele frequency spectrum of human polymorphism. We also applied SCONE to analyze the distribution of conserved nucleotides within functional regions. These regions are markedly enriched in individually conserved positions and short (<15 bp) conserved "chunks." Our results collectively suggest that the majority of functionally important noncoding conserved positions are highly fragmented and reside outside of canonically defined long conserved noncoding sequences. A small subset of these fragmented positions may be identified with high confidence.
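
The abstract describes the scoring idea only at a high level (estimate a per-position rate, then ask how probable the data are under neutrality). The following is a toy illustration of that general idea and is not the SCONE model: it assumes, purely for illustration, that substitutions at a position follow a Poisson distribution whose mean is a neutral rate times the total branch length of the tree. The function name and all parameter values are hypothetical.

```python
from math import exp


def neutral_tail_probability(observed_subs, neutral_rate, tree_length):
    """Probability of seeing at most `observed_subs` substitutions at a site
    under a toy Poisson model of neutral evolution (an assumption made here,
    not the model used in the paper).

    neutral_rate: expected substitutions per unit branch length (hypothetical)
    tree_length:  total branch length of the alignment's tree (hypothetical)
    """
    expected = neutral_rate * tree_length
    term = exp(-expected)  # P(X = 0)
    total = term
    for i in range(1, observed_subs + 1):
        term *= expected / i  # P(X = i) from P(X = i - 1)
        total += term
    return min(1.0, total)


# Illustrative values only: a site with no substitutions on a tree where
# about 4 substitutions would be expected under neutrality.
p_neutral = neutral_tail_probability(observed_subs=0, neutral_rate=1.0, tree_length=4.0)
print(f"toy probability under neutrality: {p_neutral:.4f}")
```

A small value here means the position has changed less than the toy neutral model predicts, which is the intuition behind calling a position conserved.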


Figure 2 (pcbi-0030254-g002): Rare Derived Allele Frequency in Conserved versus Nonconserved Sites. Positions are partitioned according to (i) ENCODE MCS elements for all ENCODE positions, (ii) SCONE conservation score for all ENCODE positions, and (iii) SCONE conservation score for all ENCODE positions outside of MCS elements. p-Values are calculated using Fisher's exact test.
Mentions: We detect a significant difference (p < 0.0004, Fisher exact test) in the fraction of rare derived alleles (Figure 2) between conserved (SCONE p-value < 0.005, Fisher exact test) and nonconserved noncoding positions. The higher fraction of rare derived alleles in conserved (slowly evolving) positions indicates that these positions are experiencing purifying selection. Because allele frequency distributions are unaffected by mutation rate heterogeneity, our results suggest that this effect reflects sites that are evolving slowly because of selection rather than merely by chance. For comparison, we examined the allele frequency distribution in noncoding conserved sequence regions, using the ENCODE multispecies conserved sequence (MCS) element set to define contiguous conserved elements. These elements were defined on the basis of agreement between at least two of three regional conservation scores (phastCons, BinCons, and GERP) that identify regions of sequence with elevated average conservation. The shift in allele frequency distributions is stronger for SCONE-conserved positions than for MCS elements (p < 0.05, Fisher exact test), suggesting that these positions are either enriched for functional positions compared to MCS elements, or are on average under stronger selection.
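
The comparison above rests on Fisher's exact test applied to a 2x2 table (conserved versus nonconserved positions, rare versus common derived alleles). As a minimal sketch of how such a p-value can be computed, the standard-library implementation below sums hypergeometric probabilities over all tables that are no more likely than the observed one. The function name is hypothetical and the counts in the usage example are invented for illustration; they are not taken from the paper.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Rows might be conserved / nonconserved positions and columns
    rare / common derived alleles (labels assumed for illustration).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability that the top-left cell equals k,
        # with all row and column totals held fixed.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum every table that is as likely as, or less likely than, the observed one.
    # The small tolerance guards against floating-point ties.
    total = sum(
        p for p in (prob(k) for k in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )
    return min(1.0, total)


# Hypothetical counts, for illustration only:
# conserved sites:    60 rare, 40 common derived alleles
# nonconserved sites: 450 rare, 550 common derived alleles
p_value = fisher_exact_two_sided(60, 40, 450, 550)
print(f"two-sided p-value: {p_value:.4g}")
```

The two-sided convention used here (summing tables with probability at most that of the observed table) is one common choice; the source does not state which variant of the test was used.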

