Analysis of sequence conservation at nucleotide resolution.

Asthana S, Roytberg M, Stamatoyannopoulos J, Sunyaev S - PLoS Comput. Biol. (2007)

Bottom Line: Our approach estimates the rate at which each nucleotide position is evolving, computes the probability of neutrality given this rate estimate, and summarizes the result in a Sequence CONservation Evaluation (SCONE) score. Functional regions are markedly enriched in individually conserved positions and short (<15 bp) conserved "chunks." Our results collectively suggest that the majority of functionally important noncoding conserved positions are highly fragmented and reside outside of canonically defined long conserved noncoding sequences. A small subset of these fragmented positions may be identified with high confidence.


Affiliation: Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, United States of America.

ABSTRACT
One of the major goals of comparative genomics is to understand the evolutionary history of each nucleotide in the human genome sequence, and the degree to which it is under selective pressure. Ascertainment of selective constraint at nucleotide resolution is particularly important for predicting the functional significance of human genetic variation and for analyzing the sequence substructure of cis-regulatory sequences and other functional elements. Current methods for analysis of sequence conservation are focused on delineation of conserved regions comprising tens or even hundreds of consecutive nucleotides. We therefore developed a novel computational approach designed specifically for scoring evolutionary conservation at individual base-pair resolution. Our approach estimates the rate at which each nucleotide position is evolving, computes the probability of neutrality given this rate estimate, and summarizes the result in a Sequence CONservation Evaluation (SCONE) score. We computed SCONE scores in a continuous fashion across 1% of the human genome for which high-quality sequence information from up to 23 genomes is available. We show that SCONE scores are clearly correlated with the allele frequency of human polymorphisms in both coding and noncoding regions. We find that the majority of noncoding conserved nucleotides lie outside of longer conserved elements predicted by other conservation analyses, and are experiencing ongoing selection in modern humans, as is evident from the allele frequency spectrum of human polymorphism. We also applied SCONE to analyze the distribution of conserved nucleotides within functional regions. These regions are markedly enriched in individually conserved positions and short (<15 bp) conserved "chunks." Our results collectively suggest that the majority of functionally important noncoding conserved positions are highly fragmented and reside outside of canonically defined long conserved noncoding sequences. A small subset of these fragmented positions may be identified with high confidence.
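
To make the notions of "individually conserved positions" and short conserved "chunks" concrete, the following minimal Python sketch groups consecutive conserved positions into runs and keeps those shorter than 15 bp. It is an illustration only, not the authors' implementation: the function names, the toy p-values, and the use of p < 0.05 as the conservation cutoff (borrowed from the Figure 1 caption) are assumptions.

```python
from typing import List, Optional, Sequence, Tuple

Run = Tuple[int, int]  # (start index, length in bp)


def conserved_runs(p_values: Sequence[Optional[float]], alpha: float = 0.05) -> List[Run]:
    """Return maximal runs of consecutive positions with p < alpha.

    A position with no score (None) is treated as not conserved,
    so it ends any run in progress. This is an assumption of the sketch.
    """
    runs: List[Run] = []
    start: Optional[int] = None
    for i, p in enumerate(p_values):
        if p is not None and p < alpha:
            if start is None:
                start = i  # a new run begins here
        elif start is not None:
            runs.append((start, i - start))  # run ended at position i - 1
            start = None
    if start is not None:
        runs.append((start, len(p_values) - start))  # run reaches the end
    return runs


def short_chunks(runs: Sequence[Run], max_len: int = 15) -> List[Run]:
    """Keep only runs shorter than max_len bp (the '<15 bp' chunks)."""
    return [run for run in runs if run[1] < max_len]


# Toy per-position p-values (hypothetical, not data from the paper).
toy_p = [0.5, 0.01, 0.02, 0.7, 0.03, None, 0.04, 0.04]
toy_runs = conserved_runs(toy_p)
print(toy_runs)
print(short_chunks(toy_runs))
```

A run of length 1 corresponds to an individually conserved position; longer runs below the 15 bp limit correspond to the short chunks described in the abstract.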


Figure 1. Examples of SCONE p-Value Scores for Coding (A), Highly Conserved Noncoding (B), and Nonconserved (C) Regions. Positions likely to be conserved (p < 0.05) are in light green; other positions are dark. Below each plot is the portion of the multiple sequence alignment used to generate scores for each sequence region. Deviations from human sequence (green) are indicated in red. (A) A portion of an exon from the MET gene (chr7:115,933,744–115,933,793). The pattern of conserved positions is indicative of the triplet structure of the genetic code. (B) A highly conserved intronic sequence in the FOXP2 gene (chr7:113,646,877–113,646,926). (C) An intergenic region near the AXIN1 gene (chr16:343,046–343,095) showing little overall conservation, but containing a significant number of individually conserved positions.

Mentions: SCONE provides an estimate of the rate at which a given position (column) in a multiple sequence alignment is evolving and a probability (p-value) of neutrality for that position, based on a model of neutral evolution. We used SCONE to score conservation in all alignable human bases using the phylogenetic tree and multiple sequence alignments (generated by the TBA alignment program [21]) made available by the ENCODE Multiple Sequence Alignment group [18]. Figure 1 shows an example of SCONE scores. Though positions were human-referenced, we excluded human sequence from conservation analysis to avoid ascertainment biases with regard to the study of human SNPs (see Methods). Positions containing fewer than two aligned sequences were also excluded from scoring. Despite these limitations, SCONE scores are available for 27.6 out of 30 Mbases of ENCODE sequences (92%). We examined the distribution of p-values for SCONE scores in putative neutral sites (see Methods). As p-values for SCONE scores correspond to the hypothesis of neutrality, their distribution in neutral positions should be uniform. On average, the distribution strongly resembles a uniform distribution (Figure S1A), showing that the model of evolution employed by SCONE is in general agreement with the observed pattern of evolution.
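
The uniformity argument above can be checked numerically. The sketch below computes the one-sample Kolmogorov-Smirnov distance between a set of p-values and the Uniform(0, 1) distribution; values near zero indicate close agreement. This is a hedged illustration and not the procedure used in the paper: the function name is hypothetical and the p-values are simulated rather than taken from SCONE output.

```python
import random
from typing import Sequence


def ks_uniform_statistic(p_values: Sequence[float]) -> float:
    """Largest gap between the empirical CDF of p_values and the Uniform(0, 1) CDF."""
    xs = sorted(p_values)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one p-value")
    d = 0.0
    for i, x in enumerate(xs):
        # The empirical CDF jumps from i/n to (i+1)/n at x; the uniform CDF equals x.
        d = max(d, (i + 1) / n - x, x - i / n)
    return d


# Simulated stand-ins for p-values (assumptions, not data from the paper).
rng = random.Random(0)
neutral_like = [rng.random() for _ in range(10_000)]          # uniform by construction
constrained_like = [rng.random() ** 3 for _ in range(10_000)]  # skewed toward small p

print(ks_uniform_statistic(neutral_like))      # small: consistent with neutrality
print(ks_uniform_statistic(constrained_like))  # large: excess of small p-values
```

Under this sketch, putative neutral sites would be expected to give a small distance, while sites under constraint would show an excess of small p-values and a larger distance.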

