CSMET: comparative genomic motif detection via multi-resolution phylogenetic shadowing.

Ray P, Shringarpure S, Kolar M, Xing EP - PLoS Comput. Biol. (2008)

Bottom Line: As a result, comparative genomic search of non-conserved motifs across evolutionarily related taxa remains a difficult challenge, especially in higher eukaryotes, where the cis-regulatory regions containing motifs can be long and divergent; existing methods rely heavily on specialized pattern-driven heuristic search or sampling algorithms, which can be difficult to generalize and hard to interpret based on phylogenetic principles. Combining this method with a hidden Markov model that autocorrelates evolutionary rates on successive sites in the genome, CSMET offers a principled way to take into consideration lineage-specific evolution of TFBSs during motif detection, and a readily computable analytical form of the posterior distribution of motifs under TFBS turnover. On both simulated and real Drosophila cis-regulatory modules, CSMET outperforms other state-of-the-art comparative genomic motif finders.

Affiliation: School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America.

ABSTRACT
Functional turnover of transcription factor binding sites (TFBSs), such as whole-motif loss or gain, is a common event during genome evolution. Conventional probabilistic phylogenetic shadowing methods model the evolution of genomes only at the nucleotide level. They cannot capture the evolutionary dynamics of functional turnover of aligned sequence entities. As a result, comparative genomic search for non-conserved motifs across evolutionarily related taxa remains a difficult challenge, especially in higher eukaryotes. There, the cis-regulatory regions containing motifs can be long and divergent. Existing methods rely heavily on specialized pattern-driven heuristic search or sampling algorithms, which can be difficult to generalize and hard to interpret based on phylogenetic principles. We propose a new method, Conditional Shadowing via Multi-resolution Evolutionary Trees (CSMET). It uses a context-dependent probabilistic graphical model that allows aligned sites from different taxa in a multiple alignment to be modeled by either a background phylogeny or an appropriate motif phylogeny, conditioned on the functional specification of each taxon. The functional specifications themselves are the output of a phylogeny that models the evolution not of individual nucleotides but of the overall functionality (e.g., functional retention or loss) of the aligned sequence segments over lineages. CSMET combines this method with a hidden Markov model that autocorrelates evolutionary rates on successive sites in the genome. It thereby offers a principled way to take lineage-specific evolution of TFBSs into account during motif detection. It also provides a readily computable analytical form of the posterior distribution of motifs under TFBS turnover. On both simulated and real Drosophila cis-regulatory modules, CSMET outperforms other state-of-the-art comparative genomic motif finders.
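The abstract couples CSMET with a hidden Markov model that autocorrelates evolutionary rates on successive sites. The idea behind such a model can be illustrated with a generic scaled forward-backward recursion over hidden rate classes. The Python sketch below is not the authors' implementation. The following are all hypothetical: the two rate classes, the `stay` probability that a site keeps its neighbor's class, the uniform switching to other classes, and the per-site likelihoods. In a real phylo-HMM, each `site_liks[i][k]` would be obtained by evaluating aligned column `i` under a phylogeny whose branch lengths are scaled by the rate of class `k`. Here those values are simply supplied as inputs.

```python
import math


def forward_backward(site_liks, stay, init):
    """Scaled forward-backward over hidden rate classes (generic sketch).

    site_liks[i][k]: P(aligned column i | rate class k), assumed precomputed.
    stay: probability that site i+1 keeps the rate class of site i; values
    above the uniform level make rates autocorrelated along the sequence.
    Returns per-site posteriors over rate classes and the log-likelihood.
    """
    K, n = len(init), len(site_liks)

    def trans(a, b):
        # Hypothetical transition model: stay, or switch uniformly to another class.
        return stay if a == b else (1.0 - stay) / (K - 1)

    alpha, scale = [], []
    for i in range(n):
        if i == 0:
            a = [init[k] * site_liks[0][k] for k in range(K)]
        else:
            a = [site_liks[i][k] * sum(alpha[-1][j] * trans(j, k) for j in range(K))
                 for k in range(K)]
        c = sum(a)
        alpha.append([x / c for x in a])  # normalized forward vector
        scale.append(c)                   # scaling factor; sum of logs = log-likelihood

    beta = [[1.0] * K for _ in range(n)]
    for i in range(n - 2, -1, -1):
        beta[i] = [sum(trans(k, j) * site_liks[i + 1][j] * beta[i + 1][j] for j in range(K))
                   / scale[i + 1] for k in range(K)]

    posteriors = [[alpha[i][k] * beta[i][k] for k in range(K)] for i in range(n)]
    return posteriors, sum(math.log(c) for c in scale)


# Hypothetical likelihoods under a "slow" (0) and a "fast" (1) rate class.
SITE_LIKS = [[0.9, 0.1], [0.9, 0.1], [0.5, 0.5], [0.9, 0.1]]
posteriors, loglik = forward_backward(SITE_LIKS, stay=0.9, init=[0.5, 0.5])
```

Site 2 (counting from 0) is uninformative on its own. With `stay=0.9`, its neighbors pull its posterior toward the slow class. With `stay=0.5` and two classes, the sites become independent and that posterior stays at 0.5. This is the sense in which the HMM lets evidence about evolutionary rate propagate between adjacent sites.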

Figure 1. A demonstration of motif turnover. (A) Two examples of multiple alignments of Drosophila CRMs showing functional turnover of known TFBSs. The first (top) shows the loss of a Caudal binding site in D. ananassae in the Hairy 6 CRM. The second (bottom) shows further instances of TFBS loss/gain; here only D. melanogaster, D. simulans, and D. sechellia retain the binding site functionality. (B) Putative TFBSs in the eve2 enhancer across 4 taxa: D. melanogaster, D. yakuba, D. erecta, and D. pseudoobscura. (Extracted and modified from Figure 4 in [11].) Note that orthologs of the melanogaster motifs bcd-3 and hb cannot be identified in some of the other taxa.

Unlike genes, short and degenerate sequence patterns such as transcription factor (TF) binding sites (i.e., motifs) exhibit frequent turnover even across closely related taxa, such as various fruit fly species [8] (Figure 1). For genes, functional turnover usually occurs only between distant species, and the complete orthology assumption is largely satisfied when sequences are aligned across closely related species. As we will discuss shortly, motif turnover makes aligned regions functionally heterogeneous across taxa. This heterogeneity often renders conventional phylogenetic shadowing models inappropriate for comparative genomic motif finding. Some recent methods combine scoring functions adapted from classic molecular evolution models with a more flexible heuristic search over partial alignments, and show better sensitivity to non-conserved motifs [9],[10]. However, they offer little insight into the evolutionary dynamics of motif turnover and can be computationally expensive. In this paper, we present a principled approach to the “incomplete orthology” problem, which arises either from functional gain or loss, such as motif turnover, or from imperfect sequence alignment. We propose a new algorithm for finding binding sites of given TFs in multiple genomes, based on a novel multi-resolution evolutionary model named CSMET (Conditional Shadowing via Multi-resolution Evolutionary Trees). CSMET explicitly models motif turnover across species through a “low resolution” phylogeny defined by a functional substitution process. The motif turnover states specify whether TFBS functionality is present or absent in each taxon at a given aligned location. Conditioned on these states, CSMET applies specific “high resolution” phylogenies, defined by function-specific nucleotide substitution processes, to the subsets of aligned sequences at that location that share the same turnover status. The model thereby captures function-specific sequence evolution in every taxon, rather than subjecting all taxa to the same phylogeny as in the conventional model (Figure 2).
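The two-level idea can be made concrete with a small Python sketch. A "low resolution" tree over binary functional states (0 = TFBS lost, 1 = functional) supplies a prior over turnover configurations. Conditioned on a configuration, functional taxa are scored by a motif tree and the remaining taxa by a background tree. The posterior over configurations follows by enumeration. Every specific here is a hypothetical choice for illustration and is not taken from the paper:

- the three-taxon topology and branch lengths;
- the use of an F81-style substitution model, with each PWM column as the motif tree's stationary distribution;
- all rates and frequencies;
- the simplification that the two taxon subsets are scored independently, with the other subset's leaves marginalized out.

This is not necessarily how CSMET couples its trees.

```python
import math
from itertools import product

BASES = "ACGT"
TAXA = ["mel", "sim", "ana"]
# Hypothetical tree: a leaf is a taxon name; an internal node is a tuple of
# (child, branch_length) pairs. Topology and lengths are illustrative only.
CHERRY = (("mel", 0.1), ("sim", 0.1))
TREE = ((CHERRY, 0.2), ("ana", 0.5))

# Hypothetical parameters, not taken from the paper.
FUNC_PI, FUNC_RATE = [0.3, 0.7], 1.0   # functional states: 0 = lost, 1 = functional
BACKGROUND, BG_RATE = [0.3, 0.2, 0.2, 0.3], 1.0
MOTIF_RATE = 0.5                        # assume motif sites evolve more slowly
PWM = [[0.8, 0.1, 0.05, 0.05],          # per-position stationary distributions (A, C, G, T)
       [0.05, 0.05, 0.1, 0.8],
       [0.1, 0.7, 0.1, 0.1]]


def f81(pi, rate, t):
    """F81-style transition matrix: P[i][j] = e^(-rate*t)[i == j] + (1 - e^(-rate*t)) pi[j]."""
    decay = math.exp(-rate * t)
    return [[decay * (i == j) + (1 - decay) * pj for j, pj in enumerate(pi)]
            for i in range(len(pi))]


def prune(node, leaves, pi, rate):
    """Felsenstein pruning: P(observed leaves below node | node state), per state.
    Taxa absent from `leaves` are marginalized out."""
    n = len(pi)
    if isinstance(node, str):
        s = leaves.get(node)
        return [1.0] * n if s is None else [float(k == s) for k in range(n)]
    partial = [1.0] * n
    for child, t in node:
        P, below = f81(pi, rate, t), prune(child, leaves, pi, rate)
        for i in range(n):
            partial[i] *= sum(P[i][j] * below[j] for j in range(n))
    return partial


def tree_likelihood(leaves, pi, rate):
    return sum(p * l for p, l in zip(pi, prune(TREE, leaves, pi, rate)))


def segment_likelihood(window, z):
    """P(aligned segment | turnover states z): functional taxa are scored by the
    motif tree, the rest by the background tree (a simplifying assumption)."""
    lik = 1.0
    for j, motif_pi in enumerate(PWM):
        column = {tx: BASES.index(window[tx][j]) for tx in TAXA}
        lik *= tree_likelihood({tx: b for tx, b in column.items() if z[tx]}, motif_pi, MOTIF_RATE)
        lik *= tree_likelihood({tx: b for tx, b in column.items() if not z[tx]}, BACKGROUND, BG_RATE)
    return lik


def turnover_posterior(window):
    """Exact posterior over all 2^K turnover configurations (feasible only for small K)."""
    joint = {}
    for bits in product((0, 1), repeat=len(TAXA)):
        z = dict(zip(TAXA, bits))
        joint[bits] = tree_likelihood(z, FUNC_PI, FUNC_RATE) * segment_likelihood(window, z)
    evidence = sum(joint.values())
    return {bits: p / evidence for bits, p in joint.items()}


def functional_marginals(post):
    """Posterior probability that each taxon retains the binding site."""
    return {tx: sum(p for bits, p in post.items() if bits[k]) for k, tx in enumerate(TAXA)}


# Hypothetical alignment window: "ana" carries a diverged copy of the site.
WINDOW = {"mel": "ATC", "sim": "ATC", "ana": "GGA"}
print(functional_marginals(turnover_posterior(WINDOW)))
```

With these made-up numbers, mel and sim come out as likely functional and ana as likely to have lost the site. This resembles the lineage-specific loss in D. ananassae shown in Figure 1A. Enumerating all 2^K configurations is only practical for a handful of taxa, and the sketch says nothing about how CSMET itself performs inference at scale.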

