Regional context in the alignment of biological sequence pairs.

Sammut R, Huttley G - J. Mol. Evol. (2010)

Bottom Line: We also found that the average concordance, of our alignments with corresponding curated alignments, improves markedly when the model allows either of the two fast rates to colocate with hydrophilic residues. With rRNA molecules, our model did not detect colocation between the fast indel rate and the fast substitution rate. This result suggests that rRNA pairwise alignments should be modeled after allowing for the two processes to vary simultaneously and independently in the two regions.


Affiliation: Department of Genome Biology, John Curtin School of Medical Research, Building 54, The Australian National University, Canberra, ACT 0200, Australia. mundu.sammut@gmail.com

ABSTRACT
Sequence divergence derives from either point substitution or indel (insertion or deletion) processes. We investigated the rates of these two processes in both protein-coding and non-protein-coding DNA. We aligned sequence pairs using two pair-hidden Markov models (PHMMs) conjoined by one silent state. Each PHMM had its own set of parameters to model the rates in its region. The aim was to test the hypothesis that the indel mutation rate mimics the point mutation rate. That is, indels are found less often in conserved regions (slow point substitution rate) and more often in non-conserved regions (fast point substitution rate). Both polypeptides and rRNA molecules in our data exhibited a clear distinction between slow and fast rates of the two processes. These two rates served as surrogates for conserved and non-conserved secondary structure components, respectively. With polypeptides we found that both the fast indel rate and the fast replacement rate were colocated with hydrophilic residues. We also found that the average concordance of our alignments with corresponding curated alignments improves markedly when the model allows either of the two fast rates to colocate with hydrophilic residues. With rRNA molecules, our model did not detect colocation between the fast indel rate and the fast substitution rate. Nevertheless, coupling the indel rates with the point substitution rates across the two regions markedly increased model fit. This result suggests that rRNA pairwise alignments should be modeled after allowing for the two processes to vary simultaneously and independently in the two regions.


Fig. 3: Conceptual two-region transition matrix T of HMM–PHMM topology constructed from two 3 × 3 transition matrices of the two PHMMs. Silent state acts as begin state of source PHMM through first row and as end state of sink PHMM through last column, simultaneously.

Mentions: Figure 3 shows the conceptual matrix of transition probabilities of the lower layer of the two-tiered HMM–PHMM topology. The upper layer captures the alternating behavior of rate heterogeneity. We assume this alternating behavior to be a two-state Markov process. This process has a 2 × 2 transition matrix, shown conceptually in Fig. 2, with transition probabilities ρ1, 0 < ρ1 < 1, and ρ2, 0 < ρ2 < 1. These switching probabilities determine the flow intensity within the current PHMM before flow switches to the other PHMM of the two-region topology via the silent state in Fig. 1. Churchill (1989) showed that under stable DNA compositional heterogeneity, a switching probability would typically be small. By extension to the substitution rate problem, a low switching probability means that we expect the protein (or DNA) sections that alternately experience low and high rates of replacement (substitution) along the pairwise alignment not to be fragmented.
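
The link between a small switching probability and unfragmented regions can be illustrated with a minimal Python sketch of the upper-layer two-state process. This is not the authors' implementation. It assumes that ρ1 is the probability of leaving region 1 at each step and ρ2 the probability of leaving region 2, so that the 2 × 2 matrix has rows (1 − ρ1, ρ1) and (ρ2, 1 − ρ2); the function names and the numeric values are hypothetical, and the sketch ignores the silent state and sequence ends.

```python
# Sketch of the two-state region-switching process (upper layer).
# Assumption: rho1 = P(leave region 1), rho2 = P(leave region 2) per step.
# All names and example values here are illustrative, not from the paper.


def switching_matrix(rho1, rho2):
    """Return the row-stochastic 2 x 2 switching matrix as nested lists."""
    for name, rho in (("rho1", rho1), ("rho2", rho2)):
        if not 0.0 < rho < 1.0:
            raise ValueError(f"{name} must lie strictly between 0 and 1")
    return [[1.0 - rho1, rho1],
            [rho2, 1.0 - rho2]]


def expected_run_lengths(rho1, rho2):
    """Mean number of consecutive steps spent in each region.

    The dwell time in a region is geometric, so its mean is 1 / rho.
    """
    return 1.0 / rho1, 1.0 / rho2


def stationary_distribution(rho1, rho2):
    """Long-run fraction of steps spent in region 1 and region 2."""
    total = rho1 + rho2
    return [rho2 / total, rho1 / total]


if __name__ == "__main__":
    rho1, rho2 = 0.01, 0.05  # hypothetical small switching probabilities
    print("switching matrix:", switching_matrix(rho1, rho2))
    print("expected run lengths:", expected_run_lengths(rho1, rho2))
    print("stationary distribution:", stationary_distribution(rho1, rho2))
```

Under these assumptions, the illustrative values ρ1 = 0.01 and ρ2 = 0.05 give expected runs of about 100 and 20 steps in the two regions, which is the sense in which low switching probabilities keep the slow and fast sections contiguous.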

