Regional context in the alignment of biological sequence pairs.
Bottom Line:
We also found that the average concordance, of our alignments with corresponding curated alignments, improves markedly when the model allows either of the two fast rates to colocate with hydrophilic residues. With rRNA molecules, our model did not detect colocation between the fast indel rate and the fast substitution rate. This result suggests that rRNA pairwise alignments should be modeled after allowing for the two processes to vary simultaneously and independently in the two regions.
View Article:
PubMed Central - PubMed
Affiliation: Department of Genome Biology, John Curtin School of Medical Research, Building 54, The Australian National University, Canberra, ACT 0200, Australia. mundu.sammut@gmail.com
ABSTRACT
Sequence divergence derives from either point substitution or indel (insertion or deletion) processes. We investigated the rates of these two processes in both protein-coding and non-protein-coding DNA. We aligned sequence pairs using two pair-hidden Markov models (PHMMs) conjoined by one silent state. Each PHMM had its own set of parameters to model rates in its respective region. The aim was to test the hypothesis that the indel mutation rate mimics the point mutation rate. That is, indels are found less often in conserved regions (slow point substitution rate) and more often in non-conserved regions (fast point substitution rate). Both polypeptides and rRNA molecules in our data exhibited a clear distinction between slow and fast rates of the two processes. These two rates served as surrogates for conserved and non-conserved secondary structure components, respectively. With polypeptides we found that both the fast indel rate and the fast replacement rate were colocated with hydrophilic residues. We also found that the average concordance of our alignments with corresponding curated alignments improves markedly when the model allows either of the two fast rates to colocate with hydrophilic residues. With rRNA molecules, our model did not detect colocation between the fast indel rate and the fast substitution rate. Nevertheless, coupling the indel rates with the point substitution rates across the two regions markedly increased model fit. This result suggests that rRNA pairwise alignments should be modeled after allowing for the two processes to vary simultaneously and independently in the two regions.
Mentions: A transition from one state to another state within the same PHMMη, η ∈ {1, 2}, of our two-tiered HMM–PHMM topology takes place with the same probability as that computed by the set of KM equations belonging to that PHMM, except that we multiply this probability by 1 − ρη in our modeling. A transition from one state of PHMMη to a state of PHMM3−η takes place with a probability computed from the two sets of KM equations and the probability ρη. For example, the probability of a transition from state M1 to state Y2 is the product of β1, α2, and ρ1. Figure 4 shows all the possible probability transformations that produce our two-region transition matrix from the conceptual matrices shown in Figs. 2 and 3. The new matrix is implemented in a standard Forward algorithm (Rabiner 1989) after each row has been normalized. Note that the new matrix in Fig. 4 restores the begin state and the end state. That is, the silent state shown in Fig. 3 was only part of the conceptual matrix, and does not need to be implemented explicitly in our modeling following the transformations.
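The construction described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes each PHMM has the three states M, X and Y, and it omits the begin and end states. The function name, the argument names and all numerical values are hypothetical. In particular, the cross-region factors c12 and c21 stand in for the products of KM-derived terms (such as β1·α2 for M1 → Y2), which the caller is assumed to supply. Every row is assumed to carry some positive probability mass.

```python
def combine_two_region(t1, t2, c12, c21, rho1, rho2):
    """Build a row-normalized two-region transition matrix (illustrative sketch).

    t1, t2   : within-region transition matrices, one list per row
    c12, c21 : hypothetical cross-region factors derived from both PHMMs,
               e.g. c12[M][Y] would hold beta1 * alpha2
    rho1     : probability of switching from region 1 to region 2
    rho2     : probability of switching from region 2 to region 1
    """
    rows = []
    # Region 1 rows: stay with weight (1 - rho1), switch with weight rho1.
    for within, cross in zip(t1, c12):
        rows.append([(1 - rho1) * p for p in within] + [rho1 * c for c in cross])
    # Region 2 rows: switch with weight rho2, stay with weight (1 - rho2).
    for cross, within in zip(c21, t2):
        rows.append([rho2 * c for c in cross] + [(1 - rho2) * p for p in within])
    # Normalize each row so it sums to 1 before use in a Forward algorithm.
    normalized = []
    for row in rows:
        total = sum(row)
        normalized.append([p / total for p in row])
    return normalized


# Made-up toy values; state order within each region is M, X, Y.
within = [[0.8, 0.1, 0.1], [0.6, 0.3, 0.1], [0.6, 0.1, 0.3]]
cross = [[0.4, 0.1, 0.1], [0.3, 0.2, 0.1], [0.3, 0.1, 0.2]]
matrix = combine_two_region(within, within, cross, cross, rho1=0.1, rho2=0.2)
```

In this sketch, row 0 of matrix corresponds to M1 and column 5 to Y2, so matrix[0][5] is the normalized counterpart of the β1·α2·ρ1 example in the text.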