Improving multiple sequence alignment by using better guide trees.
Bottom Line: It is believed that guide trees are very important to progressive alignment; a better guide tree will give an alignment with higher accuracy. Our study showed that our adaptive method can be used to improve the accuracy of many different progressive MSA tools. In fact, we give evidence showing that the guide trees constructed by the adaptive method are among the best.
Progressive sequence alignment is one of the most commonly used methods for multiple sequence alignment (MSA). Roughly speaking, the method first builds a guide tree, and then aligns the sequences progressively according to the topology of the tree. It is believed that guide trees are very important to progressive alignment; a better guide tree will give an alignment with higher accuracy. Recently, we proposed an adaptive method for constructing guide trees. This paper studies the quality of the guide trees constructed by this method. Our study showed that our adaptive method can be used to improve the accuracy of many different progressive MSA tools. In fact, we give evidence showing that the guide trees constructed by the adaptive method are among the best.
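To make "aligns the sequences progressively according to the topology of the tree" concrete, the following minimal sketch derives the order of merge steps from a guide tree by a post-order traversal. The nested-tuple tree representation, the function name merge_order, and the sequence labels are assumptions made for illustration; they are not taken from GLProbs or any other tool, and the actual profile alignment performed at each merge is omitted.

```python
def merge_order(tree):
    """Return the merges implied by a binary guide tree, in post-order.

    A leaf is a sequence label (str); an internal node is a pair (left, right).
    Each merge is reported as (leaves_of_left_subtree, leaves_of_right_subtree).
    """
    merges = []

    def visit(node):
        if isinstance(node, str):          # leaf: a single sequence
            return (node,)
        left, right = node
        left_leaves = visit(left)          # children are aligned first ...
        right_leaves = visit(right)
        merges.append((left_leaves, right_leaves))  # ... then their profiles are merged
        return left_leaves + right_leaves

    visit(tree)
    return merges


# Hypothetical guide tree over five sequences.
guide_tree = (("A", "B"), ("C", ("D", "E")))
order = merge_order(guide_tree)
for left_group, right_group in order:
    print("align", left_group, "with", right_group)
```

Because each merge fixes the relative alignment of the sequences below it, a different tree topology changes which sequences are aligned early, which is why the choice of guide tree can affect the final accuracy.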
Mentions: To confirm this intuition, we modified GLProbs to obtain GLProbs-Random, which replaces the guide tree constructed in GLProbs with a random guide tree. We used both programs to align protein sequence families obtained from the benchmark database OXBench. Figure 1 shows the sum-of-pairs (SP) scores and total column (TC) scores of their alignments, two of the most commonly used scores for measuring the quality of an MSA. Each dot (x, y) in the figure corresponds to one test sample, where x is the score obtained by GLProbs and y is the score obtained by GLProbs-Random. Unsurprisingly, most points lie below the diagonals, which means that GLProbs outperformed GLProbs-Random on those samples. This confirms the importance of guide trees. However, it is interesting to observe that there are also many points above the diagonals, which means that for these inputs, random guide trees are better than the guide trees elaborately generated by GLProbs. After a careful study of these inputs, we found that most of them have low sequence similarity. We believe that to generate better alignments for these inputs, we should abandon the progressive method and try other methods, such as the non-progressive alignment method, that do not rely on guide trees to generate their alignments.
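As a rough illustration of the two scores, the sketch below computes SP and TC for a test alignment against a reference alignment, using the common definitions: SP is the fraction of residue pairs aligned in the reference that are also aligned in the test alignment, and TC is the fraction of reference columns reproduced exactly. The function names and the toy alignments are hypothetical, and restricting TC to reference columns with at least two residues is a simplifying assumption; benchmark scoring programs may differ in such details.

```python
from itertools import combinations


def residue_columns(alignment):
    """Map each column to a tuple of per-sequence residue indices (None for a gap)."""
    counters = [0] * len(alignment)
    columns = []
    for col in zip(*alignment):
        entry = []
        for i, ch in enumerate(col):
            if ch == "-":
                entry.append(None)
            else:
                entry.append(counters[i])
                counters[i] += 1
        columns.append(tuple(entry))
    return columns


def aligned_pairs(columns):
    """Set of (seq_i, residue_i, seq_j, residue_j) pairs placed in the same column."""
    pairs = set()
    for col in columns:
        for i, j in combinations(range(len(col)), 2):
            if col[i] is not None and col[j] is not None:
                pairs.add((i, col[i], j, col[j]))
    return pairs


def sp_score(test, reference):
    """Fraction of reference residue pairs that the test alignment also aligns."""
    ref_pairs = aligned_pairs(residue_columns(reference))
    test_pairs = aligned_pairs(residue_columns(test))
    return len(ref_pairs & test_pairs) / len(ref_pairs) if ref_pairs else 0.0


def tc_score(test, reference):
    """Fraction of reference columns (with >= 2 residues) reproduced exactly in the test."""
    ref_cols = [c for c in residue_columns(reference)
                if sum(x is not None for x in c) >= 2]
    test_cols = set(residue_columns(test))
    return sum(c in test_cols for c in ref_cols) / len(ref_cols) if ref_cols else 0.0


# Toy alignments of the same three sequences (made up for illustration only).
reference = ["AC-GT", "ACGGT", "A--GT"]
test = ["ACG-T", "ACGGT", "A-G-T"]
print(sp_score(test, reference), tc_score(test, reference))
```

In a plot like Figure 1, each test sample would contribute one point (x, y), with x the score of the GLProbs alignment and y the score of the GLProbs-Random alignment against the same reference; a point with y > x lies above the diagonal.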