PicXAA: greedy probabilistic construction of maximum expected accuracy alignment of multiple sequences.

Sahraeian SM, Yoon BJ - Nucleic Acids Res. (2010)

Bottom Line: Accurate tools for multiple sequence alignment (MSA) are essential for comparative studies of the function and structure of biological sequences. PicXAA greedily builds up the multiple alignment from sequence regions with high local similarities, thereby yielding an accurate global alignment that effectively grasps the local similarities among sequences. Evaluations on several widely used benchmark sets show that PicXAA constantly yields accurate alignment results on a wide range of reference sets, with especially remarkable improvements over other leading algorithms on sequence sets with local similarities.


Affiliation: Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA.

ABSTRACT
Accurate tools for multiple sequence alignment (MSA) are essential for comparative studies of the function and structure of biological sequences. However, it is very challenging to develop a computationally efficient algorithm that can consistently predict accurate alignments for various types of sequence sets. In this article, we introduce PicXAA (Probabilistic Maximum Accuracy Alignment), a probabilistic non-progressive alignment algorithm that aims to find protein alignments with maximum expected accuracy. PicXAA greedily builds up the multiple alignment from sequence regions with high local similarities, thereby yielding an accurate global alignment that effectively grasps the local similarities among sequences. Evaluations on several widely used benchmark sets show that PicXAA constantly yields accurate alignment results on a wide range of reference sets, with especially remarkable improvements over other leading algorithms on sequence sets with local similarities. PicXAA source code is freely available at: http://www.ece.tamu.edu/~bjyoon/picxaa/.



Figure 3: Effectiveness of the proposed techniques. (A and B) Novel consistency transformation; (C and D) Discriminative refinement strategy; (E and F) Greedy graph-based alignment.

Mentions: To assess the effectiveness of the proposed transformation, we examined the performance improvement that can be achieved by incorporating the new consistency transformation into two popular probabilistic consistency-based alignment algorithms, ProbAlign (27) and ProbCons (20). Figure 3A and B summarize the average SP and CS scores on the IRMBASE 2.0 and BAliBASE 3.0 benchmarks. As these figures show, the new consistency transformation significantly improves the SP and CS scores of both algorithms on the IRMBASE 2.0 dataset. The improvement is about 1.5% in the SP score and 3.3% in the CS score for ProbAlign, and 1.6% in the SP score and 4.2% in the CS score for ProbCons. This clearly shows the advantage of the proposed probabilistic consistency transformation over the conventional transformation on reference sets with local similarities. For BAliBASE 3.0, the proposed transformation does not result in a significant change in the average SP score, but it leads to about 0.3% improvement in the average CS score. The improvement is especially significant for the reference set RV30, which consists of sequences that belong to diverse subfamilies, where we have 1.6% (0.1%) improvement in the SP score and 3.6% (1.7%) improvement in the CS score for ProbAlign (ProbCons).
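For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how SP (sum-of-pairs) and CS (column) scores are commonly computed when a test alignment is compared with a reference. It assumes the usual definitions: SP is the fraction of residue pairs aligned in the reference that are also aligned in the test, and CS is the fraction of reference columns reproduced exactly. The function names and the toy alignments are hypothetical, and the sketch scores every column, whereas benchmark scoring tools may restrict scoring to annotated core regions, so it is not necessarily the exact procedure behind the reported numbers.

```python
from itertools import combinations

GAP = "-"

def column_signatures(alignment):
    """Return one tuple per column: each sequence's residue index, or None at a gap."""
    counters = [0] * len(alignment)
    columns = []
    for col in zip(*alignment):
        sig = []
        for s, ch in enumerate(col):
            if ch == GAP:
                sig.append(None)
            else:
                sig.append(counters[s])
                counters[s] += 1
        columns.append(tuple(sig))
    return columns

def aligned_pairs(columns):
    """Collect every residue pair (seq a, index, seq b, index) that shares a column."""
    pairs = set()
    for sig in columns:
        for a, b in combinations(range(len(sig)), 2):
            if sig[a] is not None and sig[b] is not None:
                pairs.add((a, sig[a], b, sig[b]))
    return pairs

def sp_score(test, reference):
    """Fraction of reference residue pairs that the test alignment also aligns."""
    ref_pairs = aligned_pairs(column_signatures(reference))
    test_pairs = aligned_pairs(column_signatures(test))
    return len(ref_pairs & test_pairs) / len(ref_pairs)

def cs_score(test, reference):
    """Fraction of reference columns that appear identically in the test alignment."""
    test_cols = set(column_signatures(test))
    ref_cols = column_signatures(reference)
    return sum(c in test_cols for c in ref_cols) / len(ref_cols)

# Toy example (hypothetical data): the same three sequences aligned two ways.
reference = ["AC-GT", "ACAGT", "A--GT"]
test      = ["ACG-T", "ACAGT", "A-G-T"]

print(sp_score(test, reference))  # 0.8 (8 of 10 reference pairs recovered)
print(cs_score(test, reference))  # 0.6 (3 of 5 reference columns recovered)
```

In this toy case a single misplaced residue leaves most pairwise relationships intact but breaks two whole columns, which illustrates why the CS score is the stricter of the two and tends to move more than the SP score.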

