Detection of distant evolutionary relationships between protein families using theory of sequence profile-profile comparison.

Margelevicius M, Venclovas C - BMC Bioinformatics (2010)

Bottom Line: For benchmarking, we use a reference ("gold standard") free model-based evaluation framework. Evaluation results show that at the level of protein domains our method compares favorably to all other tested methods. We also provide examples of the new method outperforming structure-based similarity detection and alignment.

Affiliation: Institute of Biotechnology, Graiciūno 8, LT-02241 Vilnius, Lithuania.

ABSTRACT

Background: Detection of common evolutionary origin (homology) is a primary means of inferring protein structure and function. At present, comparison of protein families represented as sequence profiles is arguably the most effective homology detection strategy. However, finding the best way to represent evolutionary information of a protein sequence family in the profile, to compare profiles and to estimate the biological significance of such comparisons, remains an active area of research.

Results: Here, we present a new homology detection method based on sequence profile-profile comparison. The method has a number of new features, including position-dependent gap penalties and a global score system. Position-dependent gap penalties provide a more biologically relevant way to represent and align protein families as sequence profiles. The global score system enables an analytical solution for the statistical parameters needed to estimate the statistical significance of profile-profile similarities. The new method, together with other state-of-the-art profile-based methods (HHsearch, COMPASS and PSI-BLAST), is benchmarked in an all-against-all comparison of a challenging set of SCOP domains that share at most 20% sequence identity. For benchmarking, we use a model-based evaluation framework that does not rely on a reference ("gold standard"). Evaluation results show that at the level of protein domains our method compares favorably to all other tested methods. We also provide examples of the new method outperforming structure-based similarity detection and alignment. The implementation of the new method, both as a standalone software package and as a web server, is available at http://www.ibt.lt/bioinformatics/coma.
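The abstract states that COMA's global score system admits an analytical solution for the statistical parameters, but it does not give the formulas. For orientation only, the sketch below shows the classic Karlin-Altschul setting used by BLAST-style tools, which is not COMA's method: lambda is the positive root of sum_s p(s) exp(lambda s) = 1, and the E-value of a score S is E = K m n exp(-lambda S). Here lambda is found numerically by bisection, K is taken as given rather than derived, and the toy score distribution, K, m and n in the usage are illustrative assumptions.

```python
import math


def karlin_altschul_lambda(score_probs, tol=1e-12):
    """Positive root lambda of sum_s p(s) * exp(lambda * s) = 1.

    score_probs maps integer scores s to background probabilities p(s)
    (assumed to sum to 1). A positive root exists only if the expected
    score is negative and at least one score is positive.
    """
    expected = sum(s * p for s, p in score_probs.items())
    if expected >= 0 or max(score_probs) <= 0:
        raise ValueError("need a negative expected score and some positive score")

    def f(lam):
        return sum(p * math.exp(lam * s) for s, p in score_probs.items()) - 1.0

    # f(0) = 0 and f'(0) = expected < 0, so f is negative just above 0;
    # f is convex, so it crosses zero exactly once more. Bracket that root.
    lo, hi = 0.0, 1.0
    while f(hi) <= 0.0:
        hi *= 2.0
    # Bisection: f(mid) <= 0 means the root lies above mid.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) <= 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def evalue(score, lam, k, m, n):
    """Expected number of chance hits scoring >= score: E = K m n exp(-lambda S)."""
    return k * m * n * math.exp(-lam * score)


if __name__ == "__main__":
    # Toy scoring system (illustrative only): +1 with p = 0.25, -1 with p = 0.75.
    lam = karlin_altschul_lambda({1: 0.25, -1: 0.75})
    print(lam)  # the analytical root here is ln 3
    # Hypothetical K = 0.1 and lengths m = n = 100.
    print(evalue(10, lam, k=0.1, m=100, n=100))
```

For the toy system the equation reduces to 0.25x^2 - x + 0.75 = 0 with x = exp(lambda), whose non-trivial root is x = 3; a method with a closed-form solution, as claimed for COMA, avoids the numerical search altogether.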

Conclusion: Due to a number of developments, the new profile-profile comparison method shows an improved ability to match distantly related protein domains. Therefore, the method should be useful for annotation and homology modeling of uncharacterized proteins.


Figure 5: Distribution of TM-scores for original and structure-based alignments. TM-score histograms obtained for the original alignments in global (A) and local (B) evaluation modes, without separating true and false positives (TP/FP). For each method, data are shown for its 14516 most significant hits (the number of COMA hits with E-value up to 0.01). Histograms (C) and (D) show TM-score distributions for the same hits as in (A) and (B), respectively, but with TM-scores derived from DALI structural alignments.

Mentions: It is common knowledge that structure-based alignments are, in general, more accurate than those based on sequence (profile, HMM) comparison. We therefore asked whether, and if so to what extent, the alignments produced by individual methods in the all-to-all comparison can be improved by structure comparison. To answer this question, we realigned an equal number of top-matching domain pairs for each method with DALI [26] and computed TM-scores for these alignments. The distributions of the original TM-scores and of those based on DALI alignments are shown in Figure 5. The TM-score distribution derived from DALI structural alignments is strongly shifted towards higher values in both evaluation modes; in other words, DALI significantly improved both the coverage and the accuracy of the alignments. Interestingly, after realignment with DALI, the TM-score distributions are similar for all evaluated methods except PSI-BLAST, which remains significantly worse. These results indicate that the quality of homology detection is comparable for all evaluated methods except PSI-BLAST, and that the observed differences in performance come mostly from differences in alignment coverage and/or accuracy.
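To make explicit why a realignment can raise TM-scores through both coverage and accuracy, here is a minimal sketch of the TM-score formula of Zhang and Skolnick (2004): TM = (1/L_target) sum_i 1/(1 + (d_i/d0)^2), with d0 = 1.24 (L_target - 15)^(1/3) - 1.8. It assumes the per-pair distances d_i come from one fixed superposition; the actual TM-score program searches for the superposition that maximizes the score, so this value can only be lower or equal. The function names and the example distances are hypothetical, and the global/local evaluation modes of the paper are not modeled.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 (Angstroms), Zhang & Skolnick (2004)."""
    if target_length <= 15:
        raise ValueError("d0 formula is undefined for 15 residues or fewer")
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    if d0 <= 0.0:
        # Chains of <= 18 residues give d0 <= 0; they need special handling not shown here.
        raise ValueError("d0 is non-positive for this length")
    return d0


def tm_score_fixed_superposition(distances, target_length):
    """TM-score of one alignment under one fixed superposition.

    distances: Angstrom distances between aligned residue pairs after
    superposition (hypothetical input; no structure handling here).
    """
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    # Dividing by the target length means unaligned residues contribute 0,
    # so the score rewards alignment coverage as well as accuracy.
    return total / target_length


if __name__ == "__main__":
    L = 100                  # hypothetical target domain length
    original = [3.0] * 60    # illustrative: 60 aligned pairs about 3 A apart
    realigned = [1.5] * 85   # illustrative: more pairs, closer together
    print(tm_score_fixed_superposition(original, L))
    print(tm_score_fixed_superposition(realigned, L))
```

In the usage, the second (illustrative) alignment covers more residues at shorter distances and therefore scores higher, which mirrors the shift between panels (A, B) and (C, D) of Figure 5.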

