Coevolutionary analyses require phylogenetically deep alignments and better models to accurately detect inter-protein contacts within and between species.

Avila-Herrera A, Pollard KS - BMC Bioinformatics (2015)

Bottom Line: When biomolecules physically interact, natural selection operates on them jointly. Two commonly used distributions are anti-conservative and have high false positive rates in some scenarios, although the empirical distribution of scores performs reasonably well with deep alignments. We conclude that coevolutionary analysis of cross-species protein interactions holds great promise but requires sequencing many more species pairs.


Affiliation: Bioinformatics Graduate Program, University of California, San Francisco, USA. aram.avilaherrera@ucsf.edu.

ABSTRACT

Background: When biomolecules physically interact, natural selection operates on them jointly. Contacting positions in protein and RNA structures exhibit correlated patterns of sequence evolution due to constraints imposed by the interaction, and molecular arms races can develop between interacting proteins in pathogens and their hosts. To evaluate how well methods developed to detect coevolving residues within proteins can be adapted for cross-species, inter-protein analysis, we used statistical criteria to quantify the performance of these methods in detecting inter-protein residues within 8 angstroms of each other in the co-crystal structures of 33 bacterial protein interactions. We also evaluated their performance for detecting known residues at the interface of a host-virus protein complex with a partially solved structure.
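The contact criterion above (inter-protein residues within 8 angstroms of each other in a co-crystal structure) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that "within 8 angstroms" means the minimum distance between any atom of one residue and any atom of the other, and the input layout (a list of residues, each a list of `(x, y, z)` atom coordinates) is hypothetical. Other conventions, such as C-beta distances, would give a somewhat different contact set.

```python
import math
from itertools import product


def min_atom_distance(res_a, res_b):
    """Smallest Euclidean distance (angstroms) between any atom of res_a and any atom of res_b."""
    return min(math.dist(p, q) for p, q in product(res_a, res_b))


def inter_protein_contacts(protein_a, protein_b, cutoff=8.0):
    """Index pairs (i, j) of residues in A and B whose closest atoms lie within `cutoff` angstroms.

    Assumed (hypothetical) input: each protein is a list of residues, and each
    residue is a list of (x, y, z) atom coordinates taken from the co-crystal structure.
    """
    return [
        (i, j)
        for i, res_a in enumerate(protein_a)
        for j, res_b in enumerate(protein_b)
        if min_atom_distance(res_a, res_b) < cutoff
    ]


# Toy example: two single-atom "residues" per protein.
protein_a = [[(0.0, 0.0, 0.0)], [(20.0, 0.0, 0.0)]]
protein_b = [[(5.0, 0.0, 0.0)], [(30.0, 0.0, 0.0)]]
print(inter_protein_contacts(protein_a, protein_b))
```

The resulting pairs serve as the "true contacts" against which coevolution scores for every inter-protein residue pair can be evaluated.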

Results: Our quantitative benchmarking showed that all coevolutionary methods clearly benefit from alignments with many sequences. Methods that aim to detect direct correlations generally outperform other approaches. However, faster mutual-information-based methods are occasionally competitive in small alignments and with relaxed false positive rates. Two commonly used distributions are anti-conservative and have high false positive rates in some scenarios, although the empirical distribution of scores performs reasonably well with deep alignments.
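To make the mutual information (MI) idea and the "empirical distribution of scores" concrete, here is a hedged sketch. It computes the plain plug-in MI between one column of protein A and one column of protein B in a paired alignment, where row k of each alignment is assumed to come from the same species (or host-pathogen) pair. It then converts each score into the fraction of all pooled inter-protein scores that are at least as large. This is only an illustration under those assumptions. The MI-based methods benchmarked in the paper may apply corrections that are not shown here, and uncorrected MI does not separate direct from indirect correlations, which is what direct-coupling methods aim to do. The function names are hypothetical.

```python
import math
from bisect import bisect_left
from collections import Counter


def mutual_information(col_x, col_y):
    """Plug-in mutual information (in nats) between two paired alignment columns."""
    n = len(col_x)
    if n == 0 or n != len(col_y):
        raise ValueError("columns must be non-empty and of equal length")
    count_x = Counter(col_x)
    count_y = Counter(col_y)
    count_xy = Counter(zip(col_x, col_y))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) * log(p(x,y) / (p(x) p(y))), written in terms of raw counts
        mi += (c / n) * math.log(c * n / (count_x[x] * count_y[y]))
    return mi


def inter_protein_mi(aln_a, aln_b):
    """MI for every (column i of protein A, column j of protein B) pair.

    Assumes row k of aln_a and row k of aln_b come from the same species pair.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("alignments must have the same number of paired rows")
    cols_a = list(zip(*aln_a))
    cols_b = list(zip(*aln_b))
    return {
        (i, j): mutual_information(ca, cb)
        for i, ca in enumerate(cols_a)
        for j, cb in enumerate(cols_b)
    }


def empirical_pvalues(scores):
    """For each score, the fraction of all pooled scores that are at least as large."""
    ranked = sorted(scores)
    n = len(ranked)
    return [(n - bisect_left(ranked, s)) / n for s in scores]


# Toy paired alignment: 4 species, 2 columns in protein A and 2 in protein B.
scores = inter_protein_mi(["AC", "AD", "CC", "CD"], ["DE", "DE", "EE", "EE"])
print(scores)
```

Pooling all scores as the reference distribution avoids assuming a parametric null. With few sequences, however, many unrelated column pairs can reach high MI by chance, which is consistent with the observation above that deep alignments are needed.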

Conclusions: We conclude that coevolutionary analysis of cross-species protein interactions holds great promise but requires sequencing many more species pairs.

Fig. 1: Coevolution statistics differ in their ability to detect residue contacts in HisKA-RR sub-alignments. Direct methods benefit from larger, more diverse alignments. Left: precision (PPV) at false positive rate (FPR) < 0.1 %. Right: power (TPR) at FPR < 5 %. Blue lines indicate a loess fit for each method; 95 % confidence intervals are shown in gray. See Abbreviations and Table 1 for abbreviations.
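The metrics in the figure can be computed from a ranked list of coevolution scores and a contact label for each residue pair. The sketch below uses the standard definitions PPV = TP/(TP+FP), TPR = TP/(TP+FN) and FPR = FP/(FP+TN). It assumes that "PPV at FPR < α" means the metric at the most permissive score cutoff whose FPR is still below α. It also assumes that fmax and ϕmax, discussed in the text that follows, are the maximum F1 score and the maximum phi (Matthews) correlation over all cutoffs. Ties are broken by input order here, so real evaluations may treat tied scores differently. The function names are hypothetical.

```python
import math


def sweep(scores, is_contact):
    """Yield (tp, fp, fn, tn) for each top-k cutoff, k = 0..n, in descending score order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    pos = sum(is_contact)
    neg = len(scores) - pos
    tp = fp = 0
    yield tp, fp, pos - tp, neg - fp
    for i in order:
        if is_contact[i]:
            tp += 1
        else:
            fp += 1
        yield tp, fp, pos - tp, neg - fp


def metrics_at_fpr(scores, is_contact, max_fpr):
    """(PPV, TPR) at the most permissive cutoff whose FPR stays strictly below max_fpr."""
    best = (0.0, 0.0)
    for tp, fp, fn, tn in sweep(scores, is_contact):
        fpr = fp / (fp + tn) if fp + tn else 0.0
        if fpr >= max_fpr:
            break  # FPR never decreases as the cutoff is relaxed
        ppv = tp / (tp + fp) if tp + fp else 0.0  # undefined with no predictions; report 0
        tpr = tp / (tp + fn) if tp + fn else 0.0
        best = (ppv, tpr)
    return best


def f_and_phi_max(scores, is_contact):
    """Maximum F1 and maximum phi (Matthews) coefficient over all cutoffs."""
    f_values, phi_values = [], []
    for tp, fp, fn, tn in sweep(scores, is_contact):
        f_den = 2 * tp + fp + fn
        f_values.append(2 * tp / f_den if f_den else 0.0)
        phi_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
        phi_values.append((tp * tn - fp * fn) / phi_den if phi_den else 0.0)
    return max(f_values), max(phi_values)


scores = [0.9, 0.8, 0.7, 0.6, 0.5]
labels = [True, False, True, False, False]
print(metrics_at_fpr(scores, labels, max_fpr=0.001))  # the figure's strict setting
print(f_and_phi_max(scores, labels))
```

In this representation, the strict FPR < 0.1 % setting from the left panel simply stops the sweep earlier than the FPR < 5 % setting from the right panel.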

Both power and precision improve with increasing Neff/L for nearly all coevolutionary methods in the HisKA-RR data set (Fig. 1). However, for alignments with Neff/L < 1.0, power at FPR < 5 % remains relatively low (< 50 %), and it is even lower (< 10 %) when the false positive rate is controlled more strictly (FPR < 0.1 %). As expected, precision is higher at FPR < 0.1 % than at FPR < 5 %, but it also remains below 50 % for “square” (Neff/L = 1.0) alignments. Additionally, the performance metrics fmax and ϕmax show that no score threshold (i.e., level of prediction strictness) achieves both high precision and high power in alignments with Neff/L ≲ 3.0 (Additional file 13: Figures S15–S17). Despite the smaller range of Neff/L values, the same performance trends are observed across the Ovch32 alignments (Additional file 13: Figures S11 and S19).
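Neff/L compares the effective number of sequences to the alignment length, so a “square” alignment has Neff equal to L. The text does not state how Neff is computed. The sketch below uses one widely used convention, which is an assumption here: each sequence is weighted by 1 / (number of sequences, itself included, sharing at least 80 % identity with it), and Neff is the sum of these weights. Gap characters are treated like any other symbol. For a paired alignment, L would be the number of columns in the concatenated sequences, which is also an assumption. The function names are hypothetical.

```python
def identity(s, t):
    """Fraction of aligned columns at which two equal-length sequences agree."""
    return sum(a == b for a, b in zip(s, t)) / len(s)


def neff(alignment, threshold=0.8):
    """Effective number of sequences under identity-based reweighting (assumed convention)."""
    if not alignment:
        raise ValueError("alignment must contain at least one sequence")
    total = 0.0
    for s in alignment:
        # Count near-duplicates of s, including s itself, so redundant sequences share weight.
        n_similar = sum(identity(s, t) >= threshold for t in alignment)
        total += 1.0 / n_similar
    return total


def neff_per_length(alignment, threshold=0.8):
    """Neff divided by the number of alignment columns L."""
    return neff(alignment, threshold) / len(alignment[0])


# Two near-identical pairs collapse to roughly two effective sequences.
aln = ["AAAAA", "AAAAA", "CCCCC", "CCCCD"]
print(neff(aln), neff_per_length(aln))
```

Under this kind of weighting, adding many closely related sequences barely raises Neff, which is why phylogenetically diverse species pairs matter more than raw sequence counts.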