Assessing the ability of sequence-based methods to provide functional insight within membrane integral proteins: a case study analyzing the neurotransmitter/Na+ symporter family.

Livesay DR, Kidd PD, Eskandari S, Roshan U - BMC Bioinformatics (2007)

Bottom Line: The resultant phylogenetic trees always contain six major subfamilies that are consistent with the functional diversity across the family. Interestingly, the various prediction schemes provide results that are predominantly orthogonal to each other. The results presented herein clearly establish the viability of sequence-based bioinformatic strategies to provide functional insight within the NSS family.


Affiliation: Department of Computer Science and Bioinformatics Research Center, University of North Carolina at Charlotte, Charlotte, NC 28262, USA. drlivesa@uncc.edu

ABSTRACT

Background: Efforts to predict functional sites from globular proteins are increasingly common; however, the most successful of these methods generally require structural insight. Unfortunately, despite several recent technological advances, structural coverage of membrane integral proteins continues to be sparse. Consequently, sequence-based methods represent an important alternative to illuminate functional roles. In this report, we critically examine the ability of several computational methods to provide functional insight within two specific areas. First, can phylogenomic methods accurately describe the functional diversity across a membrane integral protein family? And second, can sequence-based strategies accurately predict key functional sites? Due to the presence of a recently solved structure and a vast amount of experimental mutagenesis data, the neurotransmitter/Na+ symporter (NSS) family is an ideal model system to assess the quality of our predictions.

Results: The raw NSS sequence dataset contains 181 sequences, which have been aligned by various methods. The resultant phylogenetic trees always contain six major subfamilies that are consistent with the functional diversity across the family. Moreover, in well-represented subfamilies, phylogenetic clustering recapitulates several nuanced functional distinctions. Functional sites are predicted using six different methods (phylogenetic motifs, two methods that identify subfamily-specific positions, and three different conservation scores). A canonical set of 34 functional sites identified by Yamashita et al. within the recently solved LeuTAa structure is used to assess the quality of the predictions; most of these sites are predicted by the bioinformatic methods. Remarkably, the importance of these sites is largely confirmed by experimental mutagenesis. Furthermore, the collective set of functional site predictions qualitatively clusters along the proposed transport pathway, further demonstrating their utility. Interestingly, the various prediction schemes provide results that are predominantly orthogonal to each other. However, when the methods do provide overlapping results, specificity is shown to increase dramatically (e.g., sites predicted by any three methods have both accuracy and coverage greater than 50%).

Conclusion: The results presented herein clearly establish the viability of sequence-based bioinformatic strategies to provide functional insight within the NSS family. As such, we expect similar bioinformatic investigations will streamline functional investigations within membrane integral families in the absence of structure.
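
The consensus claim in the Results (sites predicted by any three methods have both accuracy and coverage greater than 50%) can be illustrated with a minimal sketch. It assumes that accuracy is the fraction of predicted positions that fall in the benchmark and that coverage is the fraction of benchmark positions that are predicted; the abstract does not spell out these definitions, so they are assumptions. The function names, the method labels, and all positions are hypothetical toy values, not data from the study.

```python
from collections import Counter


def consensus_sites(predictions, min_methods):
    """Return positions predicted by at least `min_methods` methods."""
    votes = Counter(pos for sites in predictions.values() for pos in set(sites))
    return {pos for pos, n in votes.items() if n >= min_methods}


def accuracy_and_coverage(predicted, benchmark):
    """Assumed definitions: accuracy = hits / predicted, coverage = hits / benchmark."""
    hits = predicted & benchmark
    accuracy = len(hits) / len(predicted) if predicted else 0.0
    coverage = len(hits) / len(benchmark) if benchmark else 0.0
    return accuracy, coverage


# Hypothetical toy data: alignment positions predicted by three made-up methods.
toy_predictions = {
    "method_A": {1, 2, 3, 4},
    "method_B": {2, 3, 4, 9},
    "method_C": {3, 4, 5, 9},
}
toy_benchmark = {2, 3, 4, 5}  # hypothetical set of known functional positions

for k in (1, 2, 3):
    acc, cov = accuracy_and_coverage(consensus_sites(toy_predictions, k), toy_benchmark)
    print(f"predicted by >= {k} methods: accuracy={acc:.2f}, coverage={cov:.2f}")
```

In this toy case, raising the required number of agreeing methods trades coverage for accuracy, which is the qualitative behavior the abstract describes.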



Figure 5: Structural descriptions of the functional site predictions within the leucine (top) and sodium ion (bottom) binding sites. Red indicates functional residues that are predicted, and blue indicates those that are not predicted, by (a) phylogenetic motifs, (b) false positive expectation, (c) site conservation, (d) Rate4Site, (e) evolutionary trace, and (f) SDPpred. (g) In the last frame, residues are color-coded based on the number of methods (excluding SDPpred) that predict each position (0 = blue, 1 = cyan, 2 = green, 3 = yellow, 4 = red, and 5 = magenta). In all cases, the leucine and sodium ion substrates are colored orange.

Mentions: The leucine substrate, two sodium ions, and a chloride ion are co-crystallized within the LeuTAa structure. The biological importance of the leucine and sodium ion binding sites is unambiguous, so knowing how well the methods predict these sites is imperative to their assessment. (Note that the significance of the chloride ion-binding site, which is structurally remote from the others, is still being debated; it is therefore omitted from this analysis.) Within the functional site benchmark, 14 residues are defined as part of the leucine-binding site, whereas nine constitute the sodium ion binding sites (see Table 2). Three residues (Ala22, Thr254, and Ser355) are involved in both. Figure 5 clearly indicates that the five different methods result in substantially different predictions. Interestingly, the two methods based on sequence windows (PM and FPE) have better coverage of these residues (14 and 16, respectively). While it is tempting to attribute their increased coverage simply to the fact that they predict sequence chunks, this is not the case. In fact, the total number of alignment positions predicted by FPE is less than that predicted by SC85 and ET. The other three methods (SC85, Rate4Site, and ET) predict 11, 14, and 9, respectively. The poor coverage of the leucine-binding site by ET and SDPpred, both of which look (at least in part) for subfamily-specific residues, is particularly notable. The good coverage of the leucine-binding site by the remaining conservation measures suggests that the general binding site location is conserved across the family; however, results from the class-specific methods (ET and SDPpred) suggest that the exact details of the transporter-substrate interaction are likely defined by a different set of residue positions across the family. Figure 5g color-codes the binding site residues by the number of different methods that predict them.
Encouragingly, 71% of the binding site residues are predicted by at least two different methods, and 59% of the binding sites are predicted by at least three different methods. Coverage of the binding site and extracellular/periplasmic gate residues is also quite good. The coverage of each by three or more methods is 67% and 60%, respectively. Only one (10%) of the cytoplasmic gate residues is predicted by three or more methods.
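
The per-residue tally behind Figure 5g, and percentages such as "predicted by at least two different methods", can be sketched as follows. The color scale is taken from the Figure 5 caption and the method abbreviations from the text; the function names, the residue labels R1 to R4, and the assignment of residues to methods are hypothetical placeholders and do not reproduce the paper's predictions.

```python
# Color scale from the Figure 5g caption: number of predicting methods -> color.
FIG5G_COLORS = {0: "blue", 1: "cyan", 2: "green", 3: "yellow", 4: "red", 5: "magenta"}


def count_methods(residues, predictions):
    """Count, for each residue, how many methods predict it."""
    return {res: sum(res in sites for sites in predictions.values()) for res in residues}


def fraction_with_at_least(counts, k):
    """Fraction of residues predicted by at least k methods."""
    if not counts:
        return 0.0
    return sum(n >= k for n in counts.values()) / len(counts)


# Hypothetical placeholder residues and predictions (SDPpred is excluded, as in Figure 5g).
toy_site = ["R1", "R2", "R3", "R4"]
toy_predictions = {
    "PM": {"R1", "R2"},
    "FPE": {"R1", "R2", "R3"},
    "SC85": {"R1"},
    "Rate4Site": {"R1", "R3"},
    "ET": set(),
}

counts = count_methods(toy_site, toy_predictions)
for res, n in counts.items():
    print(res, n, FIG5G_COLORS[n])
print("predicted by >= 2 methods:", fraction_with_at_least(counts, 2))
print("predicted by >= 3 methods:", fraction_with_at_least(counts, 3))
```

Keeping the counts separate from the color lookup means the same tally can feed both a structure rendering and the summary percentages.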

