Assessing the ability of sequence-based methods to provide functional insight within membrane integral proteins: a case study analyzing the neurotransmitter/Na+ symporter family.

Livesay DR, Kidd PD, Eskandari S, Roshan U - BMC Bioinformatics (2007)

Bottom Line: The resultant phylogenetic trees always contain six major subfamilies that are consistent with the functional diversity across the family. Interestingly, the various prediction schemes provide results that are predominantly orthogonal to each other. The results presented herein clearly establish the viability of sequence-based bioinformatic strategies to provide functional insight within the NSS family.


Affiliation: Department of Computer Science and Bioinformatics Research Center, University of North Carolina at Charlotte, Charlotte, NC 28262, USA. drlivesa@uncc.edu

ABSTRACT

Background: Efforts to predict functional sites in globular proteins are increasingly common; however, the most successful of these methods generally require structural insight. Unfortunately, despite several recent technological advances, structural coverage of membrane integral proteins continues to be sparse. Consequently, sequence-based methods represent an important alternative for illuminating functional roles. In this report, we critically examine the ability of several computational methods to provide functional insight in two specific areas. First, can phylogenomic methods accurately describe the functional diversity across a membrane integral protein family? Second, can sequence-based strategies accurately predict key functional sites? Because a structure has recently been solved and a vast amount of experimental mutagenesis data is available, the neurotransmitter/Na+ symporter (NSS) family is an ideal model system for assessing the quality of our predictions.

Results: The raw NSS sequence dataset contains 181 sequences, which were aligned using several methods. The resultant phylogenetic trees always contain six major subfamilies that are consistent with the functional diversity across the family. Moreover, in well-represented subfamilies, phylogenetic clustering recapitulates several nuanced functional distinctions. Functional sites are predicted using six different methods (phylogenetic motifs, two methods that identify subfamily-specific positions, and three different conservation scores). The quality of the predictions is assessed against a canonical set of 34 functional sites identified by Yamashita et al. within the recently solved LeuTAa structure; most of these sites are predicted by the bioinformatic methods. Remarkably, the importance of these sites is largely confirmed by experimental mutagenesis. In addition, the collective set of functional site predictions clusters qualitatively along the proposed transport pathway, further demonstrating their utility. Interestingly, the various prediction schemes provide results that are predominantly orthogonal to each other. However, when the methods do provide overlapping results, specificity is shown to increase dramatically (e.g., sites predicted by any three methods have both accuracy and coverage greater than 50%).

Conclusion: The results presented herein clearly establish the viability of sequence-based bioinformatic strategies for providing functional insight within the NSS family. As such, we expect that similar bioinformatic analyses will streamline functional investigations of membrane integral protein families in the absence of structure.


Figure 6: Structural superposition of all functional site predictions onto the LeuTAa structure. Spheres represent α-carbons of the predicted residues, which are color-coded by the number of methods (excluding SDPpred) that predict each residue (1 = cyan, 2 = green, 3 = yellow, 4 = red, and 5 = magenta). The four views show sites predicted by at least (a) one, (b) two, (c) three, and (d) four methods. In all cases, the leucine, sodium ions, and chloride ion are colored blue.

Mentions: We also investigate whether predictions based on simple intersections of the distinct methods improve prediction accuracy. That is, only positions that are predicted simultaneously by multiple methods are put forward as predictions. Due to its poor overall performance, SDPpred is excluded from this analysis. Moreover, SDPpred only predicts positions that are also predicted by at least three of the other methods. Table 3 demonstrates that the simple Intersect method clearly improves performance. Only nine positions are predicted concurrently by all five schemes. As discussed above, one corresponds to Glu62, three others correspond to binding site residues, and three correspond to extracellular/periplasmic gate residues. When the criterion is relaxed to any four methods, 22 positions are predicted, half of which are included in the functional site set. Impressively, relaxing the criterion further to any three methods raises the coverage to 0.56, with an accuracy of 0.44. When any two methods intersect, the accuracy is reduced to 0.29 (which is within the range of the individual methods), but the coverage increases to an impressive 0.71. Interestingly, Figure 6 indicates that predictions with stronger support (i.e., those made by multiple methods) are more likely to cluster around the leucine-binding site and the proposed transport route (discussed below). It will be interesting for future investigations to determine whether the Intersect predictions (vs. individual methods) are better at identifying positions whose mutation produces a functionally deleterious phenotype.
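The Intersect scheme described above amounts to counting, for each position, how many methods predict it and keeping the positions that reach a threshold. The following is a minimal Python sketch, not the authors' code. It assumes that accuracy is the fraction of predicted positions that belong to the functional site set and that coverage is the fraction of the functional site set that is predicted. These definitions are inferred from the numbers above; for example, 11 correct out of 22 four-method predictions gives an accuracy of 0.50. The function names `consensus_sites` and `accuracy_and_coverage` and the toy data are hypothetical.

```python
from collections import Counter


def consensus_sites(predictions_by_method, min_methods):
    """Return the positions predicted by at least `min_methods` methods.

    predictions_by_method maps a method name to a set of residue positions.
    """
    votes = Counter()
    for positions in predictions_by_method.values():
        # set() ensures each method votes at most once per position
        votes.update(set(positions))
    return {pos for pos, n in votes.items() if n >= min_methods}


def accuracy_and_coverage(predicted, functional_sites):
    """Assumed definitions (not stated in the source):
    accuracy = fraction of predicted positions that are functional sites;
    coverage = fraction of functional sites that are predicted.
    """
    hits = len(set(predicted) & set(functional_sites))
    accuracy = hits / len(predicted) if predicted else 0.0
    coverage = hits / len(functional_sites) if functional_sites else 0.0
    return accuracy, coverage


# Hypothetical toy data: five methods (mirroring the five schemes left after
# excluding SDPpred); positions are arbitrary integers, not LeuTAa residues.
methods = {
    "A": {1, 2, 3, 4},
    "B": {2, 3, 5},
    "C": {3, 4, 5, 6},
    "D": {3, 7},
    "E": {2, 3, 4},
}
functional = {2, 4, 6, 8}

# Sweep the "at least k methods" threshold used in Figure 6 and Table 3.
for k in (1, 2, 3, 4, 5):
    predicted = consensus_sites(methods, k)
    acc, cov = accuracy_and_coverage(predicted, functional)
    print(k, sorted(predicted), round(acc, 2), round(cov, 2))
```

Raising `min_methods` shrinks the prediction set, which produces the accuracy/coverage trade-off discussed above. Under the assumed definitions, the four-method result (11 correct out of 22 predictions, against 34 canonical sites) corresponds to an accuracy of 0.50 and a coverage of about 0.32.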

