Assessing the ability of sequence-based methods to provide functional insight within membrane integral proteins: a case study analyzing the neurotransmitter/Na+ symporter family.

Livesay DR, Kidd PD, Eskandari S, Roshan U - BMC Bioinformatics (2007)

Bottom Line: The resultant phylogenetic trees always contain six major subfamilies that are consistent with the functional diversity across the family. Interestingly, the various prediction schemes provide results that are predominantly orthogonal to each other. The results presented herein clearly establish the viability of sequence-based bioinformatic strategies to provide functional insight within the NSS family.


Affiliation: Department of Computer Science and Bioinformatics Research Center, University of North Carolina at Charlotte, Charlotte, NC 28262, USA. drlivesa@uncc.edu

ABSTRACT

Background: Efforts to predict functional sites from globular proteins are increasingly common; however, the most successful of these methods generally require structural insight. Unfortunately, despite several recent technological advances, structural coverage of membrane integral proteins continues to be sparse. Consequently, sequence-based methods represent an important alternative to illuminate functional roles. In this report, we critically examine the ability of several computational methods to provide functional insight within two specific areas. First, can phylogenomic methods accurately describe the functional diversity across a membrane integral protein family? And second, can sequence-based strategies accurately predict key functional sites? Due to the presence of a recently solved structure and a vast amount of experimental mutagenesis data, the neurotransmitter/Na+ symporter (NSS) family is an ideal model system to assess the quality of our predictions.

Results: The raw NSS sequence dataset contains 181 sequences, which have been aligned by various methods. The resultant phylogenetic trees always contain six major subfamilies that are consistent with the functional diversity across the family. Moreover, in well-represented subfamilies, phylogenetic clustering recapitulates several nuanced functional distinctions. Functional sites are predicted using six different methods (phylogenetic motifs, two methods that identify subfamily-specific positions, and three different conservation scores). A canonical set of 34 functional sites identified by Yamashita et al. within the recently solved LeuTAa structure is used to assess the quality of the predictions; most of these sites are predicted by the bioinformatic methods. Remarkably, the importance of these sites is largely confirmed by experimental mutagenesis. Furthermore, the collective set of functional site predictions qualitatively clusters along the proposed transport pathway, further demonstrating their utility. Interestingly, the various prediction schemes provide results that are predominantly orthogonal to each other. However, when the methods do provide overlapping results, specificity is shown to increase dramatically (e.g., sites predicted by any three methods have both accuracy and coverage greater than 50%).
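The effect of requiring agreement between methods can be illustrated with a minimal Python sketch. It assumes that accuracy means the fraction of predicted sites found in the reference set and that coverage means the fraction of reference sites that are predicted; the abstract does not define either term, so these definitions are an assumption. The method labels, the position numbers, and the helper names `consensus_sites` and `accuracy_and_coverage` are hypothetical toy values, not data or code from the study.

```python
from collections import Counter


def consensus_sites(predictions, min_methods):
    """Return positions predicted by at least `min_methods` of the methods."""
    votes = Counter(pos for sites in predictions.values() for pos in set(sites))
    return {pos for pos, n in votes.items() if n >= min_methods}


def accuracy_and_coverage(predicted, reference):
    """Assumed definitions: accuracy = hits / predicted, coverage = hits / reference."""
    hits = len(predicted & reference)
    accuracy = hits / len(predicted) if predicted else 0.0
    coverage = hits / len(reference) if reference else 0.0
    return accuracy, coverage


# Toy data: hypothetical alignment positions, NOT values from the paper.
predictions = {
    "phylogenetic_motifs": {10, 11, 12, 40},
    "subfamily_specific_a": {11, 12, 55},
    "subfamily_specific_b": {12, 40, 77},
    "conservation_1": {10, 11, 12, 90},
    "conservation_2": {11, 40, 91},
    "conservation_3": {12, 13},
}
reference = {10, 11, 12, 13, 40, 41}  # stand-in for a canonical site list

for k in (1, 3):
    sites = consensus_sites(predictions, k)
    acc, cov = accuracy_and_coverage(sites, reference)
    print(f"min_methods={k}: {len(sites)} sites, accuracy={acc:.2f}, coverage={cov:.2f}")
```

In this toy example, raising the agreement threshold from one method to three discards the positions that only a single method reports, which raises accuracy at the cost of some coverage. It shows the general trade-off only and makes no claim about the magnitudes reported in the study.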

Conclusion: The results presented herein clearly establish the viability of sequence-based bioinformatic strategies to provide functional insight within the NSS family. As such, we expect similar bioinformatic investigations will streamline functional investigations within membrane integral families in the absence of structure.

Figure 1: Phylogenetic tree of the complete NSS family. The tree is composed of six major distinct subfamilies, four of which are each associated with a specific chemical class of substrates (osmolytes, biogenic amines, and two distinct classes of amino acids). Only 7 of the 24 sequences within the subfamily generically annotated as Renal system (light purple) have been experimentally characterized. The dark purple subfamily lacks any experimental annotation; however, we include it within the Renal system subfamily based on the location of the branch point. The sixth subfamily (generically annotated as prokaryotic) is much more divergent. In fact, it appears that the prokaryotic subfamily could be further split, as indicated by the light and dark shades of blue. However, we do not do so because no functional annotation discriminates between the two. Triple asterisks indicate leaves of experimentally annotated homologs; the other highlighted leaf (<<<) corresponds to the sequence of the LeuTAa structure.

Mentions: The NSS family phylogenetic tree generated by ClustalW [28] is shown in Figure 1 and provided in Additional file 1. The tree has six major subfamilies (see Table 1). The tree generated by PHYLIP [29] (not shown) has only minor topological differences; all six subfamily bipartitions are conserved in both trees. Four of the six subfamilies are associated with substrates of specific chemical classes. These four subfamilies include transporters for biogenic amines (dopamine, norepinephrine and epinephrine, and serotonin) and osmolytes (GABA, betaine, taurine, creatine, and several ORFans), as well as two evolutionarily distinct classes of amino acid transporters (designated Amino acid #1 and Amino acid #2). The other two subfamilies are a poorly characterized subfamily generically designated as the Renal system (because most of the characterized sequences from this subfamily are found within the kidney and/or intestine) and a large prokaryotic subfamily. The osmolyte subfamily, which is the largest subfamily observed, contains 46 sequences. Conversely, the second of the two amino acid subfamilies (Amino acid #2) has only five sequences. Bootstrapping clearly indicates that all six subfamilies (in both trees) are statistically robust, including the small Amino acid #2 subfamily (see Additional file 2).
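The idea that two trees can differ in minor topology while preserving the same subfamily bipartitions can be sketched in a few lines of Python. This is an illustration only and does not use ClustalW or PHYLIP: the nested-tuple tree representation, the toy leaf names, and the helpers `leaves`, `canonical`, `bipartitions`, and `conserved_subfamilies` are all assumptions introduced here.

```python
def leaves(tree):
    """Collect leaf names from a nested-tuple tree (leaves are strings)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def canonical(side, all_leaves):
    """Represent a split by the side that lacks the alphabetically first leaf."""
    side = frozenset(side)
    return all_leaves - side if min(all_leaves) in side else side


def bipartitions(tree):
    """Return the non-trivial splits induced by the internal edges of the tree."""
    all_leaves = leaves(tree)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        # Skip trivial splits (a single leaf, or nothing, on one side).
        if len(below) > 1 and len(all_leaves - below) > 1:
            splits.add(canonical(below, all_leaves))
        return below

    visit(tree)
    return splits


def conserved_subfamilies(subfamilies, tree_a, tree_b):
    """Map each subfamily name to True if its split occurs in both trees."""
    all_leaves = leaves(tree_a)
    shared = bipartitions(tree_a) & bipartitions(tree_b)
    return {name: canonical(members, all_leaves) in shared
            for name, members in subfamilies.items()}


# Toy trees with hypothetical leaf names; they differ only inside one subfamily.
tree_a = ((("amine1", "amine2"), (("osmo1", "osmo2"), "osmo3")), ("prok1", "prok2"))
tree_b = (("amine1", "amine2"), (("osmo1", ("osmo2", "osmo3")), ("prok1", "prok2")))
subfamilies = {
    "biogenic_amine": {"amine1", "amine2"},
    "osmolyte": {"osmo1", "osmo2", "osmo3"},
    "prokaryotic": {"prok1", "prok2"},
}

print(conserved_subfamilies(subfamilies, tree_a, tree_b))
```

Both toy trees are treated as unrooted, so a split is stored in a canonical form that does not depend on where the root is drawn. The two trees disagree about the branching order within the toy osmolyte group, yet every subfamily-level split appears in both, which is the sense in which a subfamily bipartition is conserved.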

