Topology based identification and comprehensive classification of four-transmembrane helix containing proteins (4TMs) in the human genome.

Attwood MM, Krishnan A, Pivotti V, Yazdi S, Almén MS, Schiöth HB - BMC Genomics (2016)

Bottom Line: From a structural perspective, the α-helical transmembrane proteins can be categorized into major groups based on the number of transmembrane helices, and these groups are often associated with specific functions. Compared with the well-characterized seven-transmembrane (7TM) proteins, other TM groups are less explored, in particular the 4TM group. Moreover, in the neurotransmitter-gated ion channel families we found an interesting exception to the intracellular localization of the N- and C-termini that is otherwise ubiquitous throughout the entire membrane proteome and the 4TM dataset.

Affiliation: Department of Neuroscience, Functional Pharmacology, Uppsala University, BMC, Box 593, 751 24, Uppsala, Sweden.

ABSTRACT

Background: Membrane proteins are key components in a broad spectrum of functions and thus account for the major proportion of the drug-targeted portion of the genome. From a structural perspective, the α-helical transmembrane proteins can be categorized into major groups based on the number of transmembrane helices, and these groups are often associated with specific functions. Compared with the well-characterized seven-transmembrane (7TM) proteins, other TM groups are less explored, in particular the 4TM group. In this study, we identify the complete 4TM complement from the latest release of the human genome and assess the 4TM structural group as a whole. We functionally characterize this dataset, evaluate the resulting groups and ubiquitous functions, and describe disease and drug target involvement.

Results: We classified 373 proteins, which represent ~7 % of the human membrane proteome and 69 more proteins than our previous estimate. We characterized the 4TM dataset based on functional, structural, and/or evolutionary similarities. Proteins involved in transport activity constitute 37 % of the dataset, 23 % are receptor-related, and 13 % have enzymatic functions. Intriguingly, this 37 % share of transport proteins is more than double the 15 % share of transporters in the entire human membrane proteome, which might suggest that the 4TM topological architecture is favored for transporting molecules over other functions. Moreover, in the neurotransmitter-gated ion channel families we found an interesting exception to the intracellular localization of the N- and C-termini that is otherwise ubiquitous throughout the entire membrane proteome and the 4TM dataset. Overall, we estimate that 58 % of the dataset has a known association with disease conditions, with 19 % of the genes possibly involved in different types of cancer.
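The "more than double" comparison is a ratio of two proportions. The Python snippet below simply reproduces that arithmetic from the percentages quoted above; `fold_enrichment` is an illustrative name, not a function from the study.

```python
# Percentages quoted in the Results paragraph
transport_share_4tm = 37.0        # % of the 4TM dataset involved in transport
transport_share_proteome = 15.0   # % of transporters in the whole human membrane proteome


def fold_enrichment(subset_pct: float, background_pct: float) -> float:
    """Ratio of a category's share in a subset to its share in the background set."""
    return subset_pct / background_pct


ratio = fold_enrichment(transport_share_4tm, transport_share_proteome)
print(f"{ratio:.2f}")  # 37 / 15 = 2.466..., printed as 2.47
```

A ratio above 2 is what "more than double" means here; it compares proportions only and carries no statistical test.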

Conclusions: We provide here the most robust and updated classification of the 4TM complement of the human genome as a platform to further understand the characteristics of 4TM functions and to explore pharmacological opportunities.

Fig. 8: (Parts a, b, c, d, and e). Common topologies and conserved features within the 4TM dataset. Part a The Miscellaneous class includes proteins that have been characterized into subgroups through similar functional activities. Common features include intracellular termini and conserved cysteine residues (yellow, outlined in red ovals) in the extracellular loops that either form disulphide bonds (e.g. claudins and tetraspanins) or interact and form bonds with other proteins (e.g. connexins). Tetraspanins have 4-6 conserved cysteine residues as well as a conserved CCG (cysteine-cysteine-glycine) motif in the large extracellular loop 2. Part b The example here, MS4A2, is one of the two identified receptors and a member of the MS4A protein family, in which 16 members are characterized by 4TMs, a CD20 domain, and an intracellular ("in") N-terminus. Part c The Transporter class includes 66 proteins, of which 65 have an intracellular N-terminus; conserved cysteine residues in the extracellular loops are common. TMEM205 is the sole transporter with the opposite topology and is of interest because of its involvement in novel mechanisms of cisplatin (chemotherapy) resistance [82]. Part d Of the 45 Enzyme class proteins, all except five have an intracellular N-terminus. ZDHHC-3 is a typical member of the ZDHHC protein family, characterized by 4TMs, a conserved DHHC domain, a conserved DPG (aspartate-proline-glycine) motif, and a TTxE (threonine-threonine-any residue-glutamate) motif. Part e The Dual function class contains 47 proteins, including all 46 proteins of the neurotransmitter-gated ion channel (NGIC) family. The NGIC family has a long extracellular N-terminus that contains several important binding sites as well as two conserved cysteine residues that form a disulphide bond. The NGIC family is unique in having extracellular N- and C-termini, and signal peptides are predicted in 40 of its proteins.
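The sequence motifs named in the caption can be located with ordinary regular expressions, where the "x" in TTxE becomes a wildcard for any residue. The Python sketch below is illustrative only: the `find_motifs` helper and the toy sequence are hypothetical, and a bare pattern match says nothing about whether a motif is conserved or sits in the expected domain or loop.

```python
import re

# Motifs named in the Fig. 8 caption; "x" in TTxE means any residue,
# which corresponds to "." in a regular expression.
MOTIFS = {
    "DHHC": "DHHC",
    "DPG": "DPG",
    "TTxE": "TT.E",
    "CCG": "CCG",
}


def find_motifs(sequence: str) -> dict:
    """Return 0-based start positions of each motif, including overlapping hits."""
    seq = sequence.upper()
    hits = {}
    for name, pattern in MOTIFS.items():
        # A zero-width lookahead lets overlapping occurrences be reported
        hits[name] = [m.start() for m in re.finditer(f"(?={pattern})", seq)]
    return hits


# Hypothetical toy sequence, NOT a real ZDHHC or tetraspanin sequence
toy = "MKDPGLLVTTAEWWDHHCSQFCCGK"
print(find_motifs(toy))  # {'DHHC': [14], 'DPG': [2], 'TTxE': [8], 'CCG': [21]}
```

In practice, motif positions would be interpreted against a predicted topology, since a CCG in a cytoplasmic segment would not correspond to the extracellular-loop motif described for tetraspanins.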

Mentions: In addition to the number of membrane-spanning helices, the orientations of the N- and C-termini are important factors in determining the functional activity of a protein. The terminal orientations are usually determined by the initial insertion of the peptide into the membrane; however, the presence of a signal peptide can alter the resulting orientation. Signal peptides are short N-terminal amino acid sequences that target the protein to the membrane and are subsequently removed by proteolytic cleavage after membrane insertion. In addition, when using transmembrane protein prediction methods, it is important to identify and excise signal peptides from sequences, as otherwise their hydrophobicity can cause them to be mistaken for transmembrane helices [49]. The TOPCONS-single topology predictions for this dataset place the N- and C-termini on the intracellular side of the membrane in 316 proteins and in the extracellular environment in 57 proteins. The entire NGIC group (46 proteins), which makes up virtually all of the Dual function class, has its N- and C-termini located in the extracellular environment. An extracellular N-terminus is consistent with the other large receptor groups, i.e. the 1TMs and 7TMs; in those groups, however, the C-terminus is usually intracellular because of the important activities it is typically involved in, particularly signal transduction. Additionally, SignalP predicts signal peptides for 40 of the NGIC proteins, and 60 proteins in total are predicted to have signal peptides. Because current transmembrane protein prediction methods are based on classical helical structures, i.e. those that completely span the membrane, they do not take into account anomalies such as reentrant loops, short breaks in helices, and helices that lie along the surface of the membrane [8].
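To illustrate why an uncleaved signal peptide can be mistaken for a transmembrane helix, the Python sketch below scans a sequence with a simple Kyte-Doolittle sliding-window hydropathy score, then repeats the scan after trimming a predicted signal peptide. This is a deliberately simplified, swapped-in technique: it is not how TOPCONS-single or SignalP work, the 19-residue window and 1.6 threshold are the classic Kyte-Doolittle defaults rather than parameters from this study, and the helper names, cleavage position, and toy sequences are hypothetical.

```python
from typing import Optional

# Kyte-Doolittle hydropathy scale (Kyte & Doolittle, 1982); positive = hydrophobic
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_windows(seq: str, window: int = 19, threshold: float = 1.6) -> list:
    """Return 0-based start positions of windows whose mean hydropathy exceeds threshold."""
    # Non-standard residues (e.g. X) are scored 0.0 here, which is an assumption
    scores = [KD.get(aa, 0.0) for aa in seq.upper()]
    starts = []
    for i in range(len(scores) - window + 1):
        if sum(scores[i:i + window]) / window > threshold:
            starts.append(i)
    return starts


def strip_signal_peptide(seq: str, cleavage_site: Optional[int]) -> str:
    """Drop residues before a predicted cleavage site (None means no signal peptide)."""
    return seq if cleavage_site is None else seq[cleavage_site:]


# Toy sequences, NOT real proteins: a signal-peptide-like stretch (charged start,
# hydrophobic core, A-x-A end) fused to a polar "mature" chain.
signal = "MKK" + "L" * 20 + "AGA"
mature = "DEKRSNQTGP" * 4
full = signal + mature

print(len(hydrophobic_windows(full)) > 0)                            # True
print(hydrophobic_windows(strip_signal_peptide(full, len(signal))))  # []
```

On the untrimmed toy sequence the hydrophobic core is flagged as a candidate helix; once the assumed signal peptide is removed, no window passes the threshold.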
As noted previously for TMEM14C, this limitation could affect topology prediction. For example, the 11 proteins outside the NGIC family that have a predicted extracellular orientation might be interesting targets for transmembrane structural studies. Examples of common structures and topologies for each functional class are shown in Fig. 8, which highlights conserved features and N- and C-terminus orientations found in the 4TM dataset. In the Miscellaneous class, for instance, claudins, tetraspanins, and connexins are shown with the four conserved membrane regions, the intracellular location of the N- and C-termini, and the conserved cysteine residues that are often found in the extracellular loops and are involved in forming disulphide bonds.
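The orientation counts in this section fit together with simple arithmetic, and if every helix fully spans the membrane, an even helix count places the C-terminus on the same side as the N-terminus. The Python sketch below makes both explicit. `c_terminus_side` is a hypothetical helper, and its parity rule is exactly the classical-helix assumption that reentrant loops or surface helices would break. The figure of 20 non-NGIC proteins with signal peptides is derived here by subtraction, not reported in the text.

```python
def c_terminus_side(n_terminus_side: str, n_tm_helices: int) -> str:
    """Each fully spanning helix crosses the membrane once, so an even number
    of crossings returns the chain to the side where it started."""
    if n_terminus_side not in ("in", "out"):
        raise ValueError("side must be 'in' or 'out'")
    if n_tm_helices % 2 == 0:
        return n_terminus_side
    return "out" if n_terminus_side == "in" else "in"


# Counts quoted in the text
total = 373
termini_in = 316               # TOPCONS-single: termini intracellular
termini_out = 57               # TOPCONS-single: termini extracellular
ngic = 46                      # all NGIC members have extracellular termini
signal_peptides_total = 60
signal_peptides_ngic = 40

assert termini_in + termini_out == total

non_ngic_out = termini_out - ngic                              # the 11 proteins noted above
non_ngic_signal = signal_peptides_total - signal_peptides_ngic  # derived, not reported

print(c_terminus_side("in", 4), non_ngic_out, non_ngic_signal)  # in 11 20
```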

