WXG100 protein superfamily consists of three subfamilies and exhibits an α-helical C-terminal conserved residue pattern.

Poulsen C, Panjikar S, Holton SJ, Wilmanns M, Song YH - PLoS ONE (2014)

Bottom Line: The side chains of these conserved residues decorate the same side of the C-terminal α-helix and therefore form a distinct surface. Our results lead to a putatively extended T7SS secretion signal that combines two reported T7SS recognition characteristics: firstly, that the T7SS secretion signal is localized at the C-terminus of T7SS substrates and, secondly, that the conserved residues YxxxD/E are essential for T7SS activity. Furthermore, we propose that the specific α-helical surface formed by the conserved sequence pattern, including the YxxxD/E motif, is a key component of T7SS-substrate recognition.


Affiliation: EMBL-Hamburg, Hamburg, Germany.

ABSTRACT
Members of the WXG100 protein superfamily form homo- or heterodimeric complexes. The most studied among them are the secreted T-cell antigens CFP-10 (10 kDa culture filtrate protein, EsxB) and ESAT-6 (6 kDa early secreted antigen target, EsxA) from Mycobacterium tuberculosis. They are encoded on an operon within a gene cluster, named ESX-1, that encodes the Type VII secretion system (T7SS). WXG100 proteins are secreted in full-length form and are known to adopt a four-helix bundle structure. In the current work we discuss the evolutionary relationship between the homo- and heterodimeric WXG100 proteins, the basis of the oligomeric state and the key structural features of the conserved sequence pattern of WXG100 proteins. We performed an iterative bioinformatics analysis of the WXG100 protein superfamily and correlated it with the atomic structures of representative WXG100 proteins. We find, firstly, that the WXG100 protein superfamily consists of three subfamilies: CFP-10-, ESAT-6- and sagEsxA-like proteins (EsxA proteins similar to that of Streptococcus agalactiae); secondly, that the heterodimeric complexes probably evolved from a homodimeric precursor; and thirdly, that the genes of heterodimeric WXG100 proteins are always encoded in bi-cistronic operons. Finally, by combining the sequence alignments with the X-ray data, we identify a conserved C-terminal sequence pattern. The side chains of these conserved residues decorate the same side of the C-terminal α-helix and therefore form a distinct surface. Our results lead to a putatively extended T7SS secretion signal that combines two reported T7SS recognition characteristics: firstly, that the T7SS secretion signal is localized at the C-terminus of T7SS substrates and, secondly, that the conserved residues YxxxD/E are essential for T7SS activity. Furthermore, we propose that the specific α-helical surface formed by the conserved sequence pattern, including the YxxxD/E motif, is a key component of T7SS-substrate recognition.
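
The statement that the YxxxD/E residues lie on the same side of the C-terminal α-helix can be illustrated with a short Python sketch. It is not the authors' code: the function names and the toy sequence in the usage note are hypothetical, and the geometry assumes an ideal α-helix with 3.6 residues per turn (100° of rotation per residue), so real structures will deviate somewhat.

```python
import re

# YxxxD/E: tyrosine, any three residues, then aspartate or glutamate.
# The lookahead lets overlapping occurrences be reported as well.
YXXXDE = re.compile(r"(?=(Y...[DE]))")

# Assumption: ideal alpha-helix, 3.6 residues per turn = 100 degrees per residue.
DEGREES_PER_RESIDUE = 100.0


def find_yxxxde(seq):
    """Return the 0-based start position of every YxxxD/E occurrence in seq."""
    return [m.start() for m in YXXXDE.finditer(seq.upper())]


def helical_offset(i, j):
    """Angular separation (0 to 180 degrees) of residues i and j on an
    ideal helical wheel; small values mean the side chains share a face."""
    angle = ((j - i) * DEGREES_PER_RESIDUE) % 360.0
    return min(angle, 360.0 - angle)
```

For the made-up sequence "AAYQGVDAA", find_yxxxde reports a single match starting at position 2, and helical_offset(2, 6) is 40°, so the Tyr and the Asp four residues later point to the same side of an ideal helix. Residues two positions apart, by contrast, are separated by 160° and sit on nearly opposite faces.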


Figure 1: Steps involved in the ‘context-specific’ bioinformatics study. The chart is organized into consecutive major steps labelled 1 to 7 and contains four columns: the first shows the number of protein sequences before each step (No seq INPUT), the second the description of the step, the third the references for the step, and the last the number of sequences after the step (No seq OUTPUT). For details see ‘Materials and Methods’. To carry out these steps, we wrote a few Python routines for steps 1 through 3 and employed several open-access programs (steps in light grey).


Mentions: Proteins belonging to the WXG100 family share less than 15% sequence identity with each other [6], which makes a meaningful alignment of protein sequences of this superfamily extremely difficult. To achieve a comprehensive sequence analysis and to link conserved residues to the structural data, we performed an iterative bioinformatics analysis. We combined all the available specific features known for this protein family and used a wide range of bioinformatics tools, monitoring the results in a step-wise manner (Fig. 1). It is worth noting that the collection of available prokaryotic genome sequences is, a priori, somewhat biased, owing to specific selection criteria such as bacterial habitats or cultivation properties and because the vast majority of bacteria have not yet been sequenced [17]. The first step in the sequence analysis was to collect a set of non-redundant WXG100 proteins. To this end, an exhaustive search for WXG100 ORFs (open reading frames) was performed on 940 fully sequenced prokaryotic genomes, corresponding to ∼6 million ORFs from all phyla (Fig. 1). This search identified 2424 potential hits, which were reduced to 527 targets when the threshold for the predicted α-helical content was set to 40%. The genetic context of these targets was explored in the next step. When the occurrence of a bi-cistronic operon was taken into account, and tandem genes containing the less stringent motif [W-H-L-F]-X-G were admitted, a further 153 putative WXG100 proteins were identified, giving a total of 680 putative members of the WXG100 superfamily. All 22 known WXG100 proteins from M. tuberculosis (tbWXG100) were among these 680 targets. To ensure that only truly homologous proteins were retained, we used the classification tool CLANS [18]. The CLANS analysis resulted in a major cluster of 183 proteins, including all 22 WXG100 proteins from M. tuberculosis H37Rv (Fig. S2). We examined all the target proteins and could exclude a few following characterization by gene ontology (Figs. 1 and S2). A phylogenetic tree was then estimated with the program MrBayes [19] from the 141 most diverse of the 183 sequences (selected with HHfilter [20]). The resulting tree allowed us to understand the genetic relationship between the different WXG100 homologues (Fig. 2). From this analysis we found that the targets originate almost exclusively from just two phyla, Actinobacteria and Firmicutes, with a limited number of targets originating from the phylum Chloroflexi (for further information see Materials & Methods and Fig. S2).
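
The first two filters described above (a W-X-G motif search followed by a 40% cut-off on predicted α-helical content) might be sketched in Python as follows. This is an illustrative sketch, not the authors' routines: all names are hypothetical, the secondary-structure prediction is assumed to come from an external tool as a per-residue string in which 'H' marks helix, and treating the threshold as "at least 40%" is an assumption. Any further criteria the authors applied are not modelled.

```python
import re

# Strict motif W-X-G and the less stringent [W-H-L-F]-X-G variant that the
# text applies to tandem genes; '.' stands for any residue.
STRICT_MOTIF = re.compile(r"W.G")
RELAXED_MOTIF = re.compile(r"[WHLF].G")

# Threshold for predicted alpha-helical content quoted in the text (40%).
MIN_HELIX_FRACTION = 0.40


def has_motif(seq, relaxed=False):
    """True if the sequence contains the strict (or relaxed) motif."""
    pattern = RELAXED_MOTIF if relaxed else STRICT_MOTIF
    return pattern.search(seq.upper()) is not None


def helix_fraction(ss):
    """Fraction of residues predicted helical in a per-residue
    secondary-structure string (assumed encoding: 'H' = helix)."""
    return ss.count("H") / len(ss) if ss else 0.0


def select_candidates(orfs):
    """orfs: iterable of (name, sequence, ss_string) tuples.
    Returns the names that pass the strict motif and the helix threshold."""
    return [
        name
        for name, seq, ss in orfs
        if has_motif(seq) and helix_fraction(ss) >= MIN_HELIX_FRACTION
    ]
```

With three made-up ORFs, ("a", "MAWAGKK", "CHHHHHC"), ("b", "MAWAGKK", "CCCCCHC") and ("c", "MALAGKK", "HHHHHHH"), only "a" is selected: "b" fails the helix threshold and "c" lacks the strict motif, although has_motif("MALAGKK", relaxed=True) accepts it. That last case is the kind of sequence the genetic-context step can recover when it lies in tandem with a strict hit.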

