WXG100 protein superfamily consists of three subfamilies and exhibits an α-helical C-terminal conserved residue pattern.

Poulsen C, Panjikar S, Holton SJ, Wilmanns M, Song YH - PLoS ONE (2014)

Bottom Line: The side chains of these conserved residues decorate the same side of the C-terminal α-helix and therefore form a distinct surface. Our results lead to a putatively extended T7SS secretion signal which combines two reported T7SS recognition characteristics: firstly, that the T7SS secretion signal is localized at the C-terminus of T7SS substrates and, secondly, that the conserved residues YxxxD/E are essential for T7SS activity. Furthermore, we propose that the specific α-helical surface formed by the conserved sequence pattern including the YxxxD/E motif is a key component of T7SS-substrate recognition.


Affiliation: EMBL-Hamburg, Hamburg, Germany.

ABSTRACT
Members of the WXG100 protein superfamily form homo- or heterodimeric complexes. The most studied proteins among them are the secreted T-cell antigens CFP-10 (10 kDa culture filtrate protein, EsxB) and ESAT-6 (6 kDa early secreted antigen target, EsxA) from Mycobacterium tuberculosis. They are encoded in an operon within a gene cluster, named ESX-1, that encodes the Type VII secretion system (T7SS). WXG100 proteins are secreted in full-length form and are known to adopt a four-helix bundle structure. In the current work we discuss the evolutionary relationship between the homo- and heterodimeric WXG100 proteins, the basis of the oligomeric state and the key structural features of the conserved sequence pattern of WXG100 proteins. We performed an iterative bioinformatics analysis of the WXG100 protein superfamily and correlated this with the atomic structures of representative WXG100 proteins. We find, firstly, that the WXG100 protein superfamily consists of three subfamilies: CFP-10-, ESAT-6- and sagEsxA-like proteins (EsxA proteins similar to that of Streptococcus agalactiae); secondly, that the heterodimeric complexes probably evolved from a homodimeric precursor; and thirdly, that the genes of heterodimeric WXG100 proteins are always encoded in bi-cistronic operons. Finally, by combining the sequence alignments with the X-ray data, we identify a conserved C-terminal sequence pattern. The side chains of these conserved residues decorate the same side of the C-terminal α-helix and therefore form a distinct surface. Our results lead to a putatively extended T7SS secretion signal which combines two reported T7SS recognition characteristics: firstly, that the T7SS secretion signal is localized at the C-terminus of T7SS substrates and, secondly, that the conserved residues YxxxD/E are essential for T7SS activity. Furthermore, we propose that the specific α-helical surface formed by the conserved sequence pattern including the YxxxD/E motif is a key component of T7SS-substrate recognition.
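The claim that the YxxxD/E residues lie on the same side of the helix can be illustrated with a simple helical-wheel calculation. The sketch below is not the authors' analysis: it assumes an ideal α-helix with 3.6 residues per turn (100° per residue), and the function names and the toy sequence used in the usage note are hypothetical.

```python
import re

# Assumption: ideal alpha-helix geometry, 3.6 residues per turn = 100 degrees per residue.
DEGREES_PER_RESIDUE = 100.0


def find_yxxxde(seq):
    """Return 0-based start positions of every Y-x-x-x-[D/E] match (overlaps allowed)."""
    # The lookahead makes overlapping matches visible to finditer.
    return [m.start() for m in re.finditer(r"(?=Y...[DE])", seq)]


def angular_separation(i, j):
    """Smallest angle in degrees between residues i and j on an ideal helical wheel."""
    diff = (abs(j - i) * DEGREES_PER_RESIDUE) % 360.0
    return min(diff, 360.0 - diff)
```

Under this idealized geometry, the Tyr at position i and the Asp/Glu at position i+4 are separated by only 40° around the helix axis, whereas residues at i and i+2 are 160° apart. For a hypothetical toy sequence such as `"AAYQKLDAA"`, `find_yxxxde` returns `[2]`, and `angular_separation(2, 6)` gives `40.0`.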


Figure 2 (pone-0089313-g002): Estimated phylogenetic tree of the WXG100 protein family consisting of three WXG100 subfamilies. The tree of WXG100 proteins was constructed in midpoint-rooted presentation with three main clades: CFP-10-like (blue circular arc), ESAT-6-like (cyan circular arc) and sagEsxA-like proteins (orange circular arc). The WXG100 gene pairs of M. tuberculosis occurring within the RD1-like gene clusters, denoted as the regions (Esx) 1 to 5, are coloured accordingly along with the Rv-annotations (see subtitles). The annotations of the genes in close proximity to each of the WXG100 genes were manually analyzed and this information was also included in the tree. Two WXG100 genes with an intergenic distance of less than 80 nucleotides (according to the definition of Roback et al. [47]) are considered to be encoded within a bi-cistronic operon (filled black squares on circle layer 3), whilst mono-cistronic WXG100 genes are indicated by unfilled squares. Those WXG100 proteins whose oligomeric properties have been experimentally determined are marked with a triangle for homodimers and with pairs of blue dots for heterodimers. The second inner arcs show the phyla of the bacteria.
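The operon criterion used in the caption (intergenic distance below 80 nucleotides) can be sketched as follows. This is an illustrative reading, not the authors' code: the `Gene` record, the 1-based inclusive coordinates, the same-strand requirement and the treatment of overlapping genes as distance 0 are all assumptions made for the example.

```python
from dataclasses import dataclass

MAX_INTERGENIC_NT = 80  # threshold quoted in the caption (Roback et al. definition)


@dataclass
class Gene:
    """Hypothetical gene record; coordinates are 1-based and inclusive."""
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"


def intergenic_distance(a, b):
    """Number of nucleotides between two genes; overlapping genes count as 0."""
    first, second = sorted((a, b), key=lambda g: g.start)
    return max(0, second.start - first.end - 1)


def is_bicistronic_pair(a, b):
    """Assumed rule: same strand and fewer than 80 nt between the two genes."""
    return a.strand == b.strand and intergenic_distance(a, b) < MAX_INTERGENIC_NT
```

With made-up coordinates, `Gene("geneA", 100, 400, "+")` and `Gene("geneB", 430, 720, "+")` are 29 nt apart and would be flagged as a bi-cistronic pair, while a neighbour starting at position 600 would be 199 nt away and would not.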

Mentions: Proteins belonging to the WXG100 family share less than 15% sequence identity with each other [6], which makes it extremely difficult to perform a meaningful alignment of protein sequences of this superfamily. To achieve a comprehensive sequence analysis and to link conserved residues to the structural data, we performed an iterative bioinformatics analysis. We combined all the available specific features known for this protein family and used a wide range of bioinformatics tools, with the results monitored in a step-wise manner (Fig. 1). It is worth noting that the collection of available prokaryotic genome sequences is, a priori, somewhat biased, owing to specific selection criteria such as bacterial habitats or cultivation properties and because the vast majority of bacteria have not yet been sequenced [17]. The first step in the sequence analysis was to collect a set of non-redundant WXG100 proteins. To this end, an exhaustive search for WXG100 ORFs (open reading frames) was performed using 940 fully sequenced prokaryotic genomes, corresponding to ∼6 million ORFs from all phyla (Fig. 1). From this search we identified 2424 potential hits, which were reduced to 527 targets when the threshold for the predicted α-helical content was set to 40%. The genetic context of these targets was explored in the next step. When the occurrence of a bi-cistronic operon was taken into account, and for those tandem genes containing the less stringent motif [W-H-L-F]-X-G, a further 153 putative WXG100 proteins were identified, giving a total of 680 putative members of the WXG100 superfamily. All 22 known WXG100 proteins from M. tuberculosis (tbWXG100) were among these 680 targets. To ensure that only truly homologous proteins were identified, we used the classification tool CLANS [18]. The CLANS analysis resulted in a major cluster containing a total of 183 proteins, including all 22 WXG100 proteins from M. tuberculosis H37Rv (Fig. S2). We examined all the target proteins and could exclude a few following characterization by gene ontology (Figs. 1 and S2). An estimated phylogenetic tree was then calculated using the program MrBayes [19], including the 141 most diverse sequences out of the 183 proteins (HHfilter was used for selection [20]). The resulting tree allowed us to understand the genetic relationship between the different WXG100 homologues (Fig. 2). As a result of this analysis we found that the targets originate almost exclusively from just two phyla, Actinobacteria and Firmicutes, with a limited number of targets originating from the phylum Chloroflexi (for further information see Materials & Methods and Fig. S2).
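The first filtering steps described above (motif search, α-helical content threshold, relaxed motif for operon partners) might be sketched as below. This is a minimal illustration under stated assumptions, not the pipeline the authors ran: the secondary-structure prediction is taken as a ready-made per-residue string with `H` marking helix, the threshold is read as "at least 40%", the helical filter is assumed to apply to relaxed-motif candidates too, and all function names are hypothetical.

```python
import re

STRICT_MOTIF = re.compile(r"W.G")         # W-X-G
RELAXED_MOTIF = re.compile(r"[WHLF].G")   # [W-H-L-F]-X-G, used for operon partners
MIN_HELIX_FRACTION = 0.40                 # threshold quoted in the text


def helix_fraction(ss):
    """Fraction of residues predicted as helix ('H') in a secondary-structure string."""
    return ss.count("H") / len(ss) if ss else 0.0


def is_candidate(seq, ss, in_bicistronic_operon=False):
    """Assumed rule: enough predicted helix, plus the strict motif,
    or the relaxed motif when the gene sits in a bi-cistronic operon."""
    if helix_fraction(ss) < MIN_HELIX_FRACTION:
        return False
    if STRICT_MOTIF.search(seq):
        return True
    return in_bicistronic_operon and RELAXED_MOTIF.search(seq) is not None


# Counts reported in the text: 527 targets after the helical filter,
# plus 153 found through the operon context, give 680 putative members.
assert 527 + 153 == 680
```

For a toy sequence such as `"MSQWNGAAL"` with the prediction `"HHHHHHHCC"`, `is_candidate` returns `True`; the variant `"MSQLNGAAL"` only passes when `in_bicistronic_operon=True`, which mirrors how the relaxed motif was reserved for tandem genes.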


