Conservation analysis of the CydX protein yields insights into small protein identification and evolution.

Allen RJ, Brenner EP, VanOrsdel CE, Hobson JJ, Hearn DJ, Hemm MR - BMC Genomics (2014)

Bottom Line: Further investigation of cydAB operons identified two additional conserved hypothetical small proteins: CydY encoded in CydAQlong operons that lack cydX, and CydZ encoded in more than 150 CydAQshort operons. These results elucidate the prevalence of CydX throughout the Proteobacteria, provide insight into the selection pressure and sequence requirements for CydX function, and suggest a potential functional interaction between the small protein and the CydA Q-loop, an enigmatic domain of the cytochrome bd oxidase complex. Finally, these results identify other conserved small proteins encoded in cytochrome bd oxidase operons, suggesting that small protein subunits may be a more common component of these enzymes than previously thought.


Affiliation: Department of Biological Sciences, Towson University, Towson, MD 21252, USA. mhemm@towson.edu.

ABSTRACT

Background: The reliable identification of proteins containing 50 or fewer amino acids is difficult due to the limited information content in short sequences. The 37 amino acid CydX protein in Escherichia coli is a member of the cytochrome bd oxidase complex, an enzyme found throughout Eubacteria. To investigate the extent of CydX conservation and prevalence and evaluate different methods of small protein homologue identification, we surveyed 1095 Eubacteria species for the presence of the small protein.

Results: Over 300 homologues were identified, including 80 unannotated genes. The ability of both closely-related and divergent homologues to complement the E. coli ΔcydX mutant supports our identification techniques, and suggests that CydX homologues retain similar function among divergent species. However, sequence analysis of these proteins shows a great degree of variability, with only a few highly-conserved residues. An analysis of the co-variation between CydX homologues and their corresponding cydA and cydB genes shows a close synteny of the small protein with the CydA long Q-loop. Phylogenetic analysis suggests that the cydABX operon has undergone horizontal gene transfer, although the cydX gene likely evolved in a progenitor of the Alpha, Beta, and Gammaproteobacteria. Further investigation of cydAB operons identified two additional conserved hypothetical small proteins: CydY encoded in CydAQlong operons that lack cydX, and CydZ encoded in more than 150 CydAQshort operons.

Conclusions: This study provides a systematic analysis of bioinformatics techniques required for the unique challenges present in small protein identification and phylogenetic analyses. These results elucidate the prevalence of CydX throughout the Proteobacteria, provide insight into the selection pressure and sequence requirements for CydX function, and suggest a potential functional interaction between the small protein and the CydA Q-loop, an enigmatic domain of the cytochrome bd oxidase complex. Finally, these results identify other conserved small proteins encoded in cytochrome bd oxidase operons, suggesting that small protein subunits may be a more common component of these enzymes than previously thought.


Figure 4: Sequence analysis of the CydX protein family. (A) Consensus sequence of CydX homologues compared to the presence of predicted transmembrane domains (red bars) and the number of homologues that contain amino acids at each position (grey bars). The sequence logo was created using a MUSCLE alignment [57] analyzed by the WebLogo program [57]. Amino acids are colored based on their properties at physiological conditions as follows: black residues are hydrophobic, green residues are hydrophilic, blue residues are positively charged and red residues are negatively charged. Transmembrane domains were predicted using the program TMHMM [56]. (B) Predicted evolutionary importance of each residue in CydX. Analysis was performed using the Lichtarge Computational Biology Lab’s Universal Evolutionary Trace web server [57]. (C) Predicted selection pressure on each amino acid in the CydX protein. Analysis was performed using the Selecton program. (D) Residues within the CydX protein that share mutual information. Analysis was performed using the MISTIC program. Residues are colored based on conservation, with red positions in the alignment being conserved and blue positions showing less conservation. (E) Alpha-helical wheel projection of the predicted transmembrane domain of the E. coli CydX protein [28]. The conserved residues Y3, W6 and G9 are outlined in black. The shapes of the amino acids are based on their properties at physiological conditions as follows: hydrophobic residues are diamonds and hydrophilic residues are circles. The degree of hydrophobicity of diamond residues is also reflected in the color, with green being most hydrophobic and yellow being least hydrophobic, and a range of colors in between depending on predicted hydrophobicity. Likewise, the degree of hydrophilicity of circle residues is reflected in the color, with red being most hydrophilic and light orange being least, and a range of colors in between depending on predicted hydrophilicity.
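The sequence logo in panel A summarizes how conserved each alignment column is. As a rough illustration of the underlying quantity, the sketch below computes the Shannon information content (in bits) of each column of a toy alignment. The sequences are invented for the example and are not taken from the study, the function name column_information is hypothetical, and the calculation is an assumption about the general technique: it ignores gaps and omits the small-sample correction that logo tools typically apply.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=20):
    """Information content (bits) of one alignment column, ignoring gaps.

    Computed as log2(alphabet_size) minus the Shannon entropy of the
    observed residue frequencies. No small-sample correction is applied.
    """
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return math.log2(alphabet_size) - entropy


# Toy alignment: invented sequences, NOT CydX homologues from the study.
toy_alignment = [
    "MWYFA",
    "MWYLA",
    "MWHFS",
    "MW-IT",
]

# Transpose rows (sequences) into columns (alignment positions).
columns = list(zip(*toy_alignment))
info = [column_information(col) for col in columns]

for position, bits in enumerate(info, start=1):
    print(f"column {position}: {bits:.2f} bits")
```

Fully conserved columns reach the maximum of log2(20) bits, while variable columns score lower, which is what the letter heights in a logo reflect.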

The large number of CydX homologues identified in this study presented a unique opportunity to investigate amino acid sequence conservation in a widely conserved small protein. A multiple sequence alignment (MSA) of 299 CydX homologues was used to create an amino acid sequence logo representing the relative conservation of residues in the protein (Figure 4A). The sequence logo shows that although much of the CydX protein is highly variable, there is a core region of higher homology containing a conserved tryptophan located at the N-terminus of a conserved hydrophobic α-helix (Figure 4A). Based on previous experimental results, the hydrophobic α-helix is thought to span the inner membrane [16, 26, 27], with the N-terminus and C-terminus of the protein on the cytosolic and periplasmic sides of the membrane, respectively [16]. For clarity, the E. coli sequence numbering will be used to demarcate the amino acids discussed here, with the N-terminal tryptophan being the sixth residue in the E. coli CydX protein. This conserved region contains other highly conserved residues including Y3 (all but one homologue), G9 (all but seven homologues) and E/D25 (either glutamate or aspartate in all but one homologue). Together, these residues make up the amino acid motif YxxWxxGx₁₅E/D, which is found in 97% of the homologues identified in this study. Of these four amino acids, Y3, W6 and G9 are predicted to be contained in the transmembrane α-helix (Figure 4A). An α-helical projection of the CydX hydrophobic region [28] shows that these residues may be localized to the same side of the helix (Figure 4E), suggesting that this may be the side of the α-helix that interacts with the other proteins in the CydABX complex. Outside of the core region, conservation significantly decreases, caused in part by the high variability in length of the C-terminal end of the small proteins (Figure 4A).
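The motif and the helical-wheel argument above can be illustrated with a short sketch. It makes two assumptions that are not part of the study's methods: the motif is expressed as a regular expression (Y, two residues, W, two residues, G, fifteen residues, then E or D), and the helix is treated as ideal, with 3.6 residues per turn, so each residue is rotated 100 degrees from the previous one. The test sequence is artificial and the names find_motif and wheel_angle are hypothetical.

```python
import re

# Y-x-x-W-x-x-G, a 15-residue spacer, then glutamate or aspartate.
CYDX_MOTIF = re.compile(r"Y..W..G.{15}[ED]")


def find_motif(sequence):
    """Return the 1-based position of the motif's tyrosine, or None."""
    match = CYDX_MOTIF.search(sequence)
    return None if match is None else match.start() + 1


def wheel_angle(position, degrees_per_residue=100.0):
    """Angular position on a helical wheel for a 1-based residue number.

    Assumes an ideal alpha-helix (3.6 residues per turn, 100 degrees each).
    """
    return ((position - 1) * degrees_per_residue) % 360.0


# Artificial sequence built so that Y, W, G and D fall at positions 3, 6, 9, 25.
toy_sequence = "MAYLLWAAG" + "L" * 15 + "DKK"
# Same sequence with a 14-residue spacer, which should not match.
short_spacer = "MAYLLWAAG" + "L" * 14 + "DKK"

print(find_motif(toy_sequence))   # 3
print(find_motif(short_spacer))   # None

# Angles of the three conserved transmembrane residues Y3, W6 and G9.
angles = [wheel_angle(p) for p in (3, 6, 9)]
arc = max(angles) - min(angles)
print(angles, arc)                # [200.0, 140.0, 80.0] 120.0
```

Under the ideal-helix assumption the three residues fall within a 120-degree arc, about one third of the wheel, which is consistent with the observation that they may lie on the same face of the helix.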

