Conservation analysis of the CydX protein yields insights into small protein identification and evolution.

Allen RJ, Brenner EP, VanOrsdel CE, Hobson JJ, Hearn DJ, Hemm MR - BMC Genomics (2014)

Bottom Line: Further investigation of cydAB operons identified two additional conserved hypothetical small proteins: CydY encoded in CydAQlong operons that lack cydX, and CydZ encoded in more than 150 CydAQshort operons. These results elucidate the prevalence of CydX throughout the Proteobacteria, provide insight into the selection pressure and sequence requirements for CydX function, and suggest a potential functional interaction between the small protein and the CydA Q-loop, an enigmatic domain of the cytochrome bd oxidase complex. Finally, these results identify other conserved small proteins encoded in cytochrome bd oxidase operons, suggesting that small protein subunits may be a more common component of these enzymes than previously thought.


Affiliation: Department of Biological Sciences, Towson University, Towson, MD 21252, USA. mhemm@towson.edu.

ABSTRACT

Background: The reliable identification of proteins containing 50 or fewer amino acids is difficult due to the limited information content in short sequences. The 37 amino acid CydX protein in Escherichia coli is a member of the cytochrome bd oxidase complex, an enzyme found throughout Eubacteria. To investigate the extent of CydX conservation and prevalence and evaluate different methods of small protein homologue identification, we surveyed 1095 Eubacteria species for the presence of the small protein.

Results: Over 300 homologues were identified, including 80 unannotated genes. The ability of both closely related and divergent homologues to complement the E. coli ΔcydX mutant supports our identification techniques, and suggests that CydX homologues retain similar function among divergent species. However, sequence analysis of these proteins shows a great degree of variability, with only a few highly conserved residues. An analysis of the co-variation between CydX homologues and their corresponding cydA and cydB genes shows a close synteny of the small protein with the CydA long Q-loop. Phylogenetic analysis suggests that the cydABX operon has undergone horizontal gene transfer, although the cydX gene likely evolved in a progenitor of the Alpha, Beta, and Gammaproteobacteria. Further investigation of cydAB operons identified two additional conserved hypothetical small proteins: CydY encoded in CydAQlong operons that lack cydX, and CydZ encoded in more than 150 CydAQshort operons.

Conclusions: This study provides a systematic analysis of bioinformatics techniques required for the unique challenges present in small protein identification and phylogenetic analyses. These results elucidate the prevalence of CydX throughout the Proteobacteria, provide insight into the selection pressure and sequence requirements for CydX function, and suggest a potential functional interaction between the small protein and the CydA Q-loop, an enigmatic domain of the cytochrome bd oxidase complex. Finally, these results identify other conserved small proteins encoded in cytochrome bd oxidase operons, suggesting that small protein subunits may be a more common component of these enzymes than previously thought.


Figure 9: Synteny between the cydX gene and the long Q-loop allele of cydA. (A) Alignment of the Q-loop region from select CydA homologues. Sequences are shaded in a gradient going from longest Q-loop (darkest) to shortest Q-loop (lightest). (B) Histogram showing the number of CydA homologues containing Q-loops of increasing size (black bars) and the number of CydA proteins encoded in an operon also containing cydX (grey bars). (C) Diagram of the CydA protein containing the Q-loop. Residues shown in black are those that are present only in long Q-loop CydA variants. (D) Diagram showing mutual information shared between residues in the CydX protein, shown in its predicted orientation in the inner membrane, and the Q-loop, shown as the residues spanning transmembrane regions 6 (TM6) and 7 (TM7) of CydA. Lines connect pairs of residues that share high mutual information. The conserved and variable regions of the Q-loop are labeled. Spaces between residues in the Q-loop region represent residues that are omitted because they either show no mutual information or share mutual information with other Q-loop residues and not with CydX. A mutual information filter cutoff of 10 was used for this figure. Species are as follows: Francisella philomiragia subsp. philomiragia ATCC 25017 (“Francisella”), Janthinobacterium sp. Marseille (“Janthinobacterium”), Burkholderia xenovorans LB400 (“Burkholderia”), Escherichia coli 536 (“Escherichia”), Brachybacterium faecium DSM 4810 (“Brachybacterium”), Mycobacterium marinum M (“Mycobacterium”), and Bacillus subtilis subsp. spizizenii str. W23 (“Bacillus”). Mutual information was determined using the program MISTIC. Alignments were generated using the program MUSCLE [57]. ‘*’ indicates that the residues are identical in all sequences, and ‘:’ and ‘.’ indicate conserved and semi-conserved substitutions, respectively, as defined by MUSCLE.
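
As a rough illustration of the quantity plotted in panel D, the sketch below computes plain Shannon mutual information between two alignment columns. It is not a reproduction of MISTIC, whose scoring involves additional corrections, so the cutoff of 10 quoted in the caption should not be compared with these raw values. The function name and the toy columns are hypothetical and are not taken from the study.

```python
import math
from collections import Counter


def mutual_information(col_a, col_b):
    """Shannon mutual information (in bits) between two alignment columns.

    col_a and col_b hold one residue per sequence, in the same sequence
    order. Rows with a gap ('-') in either column are skipped; this is a
    simplifying assumption of the sketch.
    """
    if len(col_a) != len(col_b):
        raise ValueError("columns must cover the same sequences")
    pairs = [(a, b) for a, b in zip(col_a, col_b) if a != "-" and b != "-"]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    count_a = Counter(a for a, _ in pairs)
    count_b = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_a[a] * count_b[b]))
    return mi


# Hypothetical toy columns: one residue position in a small protein and one
# in a partner loop, across four made-up sequences.
covarying = mutual_information("AAGG", "KKEE")    # residues change together
independent = mutual_information("AAGG", "KEKE")  # no association
```

In this toy case the co-varying pair carries one bit of information and the independent pair carries none, which is the kind of contrast the lines in panel D are meant to convey.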

Mentions: As an initial investigation into the potential for co-variation between CydX and the other two Cyd proteins, a phylogeny of each of the larger proteins was constructed and then compared to the distribution of CydX. Although this analysis showed some correlation between CydB protein sequence and the presence of CydX, a very strong correlation was observed between CydX and CydA. When the presence of CydX is overlaid on a phylogeny created using the CydA sequence, there is a tight grouping of all CydA proteins encoded in operons that also encode CydX (Figure 8). This result strongly suggests that CydA proteins encoded in cydABX operons are different at the amino acid level from those encoded within operons lacking the cydX gene. Analysis of alignments of CydA proteins from operons with and without cydX identified a consistent sequence difference in the Q-loop, a loop between transmembrane regions 6 and 7 of CydA (Figure 9A, C) [9]. A plot of the length of the Q-loop region of CydA homologues versus the presence of cydX in the operon shows a separation of Q-loops into two major groups: shorter loops of primarily 81–100 amino acids, and longer loops ranging from 149 to 220 amino acids (Figure 9B). The length of the Q-loop shows a significant correlation with the presence of cydX in the cydAB operon: 89% of CydAQlong alleles reside in operons that contain cydX, and 99% of cydX homologues are encoded in an operon containing a CydAQlong allele. This close association between CydAQlong and CydX suggests that these regions may be functionally related.
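
The two percentages quoted above are different conditional fractions: one is the share of long Q-loop operons that contain cydX, the other is the share of cydX operons that carry a long Q-loop. A minimal sketch of how such a tabulation might be done is given below. The `Operon` record, the 149-residue threshold (taken from the lower bound of the long range in the text), and all toy entries are assumptions for illustration and are not data from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operon:
    """One cydAB operon; every value used below is illustrative only."""
    name: str
    q_loop_length: int  # residues between TM6 and TM7 of CydA
    has_cydx: bool      # whether a cydX homologue is encoded in the operon


# Assumed cutoff: the lower bound of the long Q-loop range quoted in the text.
LONG_Q_LOOP_MIN = 149


def conditional_fractions(operons):
    """Return (P(cydX | long Q-loop), P(long Q-loop | cydX))."""
    long_ops = [op for op in operons if op.q_loop_length >= LONG_Q_LOOP_MIN]
    cydx_ops = [op for op in operons if op.has_cydx]
    both = [op for op in long_ops if op.has_cydx]
    p_cydx_given_long = len(both) / len(long_ops) if long_ops else float("nan")
    p_long_given_cydx = len(both) / len(cydx_ops) if cydx_ops else float("nan")
    return p_cydx_given_long, p_long_given_cydx


# Hypothetical toy operons, not drawn from the surveyed genomes.
toy = [
    Operon("op1", 160, True),
    Operon("op2", 175, True),
    Operon("op3", 210, True),
    Operon("op4", 150, False),  # long Q-loop but no cydX
    Operon("op5", 90, False),
    Operon("op6", 85, False),
]

p_cydx_given_long, p_long_given_cydx = conditional_fractions(toy)
```

In the toy set, three of the four long Q-loop operons contain cydX while every cydX operon has a long Q-loop, so the two fractions differ in the same direction as the 89% and 99% figures reported in the text.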

