Conservation analysis of the CydX protein yields insights into small protein identification and evolution.

Allen RJ, Brenner EP, VanOrsdel CE, Hobson JJ, Hearn DJ, Hemm MR - BMC Genomics (2014)

Bottom Line: Further investigation of cydAB operons identified two additional conserved hypothetical small proteins: CydY encoded in CydAQlong operons that lack cydX, and CydZ encoded in more than 150 CydAQshort operons. These results elucidate the prevalence of CydX throughout the Proteobacteria, provide insight into the selection pressure and sequence requirements for CydX function, and suggest a potential functional interaction between the small protein and the CydA Q-loop, an enigmatic domain of the cytochrome bd oxidase complex. Finally, these results identify other conserved small proteins encoded in cytochrome bd oxidase operons, suggesting that small protein subunits may be a more common component of these enzymes than previously thought.

Affiliation: Department of Biological Sciences, Towson University, Towson, MD 21252, USA. mhemm@towson.edu.

ABSTRACT

Background: The reliable identification of proteins containing 50 or fewer amino acids is difficult due to the limited information content in short sequences. The 37 amino acid CydX protein in Escherichia coli is a member of the cytochrome bd oxidase complex, an enzyme found throughout Eubacteria. To investigate the extent of CydX conservation and prevalence and evaluate different methods of small protein homologue identification, we surveyed 1095 Eubacteria species for the presence of the small protein.

Results: Over 300 homologues were identified, including 80 unannotated genes. The ability of both closely-related and divergent homologues to complement the E. coli ΔcydX mutant supports our identification techniques, and suggests that CydX homologues retain similar function among divergent species. However, sequence analysis of these proteins shows a great degree of variability, with only a few highly-conserved residues. An analysis of the co-variation between CydX homologues and their corresponding cydA and cydB genes shows a close synteny of the small protein with the CydA long Q-loop. Phylogenetic analysis suggests that the cydABX operon has undergone horizontal gene transfer, although the cydX gene likely evolved in a progenitor of the Alpha, Beta, and Gammaproteobacteria. Further investigation of cydAB operons identified two additional conserved hypothetical small proteins: CydY encoded in CydAQlong operons that lack cydX, and CydZ encoded in more than 150 CydAQshort operons.

Conclusions: This study provides a systematic analysis of bioinformatics techniques required for the unique challenges present in small protein identification and phylogenetic analyses. These results elucidate the prevalence of CydX throughout the Proteobacteria, provide insight into the selection pressure and sequence requirements for CydX function, and suggest a potential functional interaction between the small protein and the CydA Q-loop, an enigmatic domain of the cytochrome bd oxidase complex. Finally, these results identify other conserved small proteins encoded in cytochrome bd oxidase operons, suggesting that small protein subunits may be a more common component of these enzymes than previously thought.


Evaluating methods for accurately identifying CydX homologues in 1121 species of bacteria. (A) Venn diagram of the number of CydX homologues identified by an HMM-based method (“HMM”), a tblastn screen of the NCBI microbial database using the CydX protein sequence as the query and an expect value of 1000 (“tblastn”), or by manual curation (“Missed”). (B) Receiver operating characteristic (ROC) plot of a tblastn screen of the microbial database using the CydX protein sequence as the query with different E-value cutoffs. (C) Graph of the number of CydX homologues identified in a tblastn screen of the microbial database using the CydX protein sequence as the query with different expect values. All tblastn searches were conducted using the NCBI BLAST Microbial Genomes site [45].
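
The ROC plot in panel (B) can be illustrated with a minimal sketch. It assumes each tblastn hit has already been labelled as a true homologue or not (for example by manual curation); how the authors defined negatives is not stated here, and the function name `roc_points` is hypothetical.

```python
from typing import Iterable, List, Tuple


def roc_points(
    hits: Iterable[Tuple[float, bool]],
    cutoffs: Iterable[float],
) -> List[Tuple[float, float, float]]:
    """Return (cutoff, false-positive rate, true-positive rate) per cutoff.

    hits: (E-value, is_true_homologue) pairs; the labels are assumed to
    come from an independent curation step.
    """
    hits = list(hits)
    positives = sum(1 for _, is_true in hits if is_true)
    negatives = len(hits) - positives

    points = []
    for cutoff in sorted(cutoffs):
        # A hit is "called" at this cutoff if its E-value does not exceed it.
        tp = sum(1 for evalue, is_true in hits if is_true and evalue <= cutoff)
        fp = sum(1 for evalue, is_true in hits if not is_true and evalue <= cutoff)
        tpr = tp / positives if positives else 0.0
        fpr = fp / negatives if negatives else 0.0
        points.append((cutoff, fpr, tpr))
    return points
```

Raising the cutoff can only add called hits, so both rates are non-decreasing along the curve; a permissive cutoff such as 1000 recovers more divergent homologues at the cost of more false positives to curate.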

Mentions: To investigate the extent of CydX conservation, complete genomes from 1095 taxa that span the major Eubacterial divisions were screened for potential homologues of the E. coli CydX protein. Two different bioinformatics techniques were used to identify homologues. The first technique was a series of searches for CydX homologues in each genome using the protein-nucleotide six-frame translation Basic Local Alignment Search Tool (tblastn) for microbial genomes, with the E. coli CydX protein sequence used as the query for each search. This method could identify both annotated genes and unannotated ORFs that encode homologues. To maximize the probability of identifying divergent homologues, these tblastn searches were conducted at very low stringency (E-value cutoff = 1000) with the low information filter turned off. On average, each search returned 200–400 hits, and manual analysis of the tblastn results yielded 1–10 likely candidates. For each candidate, the potential open reading frame was translated and screened for a significant Pfam hit for the ybgT_yccB small protein family. In a few cases, a search identified a potential homologue that did not give a significant Pfam hit but showed substantial sequence similarity to CydX. In these instances, the distance of the ORF from the cydAB operon, the presence of an identifiable ribosome binding site, and the alignment of the small protein with the E. coli CydX sequence were used to determine whether the ORF should be considered a homologue. In total, this method yielded 294 homologues (Figure 2A and Additional file 1).
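
The low-stringency tblastn step can be sketched as follows. The authors ran their searches on the NCBI BLAST Microbial Genomes web site, so this is only an assumed local equivalent using the BLAST+ `tblastn` program: it takes the "low information filter" to correspond to the SEG low-complexity filter (`-seg no`), and the file names, the helper names, and the `max_distance` threshold are hypothetical. The paper's curation was manual and states no numeric distance cutoff.

```python
import csv
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    subject: str   # subject sequence (e.g. contig or replicon) identifier
    start: int     # subject start coordinate
    end: int       # subject end coordinate (less than start on the reverse strand)
    evalue: float


def tblastn_command(query_fasta: str, database: str, evalue: float = 1000.0) -> List[str]:
    """Build a BLAST+ tblastn call with a permissive E-value and SEG disabled."""
    return [
        "tblastn",
        "-query", query_fasta,
        "-db", database,
        "-evalue", f"{evalue:g}",
        "-seg", "no",      # assumption: matches "low information filter turned off"
        "-outfmt", "6",    # default 12-column tabular output
    ]


def parse_tabular(lines: Iterable[str]) -> List[Hit]:
    """Parse default outfmt 6 rows (sseqid, sstart, send, evalue columns)."""
    hits = []
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        hits.append(Hit(row[1], int(row[8]), int(row[9]), float(row[10])))
    return hits


def near_operon(hit: Hit, operon_start: int, operon_end: int, max_distance: int) -> bool:
    """True if the hit lies within max_distance bases of the operon.

    Assumes the hit and the operon coordinates are on the same sequence;
    max_distance is an illustrative threshold, not a value from the paper.
    """
    low, high = min(hit.start, hit.end), max(hit.start, hit.end)
    gap = max(0, operon_start - high, low - operon_end)
    return gap <= max_distance
```

With BLAST+ installed, the command list could be passed to `subprocess.run(..., capture_output=True, text=True, check=True)` and the resulting `stdout.splitlines()` fed to `parse_tabular`. Candidates that survive would still need the Pfam screen and the manual checks described above.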
