The DNA-repair protein AlkB, EGL-9, and leprecan define new families of 2-oxoglutarate- and iron-dependent dioxygenases.

Aravind L, Koonin EV - Genome Biol. (2001)

Bottom Line: Here we describe such predictions resulting from an analysis of the 2-oxoglutarate (2OG) and Fe(II)-dependent oxygenases, a class of enzymes that are widespread in eukaryotes and bacteria and catalyze a variety of reactions typically involving the oxidation of an organic substrate using a dioxygen molecule. The EGL-9 protein from Caenorhabditis elegans is necessary for normal muscle function and its inactivation results in resistance against paralysis induced by the Pseudomonas aeruginosa toxin. This allows us to predict the catalytic activity for a wide range of biologically important, but biochemically uncharacterized proteins from eukaryotes and bacteria.


Affiliation: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA. aravind@ncbi.nlm.nih.gov

ABSTRACT

Background: Protein fold recognition using sequence profile searches frequently allows prediction of the structure and biochemical mechanisms of proteins with an important biological function but unknown biochemical activity. Here we describe such predictions resulting from an analysis of the 2-oxoglutarate (2OG) and Fe(II)-dependent oxygenases, a class of enzymes that are widespread in eukaryotes and bacteria and catalyze a variety of reactions typically involving the oxidation of an organic substrate using a dioxygen molecule.

Results: We employ sequence profile analysis to show that the DNA repair protein AlkB, the extracellular matrix protein leprecan, the disease-resistance-related protein EGL-9 and several uncharacterized proteins define novel families of enzymes of the 2OG-Fe(II) oxygenase superfamily. The identification of AlkB as a member of the 2OG-Fe(II) oxygenase superfamily suggests that this protein catalyzes oxidative detoxification of alkylated bases. More distant homologs of AlkB were detected in eukaryotes and in plant RNA viruses, leading to the hypothesis that these proteins might be involved in RNA demethylation. The EGL-9 protein from Caenorhabditis elegans is necessary for normal muscle function and its inactivation results in resistance against paralysis induced by the Pseudomonas aeruginosa toxin. EGL-9 and leprecan are predicted to be novel protein hydroxylases that might be involved in the generation of substrates for protein glycosylation.

Conclusions: Here, using sequence profile searches, we show that several previously undetected protein families contain the 2OG-Fe(II) oxygenase fold. This allows us to predict the catalytic activity for a wide range of biologically important, but biochemically uncharacterized proteins from eukaryotes and bacteria.


Figure 1: Multiple sequence alignment of the 2OG-Fe(II) dioxygenase superfamily. Individual protein families are separated by blank lines and a brief description of each family is given to the right of the alignment. The numbers at the ends of the alignment indicate the position of the first and last of the aligned residues in the respective protein sequences. The consensus secondary structure is shown above the alignment in uppercase letters. It was derived by taking those elements that are shared by the predicted structures of individual families and the experimentally determined structures; H indicates α helix and E indicates extended conformation (β strand). The lowercase letters represent extensions of the secondary structure elements that are seen in some, but not all, members of the superfamily. The conserved amino-terminal extensions that are specific only to a given family are separated from the rest of the alignment by vertical lines. The coloring of the alignment columns is according to the 85% consensus that is shown underneath the alignment and includes the following categories of amino acid residues: h, hydrophobic; l, aliphatic; a, aromatic (Y, F, W, H, L, I, V, M, A, all shaded yellow); s, small (S, A, G, T, V, P, N, H, D, shaded blue); b, big (K, R, E, Q, W, F, Y, L, M, I, shaded gray); +, positively charged (K, R, H; colored magenta). The (predicted) catalytic residues are indicated by asterisks and with reverse red shading. The proteins are designated by the protein/gene name, the species abbreviation and the gene identification (GI) number. Protein abbreviations are: CAS, clavaminic acid synthase; DAOCS, deacetoxycephalosporin C synthetase; EFE, ethylene-forming enzyme; FLAS, flavonol synthase; Ga20Ox, gibberellin 20-oxidase; IPNS, isopenicillin N synthase; LDOX, leucoanthocyanidin hydroxylase; Lep, leprecan; P4HA, prolyl-4-hydroxylase; PLO, lysyl hydroxylase; SanF and SanC, enzymes involved in nikkomycin biosynthesis.
The remaining names are the standard names of the genes that encode the respective proteins. Species abbreviations: At, Arabidopsis thaliana; Bb, Borrelia burgdorferi; Cc, Caulobacter crescentus; Ce, Caenorhabditis elegans; Ci, Ciona intestinalis; Dm, Drosophila melanogaster; Ec, Escherichia coli; Em, Emericella nidulans; Hs, Homo sapiens; Lc, Lysobacter lactamgenus; Le, Lycopersicon esculentum; Mtu, Mycobacterium tuberculosis; Nc, Neurospora crassa; Pa, Pseudomonas aeruginosa; Pet, Petunia hybrida; Rr, Rattus rattus; Sc, Saccharomyces cerevisiae; Sp, Schizosaccharomyces pombe; Sot, Solanum tuberosum; Scoe, Streptomyces coelicolor; Scan, Streptomyces ansochromogenes; Scla, Streptomyces clavuligerus; Ssp, Synechocystis; Vc, Vibrio cholerae; ASPV, apple stem pitting virus; ACLSV, apple chlorotic leaf spot virus; BSV, blueberry scorch virus; GLV, garlic latent virus; GVA, grapevine virus A; PBCV, Paramecium bursaria chlorella virus; PMV, papaya mosaic virus; SHVX, shallot virus X.
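
The 85% consensus rule described in the legend can be illustrated with a minimal sketch. This is not the authors' code. The function names, the gap character `-`, the `.` placeholder for columns without a consensus, and the order in which classes are tried are all assumptions made for illustration. The legend gives a single residue list for the yellow-shaded h, l and a categories, so the sketch uses that list for `h` only.

```python
from collections import Counter

# Residue classes as listed in the figure legend. The legend gives one
# residue list for the yellow h/l/a categories; assigning it to 'h' alone
# is an assumption of this sketch.
CLASSES = {
    "+": set("KRH"),
    "s": set("SAGTVPNHD"),
    "h": set("YFWHLIVMA"),
    "b": set("KREQWFYLMI"),
}


def column_consensus(column, threshold=0.85):
    """Return a consensus symbol for one alignment column, or '.' if none."""
    n = len(column)
    if n == 0:
        return "."
    # A single residue that reaches the threshold is reported as itself.
    residue, count = Counter(column).most_common(1)[0]
    if residue != "-" and count / n >= threshold:
        return residue
    # Otherwise try the classes from smallest to largest (assumed priority;
    # ties keep the order in which CLASSES is written).
    for symbol, members in sorted(CLASSES.items(), key=lambda kv: len(kv[1])):
        if sum(1 for r in column if r in members) / n >= threshold:
            return symbol
    return "."


def alignment_consensus(rows, threshold=0.85):
    """Return the consensus line for equal-length aligned sequences."""
    if len({len(row) for row in rows}) > 1:
        raise ValueError("all aligned rows must have the same length")
    return "".join(column_consensus(col, threshold) for col in zip(*rows))
```

Gaps count toward the column total, so a column with many gaps is less likely to reach the threshold. Whether the published consensus treated gaps this way is not stated in the legend.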

Mentions: The Non-redundant Protein Sequence Database (NCBI) [21] was searched using the PSI-BLAST program [22] run to convergence, with a profile-inclusion threshold of 0.01 and AlkB protein sequences from various organisms as queries. In addition to the AlkB orthologs, these searches retrieved from the database, with statistically significant expectation (e) values, several other more distant homologs of AlkB, including uncharacterized eukaryotic proteins and fragments of the polyproteins of plant RNA viruses from the carla-, tricho- and potexvirus families. Examples of homologs found include: Leishmania L3377.4, iteration 5, e-value = 8 × 10⁻⁷; Drosophila CG17807, iteration 3, e-value = 4 × 10⁻⁶; papaya mosaic virus, iteration 3, e-value = 2 × 10⁻⁴. Further iterations of the search using each of the detected proteins as a new query resulted in the detection of several more eukaryotic proteins, including EGL-9 and leprecan, several uncharacterized bacterial proteins and prolyl and lysyl hydroxylases. Finally, another iteration of database searches initiated with the sequences of bacterial proteins, typified by E. coli YbiX, resulted in the unification of these proteins with plant dioxygenases such as leucoanthocyanidin oxidase and gibberellin-20 oxidase. In this context, it should be noted that the DNA-repair proteins typified by E. coli AlkB are unrelated to the alkane omega-hydroxylase typified by the Pseudomonas oleovorans protein also named AlkB. Fortuitously, these latter alkane hydroxylases are also oxygenases; however, they are not 2OG-Fe(II) dioxygenases but a distinct class of di-iron enzymes [23]. On the basis of the results of database searches with representative sequences, we delineated several distinct families within the 2OG-Fe(II) dioxygenase fold and constructed individual alignments for each using the ClustalW program [24]. Secondary structure was predicted for each family using the PHD [25] and PSI-PRED [26] programs.
Using the secondary structure elements from the experimentally determined structures of IPNS (PDB:1ips), DAOCS (PDB:1dcs), CAS (PDB:1drt) and the predicted secondary structures for individual families, the conserved core of these elements was delineated (Figure 1). A multiple alignment for the entire superfamily was constructed by aligning the conserved sequence features shared by all the individual families. The boundaries of the secondary structure elements in the alignment were adjusted using the secondary structure conservation as a guide.
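
As a rough illustration of the search settings described above, the sketch below builds a command line for the present-day NCBI BLAST+ `psiblast` program. The text names only "the PSI-BLAST program", so the choice of the BLAST+ binary, the query file name `alkb_ecoli.fasta`, the output path and the helper name `psiblast_command` are assumptions, not the authors' actual invocation. In BLAST+, `-num_iterations 0` requests iteration until convergence and `-inclusion_ethresh` sets the profile-inclusion threshold.

```python
import shlex


def psiblast_command(query_fasta, db="nr", inclusion_ethresh=0.01,
                     out_path="psiblast_hits.txt"):
    """Build an argument list for a PSI-BLAST search run to convergence.

    The file names and defaults are hypothetical; only the threshold (0.01)
    and the run-to-convergence setting come from the text.
    """
    return [
        "psiblast",
        "-query", query_fasta,
        "-db", db,
        # In BLAST+, 0 means "iterate until convergence".
        "-num_iterations", "0",
        # E-value threshold for including a hit in the profile.
        "-inclusion_ethresh", str(inclusion_ethresh),
        "-out", out_path,
    ]


if __name__ == "__main__":
    # Print the command instead of running it; a local 'nr' database and a
    # BLAST+ installation would be needed to execute it.
    print(shlex.join(psiblast_command("alkb_ecoli.fasta")))
```

Repeating the search with each newly detected protein as the query, as the text describes, would amount to calling the helper once per query file.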

