Systematic search for putative new domain families in Mycoplasma gallisepticum genome.

Reddy CC, Rani SS, Offmann B, Sowdhamini R - BMC Res Notes (2010)

Bottom Line: Difficulties in identifying the domain content of a given sequence arise when the query sequence has no homologues of experimentally determined structure and searches against sequence domain databases return only insignificant matches. Identifying domains under conditions of low sequence identity and in the absence of structural homologues becomes crucially important, especially at the genomic scale. Further investigation of these predicted new domains may prove beneficial in improving existing domain prediction algorithms.


Affiliation: Université de La Réunion, Equipe de Bioinformatique, Laboratoire de Biochimie et Génétique Moléculaire, 15 ave René Cassin, 97715 Saint Denis Messag Cedex 09, La Réunion, France. bernard.offmann@univ-reunion.fr.

ABSTRACT

Background: Protein domains are the fundamental units of protein structure, function and evolution. Delineating the domains of a protein is important for its classification and for understanding its structure, function and evolution. Domain delineation within a polypeptide chain, particularly at the genome scale, can be achieved in several ways but remains problematic in many instances. Difficulties in identifying the domain content of a given sequence arise when the query sequence has no homologues of experimentally determined structure and searches against sequence domain databases return only insignificant matches. Identifying domains under conditions of low sequence identity and in the absence of structural homologues becomes crucially important, especially at the genomic scale.

Findings: We have developed a new method for identifying domains in unassigned regions through indirect connections and applied it at scale to 434 unassigned regions in 726 protein sequences of the Mycoplasma gallisepticum genome. Through intermediate sequences, we established 71 new domain relationships and 63 probable putative new domain families in the unassigned regions. Importantly, this represents an overall 10% increase in PfamA domain annotation over direct assignment in this genome.
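The indirect-connection idea can be illustrated with a minimal sketch, under assumptions that are not taken from the paper: a region counts as "unassigned" if it is an uncovered stretch of at least 30 residues (hypothetical threshold); each homologue found by a PSI-BLAST search of that region is represented only by the span it aligns over; and a homologue's Pfam-A domain is transferred back to the query region when that aligned span covers at least half of the domain (hypothetical rule). `Domain`, `Hit`, `unassigned_regions` and `transfer_domains` are illustrative names, not part of any tool, and parsing of real PSI-BLAST or HMMpfam output is left out. This is not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


@dataclass(frozen=True)
class Hit:
    subject_id: str  # homologue (intermediate sequence) found for the query region
    s_start: int     # aligned span on the homologue, 1-based, inclusive
    s_end: int


def unassigned_regions(length, domains, min_len=30):
    """Return (start, end) stretches not covered by any domain.

    min_len is a hypothetical cut-off: shorter gaps are treated as linkers/termini.
    """
    gaps, pos = [], 1
    for d in sorted(domains, key=lambda d: d.start):
        if d.start - pos >= min_len:
            gaps.append((pos, d.start - 1))
        pos = max(pos, d.end + 1)  # max() copes with overlapping domains
    if length - pos + 1 >= min_len:
        gaps.append((pos, length))
    return gaps


def overlap(a_start, a_end, b_start, b_end):
    """Number of residues shared by two inclusive intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)


def transfer_domains(hits, annotations, min_frac=0.5):
    """Count homologues supporting each domain assignment for the query region.

    annotations maps a homologue id to its Domain list (e.g. parsed from a
    Pfam-A HMM search). A domain is transferred when the aligned span covers
    at least min_frac of it (hypothetical rule).
    """
    support = {}
    for hit in hits:
        for dom in annotations.get(hit.subject_id, []):
            covered = overlap(hit.s_start, hit.s_end, dom.start, dom.end)
            if covered / (dom.end - dom.start + 1) >= min_frac:
                support.setdefault(dom.name, set()).add(hit.subject_id)
    return {name: len(ids) for name, ids in support.items()}


# Domain boundaries from the NP_853398.1 example; the hits below are made up.
print(unassigned_regions(329, [Domain("AsnA", 5, 241)]))
hits = [Hit("homA", 10, 95), Hit("homB", 200, 290), Hit("homC", 5, 80)]
annotations = {
    "homA": [Domain("DomainX", 20, 90)],
    "homB": [Domain("DomainX", 210, 280), Domain("DomainY", 1, 150)],
}
print(transfer_domains(hits, annotations))
```

With the NP_853398.1 boundaries (329 residues, AsnA at 5-241), the sketch reports only the C-terminal stretch 242-329, because the 4-residue N-terminal gap falls below the assumed cut-off. Counting how many homologues support each transferred domain, rather than accepting a single hit, is one simple way to reduce spurious transfers; the thresholds would need tuning against real data.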

Conclusions: The systematic analysis of the unassigned regions in the Mycoplasma gallisepticum genome has provided some insight into the possible new domain relationships and putative new domain families. Further investigation of these predicted new domains may prove beneficial in improving the existing domain prediction algorithms.


Figure 3: Phylogenetic tree of homologues obtained in the PSI-BLAST search. The domain architecture is shown at the top right. All the homologues have an identical domain architecture, with an amino-terminal AsnA domain. The phylogenetic trees were derived as described in Methods.

Mentions: The NP_853398.1 protein (Figure 2, Figure 3) is 329 residues long and has a single asparagine synthetase (AsnA) domain spanning residues 5-241, followed by a C-terminal unassigned region (residues 242-329). When this unassigned region was analyzed with the intermediate-sequence methodology described above, the PSI-BLAST search identified 106 similar sequences from both prokaryotes and eukaryotes. The HMMpfam search, however, did not associate the region with any PfamA domain; it was associated only with the PfamB_3316 domain. About 62% of the residues in this region are predicted to adopt regular secondary structure, comprising 5 helices and 3 β-strands. More interestingly, the predicted β-3α-β-α-β-α secondary structure pattern is conserved in all the homologous sequences, which also share a similar domain architecture. The crystal structure of E. coli asparagine synthetase also shows this small subdomain[16]. Aspartate-ammonia ligase (asparagine synthetase) catalyses the conversion of L-aspartate to L-asparagine in the presence of ATP and ammonia. The AsnA structure revealed that it is similar to the catalytic domain of yeast aspartyl-tRNA synthetase despite low sequence similarity. These enzymes share a common reaction mechanism that involves the formation of an aminoacyl-adenylate intermediate. In aspartyl-tRNA synthetase, the highly conserved GGGIG motif plays an important role in forming a cavity that can accommodate bound ATP[16]. Since this motif is conserved in the newly predicted putative domain, it may play an important role in ligand binding.
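Secondary-structure numbers like those quoted for this region (percentage content, helix and strand counts, and an element pattern such as β-3α-β-α-β-α) can be derived from a per-residue prediction. A minimal sketch, assuming a 3-state string (H helix, E strand, anything else coil) of the kind common predictors emit; the minimum element lengths are hypothetical, and the toy string is invented so that it reproduces the element pattern only. It is not the NP_853398.1 prediction and does not give the 62% figure.

```python
from itertools import groupby


def ss_summary(ss, min_helix=4, min_strand=3):
    """Summarize a 3-state secondary-structure prediction (H helix, E strand, other = coil).

    Returns (fraction of residues in H or E, helix count, strand count, pattern),
    where only runs of at least min_helix H / min_strand E count as elements.
    """
    content = sum(c in "HE" for c in ss) / len(ss)
    elements = []
    for state, run in groupby(ss):
        n = len(list(run))
        if (state == "H" and n >= min_helix) or (state == "E" and n >= min_strand):
            elements.append("α" if state == "H" else "β")
    # Collapse consecutive elements of the same type, e.g. α, α, α -> "3α"
    pattern = []
    for sym, grp in groupby(elements):
        k = len(list(grp))
        pattern.append(f"{k}{sym}" if k > 1 else sym)
    return content, elements.count("α"), elements.count("β"), "-".join(pattern)


# Toy prediction string (made up for illustration)
toy = "".join(["CC", "EEEE", "CC", "HHHHHH", "CC", "HHHHH", "CC", "HHHHHHH",
               "CC", "EEEEE", "CCC", "HHHHHHHH", "CC", "EEEE", "CC", "HHHHHH", "CCC"])
content, n_helix, n_strand, pattern = ss_summary(toy)
print(f"{content:.0%} SS content, {n_helix} helices, {n_strand} strands, {pattern}")
```

Running the same summary on each homologue's prediction and comparing the `pattern` strings is one straightforward way to check whether an element order is conserved across a family.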
