Systematic search for putative new domain families in Mycoplasma gallisepticum genome.

Reddy CC, Rani SS, Offmann B, Sowdhamini R - BMC Res Notes (2010)



Affiliation: Université de La Réunion, Equipe de Bioinformatique, Laboratoire de Biochimie et Génétique Moléculaire, 15 ave René Cassin, 97715 Saint Denis Messag Cedex 09, La Réunion, France. bernard.offmann@univ-reunion.fr.

ABSTRACT

Background: Protein domains are the fundamental units of protein structure, function and evolution. Delineating the domains in a protein is important for classifying it and for understanding its structure, function and evolution. Domain delineation within a polypeptide chain, particularly at the genome scale, can be achieved in several ways but remains problematic in many instances. Identifying the domain content of a sequence is difficult when the query has no homologues of experimentally determined structure and searches against sequence domain databases return only insignificant matches. Identifying domains under such conditions, with low sequence identity and no structural homologues, becomes especially important at the genome scale.

Findings: We have developed a new method for identifying domains in unassigned regions through indirect connections and applied it at scale to 434 unassigned regions in 726 protein sequences of the Mycoplasma gallisepticum genome. Through intermediate sequences in the unassigned regions, we established 71 new domain relationships and 63 putative new domain families, which together represent an overall 10% increase in PfamA domain annotation over direct assignment in this genome.

Conclusions: The systematic analysis of the unassigned regions in the Mycoplasma gallisepticum genome has provided insight into possible new domain relationships and putative new domain families. Further investigation of these predicted new domains may help improve existing domain prediction algorithms.


Figure 5: Multiple sequence alignment of the unassigned region NP_852844.1.39-109 (marked '*' in the alignment) and its homologues obtained in the PSI-BLAST search. The consensus sequence is shown at the top of the alignment.

Mentions: In another protein, NP_852844.1, we analysed an unassigned region spanning residues 38 to 109; the gene product was already associated with a KOW domain at the N-terminus (residues 5 to 37). PSI-BLAST identified 109 homologues, all from prokaryotic organisms (Figure 5). All homologous sequences have similar predicted secondary structure content, and most share the same domain architecture: an N-terminal KOW motif followed by an unassigned C-terminal region. The KOW motif is only about 35 residues long and links a bacterial transcription factor with ribosomal proteins [17]. The presence of conserved residues in the C-terminal region, which is about twice the length of the KOW motif, suggests that this C-terminal domain may have a functional role in additionally stabilizing oligomeric assemblies, perhaps thereby contributing to more efficient protein expression.
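The notion of an "unassigned region" used above can be made concrete with a minimal Python sketch. Given domain boundaries already assigned to a chain (for instance, the KOW domain at residues 5 to 37), it lists the stretches left uncovered. This is an illustration only, not the authors' pipeline. The function name `unassigned_regions`, the 1-based inclusive coordinates and the 30-residue minimum length are assumptions. The chain length of 109 in the usage line is also hypothetical, chosen only so that the output matches the 38 to 109 region discussed above.

```python
from typing import List, Tuple

# Hypothetical representation: (start, end), 1-based and inclusive.
Interval = Tuple[int, int]


def unassigned_regions(seq_len: int, domains: List[Interval],
                       min_len: int = 30) -> List[Interval]:
    """Return stretches of a chain of length seq_len not covered by any
    assigned domain, keeping only those at least min_len residues long.
    min_len=30 is an assumed threshold, not a value from the paper."""
    gaps: List[Interval] = []
    cursor = 1  # first residue not yet covered by an assigned domain
    for start, end in sorted(domains):
        if start > cursor:
            gaps.append((cursor, start - 1))
        # max() keeps overlapping or nested domain hits from moving the cursor back
        cursor = max(cursor, end + 1)
    if cursor <= seq_len:
        gaps.append((cursor, seq_len))  # trailing C-terminal gap
    return [(s, e) for s, e in gaps if e - s + 1 >= min_len]


# Hypothetical usage: KOW domain at residues 5-37, chain assumed to end at 109.
print(unassigned_regions(109, [(5, 37)]))  # [(38, 109)]
```

Here the short N-terminal stretch 1 to 4 is discarded by the length filter, and only the longer C-terminal stretch would be passed on for homologue searching.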
