Limits...
Insights into Protein Sequence and Structure-Derived Features Mediating 3D Domain Swapping Mechanism using Support Vector Machine Based Approach.

Shameer K, Pugalenthi G, Kandaswamy KK, Suganthan PN, Archunan G, Sowdhamini R - Bioinform Biol Insights (2010)

Bottom Line: We obtained 76.33% accuracy from training and 73.81% accuracy from testing.Due to high diversity in the sequence, structure and functions of proteins involved in domain swapping, availability of such an algorithm to predict swapping events from sequence and structure-derived features will be an initial step towards identification of more putative proteins that may be involved in swapping or proteins involved in deposition disease.Further, the top features emerging in our feature selection method may be analysed further to understand their roles in the mechanism of domain swapping.

View Article: PubMed Central - PubMed

Affiliation: National Centre for Biological Sciences (TIFR), GKVK Campus, Bellary Road, Bangalore, 560065, India.

ABSTRACT
3-dimensional domain swapping is a mechanism where two or more protein molecules form higher order oligomers by exchanging identical or similar subunits. Recently, this phenomenon has received much attention in the context of prions and neurodegenerative diseases, due to its role in the functional regulation, formation of higher oligomers, protein misfolding, aggregation etc. While 3-dimensional domain swap mechanism can be detected from three-dimensional structures, it remains a formidable challenge to derive common sequence or structural patterns from proteins involved in swapping. We have developed a SVM-based classifier to predict domain swapping events using a set of features derived from sequence and structural data. The SVM classifier was trained on features derived from 150 proteins reported to be involved in 3D domain swapping and 150 proteins not known to be involved in swapped conformation or related to proteins involved in swapping phenomenon. The testing was performed using 63 proteins from the positive dataset and 63 proteins from the negative dataset. We obtained 76.33% accuracy from training and 73.81% accuracy from testing. Due to high diversity in the sequence, structure and functions of proteins involved in domain swapping, availability of such an algorithm to predict swapping events from sequence and structure-derived features will be an initial step towards identification of more putative proteins that may be involved in swapping or proteins involved in deposition disease. Further, the top features emerging in our feature selection method may be analysed further to understand their roles in the mechanism of domain swapping.

No MeSH data available.


Related in: MedlinePlus

Schematic representation of data curation steps.
© Copyright Policy - open-access
Related In: Results  -  Collection


getmorefigures.php?uid=PMC2901629&req=5

f2-bbi-2010-033: Schematic representation of data curation steps.

Mentions: We have performed extensive database and literature curation to collect sequence and structural data for proteins with the structural features of domain swapping. We have collected a set of PDB structures from Protein Data Bank (PDB)20 using a combination of integrative database searches and extensive literature curation of the existence and extent of 3D domain swapping. These entries were further manually analyzed using combination of macromolecular visualization tools PyMol,21 Rasmol22 and literature reports. The structural entries were further processed using Domain Identification ALgorithm (DIAL) server23 to identify probable swapped segments from the structural data. PDB ID to PubMed ID mapping and PDBSum database24 were used to obtain primary literature reports. Since many structures are not available in quaternary state from the PDB, Protein Quaternary Structure server (PQS)25 was consulted to obtain the quaternary assembly of the structures. From the extensive curation, 3Dswap: Knowledgebase of 3D domain swapping in Proteins, unpublished data, 315 PDB entries with 344 chains were obtained for the positive dataset. These chains were further mapped to their respective SCOP26 folds. To curate the negative dataset, we scanned different databases (PDB, PQS, and PDBSum) for dimers or higher order oligomers that are not included in positive dataset. PDB was scanned for oligomers that are not reported to be involved in domain swapping. The negative dataset was generated after excluding the SCOP folds reported in the positive dataset. To add diversity to the negative dataset, members from a single SCOP fold was represented only once in the negative dataset. The redundant entries were removed by considering their sequence identity. Sequences extracted from structures that have >70% sequence identity were removed using the CD-HIT program.27 We retained 213 domain swap sequences for the positive dataset. Equal number of negative data was obtained from the Protein Data Bank. The training dataset was constructed using 150 domain swapping and 150 non-domain swapping sequences. Remaining 63 domain swap sequences and 63 non-domain swapping sequences were employed for testing. Schematic representation of data curation steps, followed to generate positive and negative data, are given in Figure 2.


Insights into Protein Sequence and Structure-Derived Features Mediating 3D Domain Swapping Mechanism using Support Vector Machine Based Approach.

Shameer K, Pugalenthi G, Kandaswamy KK, Suganthan PN, Archunan G, Sowdhamini R - Bioinform Biol Insights (2010)

Schematic representation of data curation steps.
© Copyright Policy - open-access
Related In: Results  -  Collection

Show All Figures
getmorefigures.php?uid=PMC2901629&req=5

f2-bbi-2010-033: Schematic representation of data curation steps.
Mentions: We have performed extensive database and literature curation to collect sequence and structural data for proteins with the structural features of domain swapping. We have collected a set of PDB structures from Protein Data Bank (PDB)20 using a combination of integrative database searches and extensive literature curation of the existence and extent of 3D domain swapping. These entries were further manually analyzed using combination of macromolecular visualization tools PyMol,21 Rasmol22 and literature reports. The structural entries were further processed using Domain Identification ALgorithm (DIAL) server23 to identify probable swapped segments from the structural data. PDB ID to PubMed ID mapping and PDBSum database24 were used to obtain primary literature reports. Since many structures are not available in quaternary state from the PDB, Protein Quaternary Structure server (PQS)25 was consulted to obtain the quaternary assembly of the structures. From the extensive curation, 3Dswap: Knowledgebase of 3D domain swapping in Proteins, unpublished data, 315 PDB entries with 344 chains were obtained for the positive dataset. These chains were further mapped to their respective SCOP26 folds. To curate the negative dataset, we scanned different databases (PDB, PQS, and PDBSum) for dimers or higher order oligomers that are not included in positive dataset. PDB was scanned for oligomers that are not reported to be involved in domain swapping. The negative dataset was generated after excluding the SCOP folds reported in the positive dataset. To add diversity to the negative dataset, members from a single SCOP fold was represented only once in the negative dataset. The redundant entries were removed by considering their sequence identity. Sequences extracted from structures that have >70% sequence identity were removed using the CD-HIT program.27 We retained 213 domain swap sequences for the positive dataset. Equal number of negative data was obtained from the Protein Data Bank. The training dataset was constructed using 150 domain swapping and 150 non-domain swapping sequences. Remaining 63 domain swap sequences and 63 non-domain swapping sequences were employed for testing. Schematic representation of data curation steps, followed to generate positive and negative data, are given in Figure 2.

Bottom Line: We obtained 76.33% accuracy from training and 73.81% accuracy from testing.Due to high diversity in the sequence, structure and functions of proteins involved in domain swapping, availability of such an algorithm to predict swapping events from sequence and structure-derived features will be an initial step towards identification of more putative proteins that may be involved in swapping or proteins involved in deposition disease.Further, the top features emerging in our feature selection method may be analysed further to understand their roles in the mechanism of domain swapping.

View Article: PubMed Central - PubMed

Affiliation: National Centre for Biological Sciences (TIFR), GKVK Campus, Bellary Road, Bangalore, 560065, India.

ABSTRACT
3-dimensional domain swapping is a mechanism where two or more protein molecules form higher order oligomers by exchanging identical or similar subunits. Recently, this phenomenon has received much attention in the context of prions and neurodegenerative diseases, due to its role in the functional regulation, formation of higher oligomers, protein misfolding, aggregation etc. While 3-dimensional domain swap mechanism can be detected from three-dimensional structures, it remains a formidable challenge to derive common sequence or structural patterns from proteins involved in swapping. We have developed a SVM-based classifier to predict domain swapping events using a set of features derived from sequence and structural data. The SVM classifier was trained on features derived from 150 proteins reported to be involved in 3D domain swapping and 150 proteins not known to be involved in swapped conformation or related to proteins involved in swapping phenomenon. The testing was performed using 63 proteins from the positive dataset and 63 proteins from the negative dataset. We obtained 76.33% accuracy from training and 73.81% accuracy from testing. Due to high diversity in the sequence, structure and functions of proteins involved in domain swapping, availability of such an algorithm to predict swapping events from sequence and structure-derived features will be an initial step towards identification of more putative proteins that may be involved in swapping or proteins involved in deposition disease. Further, the top features emerging in our feature selection method may be analysed further to understand their roles in the mechanism of domain swapping.

No MeSH data available.


Related in: MedlinePlus