Computational prediction of O-linked glycosylation sites that preferentially map on intrinsically disordered regions of extracellular proteins.

Nishikawa I, Nakajima Y, Ito M, Fukuchi S, Homma K, Nishikawa K - Int J Mol Sci (2010)

Bottom Line: O-glycosylated sites were often found clustered along the sequence, whereas other sites were located sporadically. The O-glycosylation sites were preferentially located within intrinsically disordered regions of extracellular proteins: particularly, more than 90% of the clustered O-GalNAc glycosylation sites were observed in intrinsically disordered regions. This feature could be the key for understanding the non-conservation property of O-glycosylation, and its role in functional diversity and structural stability.

Affiliation: College of Information Science and Engineering, Ritsumeikan University/Noji-higashi 1-1-1, Kusatsu, Shiga 525-8577, Japan; E-Mail: nakajima.yukiko@gmail.com.

ABSTRACT
O-glycosylation is an important posttranslational modification of mammalian proteins. We applied a support vector machine (SVM) to predict whether a given Ser or Thr is glycosylated, in order to elucidate the O-glycosylation mechanism. O-glycosylated sites were often found clustered along the sequence, whereas other sites were located sporadically. Therefore, we developed two types of SVMs for predicting clustered and isolated sites separately. We found that the amino acid composition was effective for predicting the clustered type, whereas the site-specific algorithm was effective for the isolated type. The highest prediction accuracy for the clustered type was 74%, while that for the isolated type was 79%. The frequency of amino acids around the O-glycosylation sites differed between the two types: Pro, Val and Ala occurred with high probability at specific positions relative to a glycosylation site, especially for the isolated type. Independent component analyses of the amino acid sequences around O-glycosylation sites showed the position-specific occurrence of the identified amino acids as independent components. The O-glycosylation sites were preferentially located within intrinsically disordered regions of extracellular proteins: in particular, more than 90% of the clustered O-GalNAc glycosylation sites were observed in intrinsically disordered regions. This feature could be the key to understanding the non-conserved nature of O-glycosylation, and its role in functional diversity and structural stability.

Figure 1: Prediction accuracies for the clustered and isolated types of mucin-type O-glycosylation for input sequences ranging in length (window size, Ws) from 3 to 55. Either the amino acid sequence or the amino acid composition was used as the input to the SVM. The crosses and circles indicate the prediction accuracies obtained by using the sequence information and the composition information, respectively. The clustered and isolated types are shown in red and blue, respectively.

Mentions: A separate SVM was trained for each of the clustered and isolated types of mucin-type O-glycosylation. The exact definitions of the clustered and isolated types of O-glycosylation are given in Section 4.2. The input to the SVM was information on a protein sequence of fixed length with the prediction target site at its center. Two types of information were used: one was the amino acid sequence encoded by sparse coding, which distinguishes all 20 types of amino acids, while the other was the amino acid composition of the sequence. Figure 1 shows the prediction accuracy obtained by using either sequence or composition information as the input to the SVM for the clustered or isolated type of O-glycosylation.
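
The two input representations described above can be illustrated with a minimal sketch. This is not the authors' code: the function names, the alphabetical residue ordering, and the use of '-' to pad windows that run past the ends of the protein are all assumptions made here for illustration. Sparse coding is taken to mean a 20-dimensional one-hot vector per position, and composition to mean the fraction of each residue type within the window.

```python
# Illustrative sketch only; names, residue ordering, and padding are assumptions.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues (assumed ordering)
PAD = "-"  # assumed placeholder for positions beyond the sequence ends


def extract_window(sequence, site, window_size):
    """Return window_size residues centred on the 0-based index `site`."""
    if window_size % 2 == 0:
        raise ValueError("window_size must be odd so the site sits at the centre")
    half = window_size // 2
    return "".join(
        sequence[i] if 0 <= i < len(sequence) else PAD
        for i in range(site - half, site + half + 1)
    )


def sparse_encode(window):
    """One-hot (sparse) coding: 20 values per position; padding gives all zeros."""
    vector = []
    for residue in window:
        vector.extend(1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS)
    return vector


def composition(window):
    """Fraction of each of the 20 residue types in the window (padding ignored)."""
    residues = [r for r in window if r in AMINO_ACIDS]
    total = len(residues)
    return [residues.count(aa) / total if total else 0.0 for aa in AMINO_ACIDS]


if __name__ == "__main__":
    seq = "MAPTSPVA"  # made-up toy sequence, not taken from the paper
    window = extract_window(seq, site=3, window_size=5)  # centred on the Thr
    print(window)
    print(len(sparse_encode(window)), len(composition(window)))
```

The sparse vector grows with the window size (20 values per position) and keeps positional information, whereas the composition vector always has 20 values and discards residue order within the window. Either vector could then be passed to an SVM implementation of the reader's choice.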

