De-novo protein function prediction using DNA binding and RNA binding proteins as a test case

View Article: PubMed Central - PubMed

ABSTRACT

Of the currently identified protein sequences, 99.6% have never been observed in the laboratory as proteins, and their molecular function has not been established experimentally. Predicting the function of such proteins relies mostly on annotated homologs. However, this has resulted in some erroneous annotations, and many proteins have no annotated homologs. Here we propose a de-novo function prediction approach based on identifying biophysical features that underlie function. Using our approach, we discover DNA and RNA binding proteins that cannot be identified based on homology and validate these predictions experimentally. For example, FGF14, which belongs to a family of secreted growth factors, was predicted to bind DNA. We verify this experimentally and also show that FGF14 is localized to the nucleus. Mutating the predicted binding site on FGF14 abrogated DNA binding. These results demonstrate the feasibility of automated de-novo function prediction based on identifying function-related biophysical features.



Figure 5: Predictions of different methods for a set of proteins with no homology to known DNA binders. Cyan diamonds represent proteins annotated as DNA-binding that have no homology to any other known DNA-binding proteins (DBPs). Magenta diamonds represent proteins from the negative set, which are unlikely to bind DNA. The Y-axis shows the score assigned by each prediction method, and proteins are ordered by score. Some methods provide only binary predictions. Dr PIP fully separates positive and negative samples: all binders receive higher scores than all non-binders.
© Copyright Policy - open-access


Mentions: The advantage of the approach proposed here is its potential for de-novo prediction. To demonstrate this, we constructed a set of proteins that were shown experimentally to bind DNA but have no homologs that are known to bind DNA. These proteins can test de-novo prediction, because methods that rely on homology will not be able to identify them as DBPs. We identified ORFan proteins (i.e., proteins with no known homologs) whose function was studied experimentally. Some of them were shown to bind DNA and others were not. We then submitted each of these proteins to Dr PIP and to other publicly available tools for predicting DNA binding proteins (refs 25-30). The results of this comparison are shown in Fig. 5 (a detailed list of proteins and prediction scores is in Supplementary Table 4). Dr PIP was the only method that comprehensively distinguished between DNA binding and non-binding proteins. That is, all the DBPs got higher scores than any of the non-DBPs.
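
The separation criterion described above (every DBP scoring higher than every non-DBP) can be checked mechanically. The following is a minimal sketch in Python, not the authors' code: the function names `fully_separates` and `pairwise_auc` are hypothetical, and the scores in the usage lines are made-up placeholders, not values from Supplementary Table 4. The pairwise ranking fraction is added here as an assumption about how one might quantify partial separation, including for methods that output only binary predictions (where ties are common).

```python
from typing import Sequence


def fully_separates(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> bool:
    """Return True if every positive (binder) scores strictly above every negative."""
    if not pos_scores or not neg_scores:
        raise ValueError("both score lists must be non-empty")
    return min(pos_scores) > max(neg_scores)


def pairwise_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    if not pos_scores or not neg_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical scores for illustration only (not data from the paper).
clean_pos, clean_neg = [0.9, 0.8, 0.7], [0.4, 0.2]      # complete separation
mixed_pos, mixed_neg = [0.9, 0.3], [0.4, 0.2]           # one binder ranked too low
binary_pos, binary_neg = [1, 1], [1, 0]                 # binary predictor with a tie

print(fully_separates(clean_pos, clean_neg), pairwise_auc(clean_pos, clean_neg))
print(fully_separates(mixed_pos, mixed_neg), pairwise_auc(mixed_pos, mixed_neg))
print(fully_separates(binary_pos, binary_neg), pairwise_auc(binary_pos, binary_neg))
```

A method satisfies the criterion stated for Dr PIP only when `fully_separates` returns `True`, which corresponds to a pairwise ranking fraction of exactly 1.0.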

