Automatically extracting functionally equivalent proteins from SwissProt.

McMillan LE, Martin AC - BMC Bioinformatics (2008)

Bottom Line: Our results demonstrate that the vast majority of UniProtKB/Swiss-Prot functional annotations are of high quality, and that FOSTA can interpret annotations successfully. Where FOSTA is not successful, we are able to highlight inconsistencies in UniProtKB/Swiss-Prot annotation. We discuss limitations and possible future extensions to FOSTA, and recommend changes to the UniProtKB/Swiss-Prot format, which would facilitate text-mining of UniProtKB/Swiss-Prot.


Affiliation: Research Department of Structural & Molecular Biology, University College London, Gower Street, London WC1E 6BT, UK. mcmillan@biochem.ucl.ac.uk

ABSTRACT

Background: There is a frequent need to obtain sets of functionally equivalent homologous proteins (FEPs) from different species. While orthology usually implies functional equivalence, this is not always true; therefore, datasets of orthologous proteins are not appropriate. The information relevant to extracting FEPs is contained in databanks such as UniProtKB/Swiss-Prot, and a manual analysis of these data allows FEPs to be extracted on a one-off basis. However, there has been no resource allowing the easy, automatic extraction of groups of FEPs (for example, all instances of protein C). We have developed FOSTA, an automatically generated database of FEPs annotated as having the same function in UniProtKB/Swiss-Prot, which can be used for large-scale analysis. The method builds a candidate list of homologues and filters out functionally diverged proteins on the basis of their functional annotations, using a simple text-mining approach.
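The abstract describes the annotation filter only as "a simple text-mining approach". The sketch below illustrates one minimal form such a filter could take, assuming that two entries share a function when their free-text function annotations agree after normalisation (lowercasing, replacing punctuation with spaces, collapsing whitespace). The names `normalise_annotation` and `split_by_function`, the normalisation rules and the example strings are hypothetical and are not taken from FOSTA.

```python
import re


def normalise_annotation(text: str) -> str:
    """Reduce a free-text function annotation to a comparable key.

    Hypothetical normalisation rules, not FOSTA's documented method.
    """
    text = text.lower()
    text = re.sub(r"[^a-z0-9]+", " ", text)  # punctuation and hyphens become spaces
    return " ".join(text.split())            # collapse runs of whitespace, trim ends


def split_by_function(query_annotation, homologues):
    """Partition homologues into same-function candidates and diverged homologues.

    homologues: iterable of (accession, annotation) pairs.
    Returns (same_function_accessions, diverged_accessions).
    """
    key = normalise_annotation(query_annotation)
    same, diverged = [], []
    for accession, annotation in homologues:
        if normalise_annotation(annotation) == key:
            same.append(accession)
        else:
            diverged.append(accession)
    return same, diverged


# Illustrative usage with made-up accessions.
same, diverged = split_by_function(
    "Vitamin K-dependent protein C",
    [
        ("CAND1", "Vitamin K-dependent protein C"),
        ("CAND2", "vitamin K dependent protein C."),
        ("CAND3", "Coagulation factor X"),
    ],
)
print(same, diverged)
```

Exact matching after normalisation is deliberately strict: it avoids merging proteins whose names differ in meaning, at the cost of missing equivalents whose annotations are worded differently.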

Results: Large-scale evaluation of our FEP extraction method is difficult because there is no gold-standard dataset against which the method can be benchmarked. However, a manual analysis of five protein families confirmed a high level of performance. A more extensive comparison with two manually verified functional equivalence datasets also demonstrated very good performance.
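The abstract does not say how agreement with the manually verified datasets was quantified. Assuming precision and recall computed over protein accessions (an assumption, not a detail from the source), comparing one predicted family with one reference set could be sketched as follows; `compare_to_reference` and the example accessions are hypothetical.

```python
def compare_to_reference(predicted, reference):
    """Precision and recall of a predicted FEP set against a manually verified set.

    Assumed metrics for illustration; the source does not name its measures.
    """
    predicted, reference = set(predicted), set(reference)
    true_pos = len(predicted & reference)  # accessions present in both sets
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(reference) if reference else 0.0
    return precision, recall


# Three of four predictions agree with a four-member reference set.
precision, recall = compare_to_reference(
    {"P1", "P2", "P3", "P4"}, {"P1", "P2", "P3", "P5"}
)
print(precision, recall)  # 0.75 0.75
```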

Conclusion: In summary, FOSTA provides an automated analysis of annotations in UniProtKB/Swiss-Prot that enables groups of proteins already annotated as functionally equivalent to be extracted. Our results demonstrate that the vast majority of UniProtKB/Swiss-Prot functional annotations are of high quality, and that FOSTA can interpret annotations successfully. Where FOSTA is not successful, we are able to highlight inconsistencies in UniProtKB/Swiss-Prot annotation. Most of these would have presented equal difficulties for manual interpretation of annotations. We discuss limitations and possible future extensions to FOSTA, and recommend changes to the UniProtKB/Swiss-Prot format that would facilitate text-mining of UniProtKB/Swiss-Prot.


Figure 2: The FOSTA filtering process: homologues are identified by BLAST-ing against the UniProtKB/Swiss-Prot database (filtering stage (1)); these are then filtered to retain only those with similar function (filtering stage (2)); finally one protein per species (the FEP, or functionally equivalent protein) is chosen using a hierarchy of functional matches to eliminate functionally diverged homologues (FDHs) (filtering stage (3)).

Mentions: As input, FOSTA takes an entire UniProtKB/Swiss-Prot release; the results presented here are based on UniProtKB/Swiss-Prot version 53.0. FOSTA roots families of FEPs (FOSTA families) around human proteins using the three-stage filtering process shown in Figure 2. Candidates rejected at filtering stages (2) and (3) are retained and recorded as functionally diverged homologues (FDHs).
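The three filtering stages can be sketched as follows. This is an illustration under stated assumptions, not FOSTA's implementation: stage (1), the BLAST search, is assumed to have already produced the candidate homologues, and each candidate is assumed to carry a hypothetical `match_level` summarising where its annotation falls in the hierarchy of functional matches (0 = strongest, `None` = no functional match). The `Candidate` class, the `build_family` function and the BLAST E-value tie-break are illustrative choices.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    accession: str
    species: str
    evalue: float               # BLAST E-value from stage (1), assumed precomputed
    match_level: Optional[int]  # hypothetical: 0 = strongest match, None = no match


def build_family(candidates):
    """Return (feps, fdhs): one FEP per species, plus all rejected homologues."""
    fdhs = []
    best = {}  # species -> best Candidate seen so far
    for cand in candidates:
        if cand.match_level is None:
            fdhs.append(cand.accession)        # stage (2): function not similar
            continue
        current = best.get(cand.species)
        if current is None:
            best[cand.species] = cand
        elif (cand.match_level, cand.evalue) < (current.match_level, current.evalue):
            fdhs.append(current.accession)     # stage (3): displaced by a better match
            best[cand.species] = cand
        else:
            fdhs.append(cand.accession)        # stage (3): weaker match for this species
    feps = {species: c.accession for species, c in best.items()}
    return feps, fdhs


feps, fdhs = build_family([
    Candidate("A1", "MOUSE", 1e-100, 0),
    Candidate("A2", "MOUSE", 1e-120, 1),
    Candidate("A3", "BOVIN", 1e-90, 0),
    Candidate("A4", "BOVIN", 1e-50, None),
])
print(feps, fdhs)  # {'MOUSE': 'A1', 'BOVIN': 'A3'} ['A2', 'A4']
```

Keeping every rejected accession in `fdhs`, rather than discarding it, mirrors the practice described above of recording candidates rejected at stages (2) and (3) as FDHs.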