Functional equivalency inferred from "authoritative sources" in networks of homologous proteins.

Natarajan S, Jakobsson E - PLoS ONE (2009)

Bottom Line: A one-on-one mapping of protein functionality across different species is a critical component of comparative analysis. We verify the functional equivalency of our dataset through a series of tests that include sequence, structure and function comparisons. Comparison is made to the OMA methodology, which also identifies one-on-one mapping between proteins from different species.


Affiliation: Biophysics and Computational Biology, University of Illinois, Urbana-Champaign, Illinois, USA.

ABSTRACT
A one-on-one mapping of protein functionality across different species is a critical component of comparative analysis. This paper presents a heuristic algorithm for discovering the Most Likely Functional Counterparts (MoLFunCs) of a protein, based on simple concepts from network theory. A key feature of our algorithm is utilization of the user's knowledge to assign high confidence to selected functional identifications. We show use of the algorithm to retrieve functional equivalents for 7 membrane proteins, from an exploration of almost 40 genomes from multiple online resources. We verify the functional equivalency of our dataset through a series of tests that include sequence, structure and function comparisons. Comparison is made to the OMA methodology, which also identifies one-on-one mapping between proteins from different species. Based on that comparison, we believe that incorporation of the user's knowledge as a key aspect of the technique adds value to purely statistical formal methods.


Subset sample of the reduced MoLFunC matrix after duplicate resolution and Authority Re-Expansion (see workflow in Figure 2). This figure represents filtering by vote and final resolution of ambiguities by HMMSearch. The first row and column are species names and the second row and column represent protein ids. The column headings represent fully resolved “core” MoLFunCs; i.e. yellow species from Figure 3 plus those green species whose ambiguity was removed and the blue species whose multiple candidates were successfully eliminated by the first three steps in Phase 2 of the workflow. The row indices indicate species and proteins that are questionable because of still-unresolved disagreements with the authority species. The white matrix elements indicate rows that agree with the core MoLFunCs, and are therefore accepted as MoLFunCs. The blue cells indicate that there was residual ambiguity that was resolved by being the best hit from HMMSearch. All the species/proteins whose row sum was 6 (gray cells) were accepted into the final set. Beige cells indicate substantial disagreement with the core MoLFunCs; these species/proteins are discarded and do not appear in the final set. The species with the green cells had an intermediate level of agreement with the core MoLFunCs, and the ambiguities were not picked as the best hit by HMMSearch; this species and protein were thus discarded. The column with all entries as zeros is the original authority; the cells are zero because we never do a reverse BLAST starting from the original authority as the query.

Figure ID: pone-0005898-g003

Mentions: In the next stage of the workflow, we filter out some of the ambiguities before performing a costly profile search. For this purpose, we extract a subset of the reduced MoLFunC matrix, with the rows corresponding to the most recent non-core sequences and the columns corresponding to the most recent core set (see Figure 3). The row components found the column components as one of the top ten hits in that species. The value in each matrix element is 1 if the column found the row as the best hit, and 0 otherwise. The sum of each row is interpreted as a measure of how authoritative each non-core species is with respect to the new core set. For any row whose sum is less than half the maximum, i.e. less than half the number of columns, the sequence corresponding to that row is removed from any further analysis. Figure 3 shows an example of this filtering.
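The row-sum vote described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `filter_by_vote`, the label strings and the example matrix are hypothetical, and the sketch assumes the 0/1 matrix has already been built from the BLAST results. Rows whose sum is less than half the number of columns are dropped; a row sitting exactly at half is kept, following the "less than half" wording.

```python
from typing import List, Sequence, Tuple


def filter_by_vote(
    matrix: Sequence[Sequence[int]],
    row_labels: Sequence[str],
) -> Tuple[List[str], List[str]]:
    """Split non-core rows into (kept, removed) by their row sum.

    matrix[i][j] is 1 if core column j found row i as its best hit, else 0.
    A row is removed when its sum is less than half the number of columns.
    """
    if len(matrix) != len(row_labels):
        raise ValueError("one label is required per matrix row")

    kept: List[str] = []
    removed: List[str] = []
    for label, row in zip(row_labels, matrix):
        n_cols = len(row)
        votes = sum(row)  # number of core sequences voting for this row
        if votes < n_cols / 2:
            removed.append(label)
        else:
            kept.append(label)
    return kept, removed


# Hypothetical example with six core columns.
labels = ["speciesA|prot1", "speciesB|prot2", "speciesC|prot3"]
votes_matrix = [
    [1, 1, 1, 1, 1, 1],  # full agreement with the core set
    [1, 0, 1, 1, 0, 0],  # exactly half: not below the threshold, so kept
    [0, 1, 0, 0, 0, 1],  # below half: removed before the profile search
]
kept, removed = filter_by_vote(votes_matrix, labels)
print("kept:", kept)
print("removed:", removed)
```

Rows that survive this cheap filter are the only ones passed on to the more costly profile search, which is the motivation the passage gives for voting first.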