Projection of gene-protein networks to the functional space of the proteome and its application to analysis of organism complexity.

Kanapin AA, Mulder N, Kuznetsov VA - BMC Genomics (2010)

Bottom Line: We identify and provide characteristics of functional switches in the polyform group of TUs in different organisms. Based on comparison of mouse and human SFNs, a role of alternative splicing as a necessary source of evolution towards more complex organisms is demonstrated. The entire set of FL across many organisms could be used as a draft of the catalogue of the functional space of the proteome world.


Affiliation: Ontario Institute for Cancer Research, Toronto, Canada. alexander.kanapin@oicr.on.ca

ABSTRACT

Background: We consider the problem of biological complexity via a projection of the protein-coding genes of complex organisms onto the functional space of the proteome, defined here as the set of all functions performed by the proteins of an organism. Alternative splicing (AS) allows an organism to generate diverse mature RNA transcripts from a single pre-mRNA, and thus it could be one of the key mechanisms for increasing the functional complexity of the organism's proteome and a driving force of biological evolution. The projection of transcription units (TU) and alternative splice-variant (SV) forms onto the proteome functional space could therefore generate new types of relational networks (e.g. SV-protein function networks, SFN) and lead to the discovery of novel evolutionarily conserved functional modules. Such networks might provide new, reliable characteristics of organism complexity and a better understanding of the evolutionary integration and plasticity of the interconnections among genome, transcriptome and proteome functions.

Results: We use the InterPro and UniProt databases to attribute descriptive features (keywords) to protein sequences. The UniProt database includes a controlled, curated vocabulary of specific descriptors, or keywords. Keywords are assigned to a protein sequence via conserved domains or via similarity to annotated sequences. We then treat each unique combination of keywords as a protein functional label (FL) that characterizes the biological functions of the given protein, and we construct contingency tables and graphs that project transcription units (TU) and alternative splice variants (SV) onto all FL of the proteome of a given organism. We constructed SFNs for organisms with different evolutionary histories and levels of complexity, and performed a detailed statistical parameterization of the networks.

Conclusions: Applying the algorithm to organisms with different evolutionary histories and levels of biological complexity (nematode, fruit fly, vertebrates) reveals that the parameters describing the SFN correlate with the complexity of a given organism. Using statistical analysis of the links of the functional networks, we propose new features of the evolution of protein function acquisition. We reveal a group of genes and corresponding functions that could be attributed to an early, conserved part of the cellular machinery essential for cell viability and survival. We identify and characterize functional switches in the polyform group of TUs in different organisms. A comparison of mouse and human SFNs demonstrates a role for alternative splicing as a necessary source of evolution towards more complex organisms. The entire set of FL across many organisms could be used as a draft catalogue of the functional space of the proteome world.
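
The projection of TUs and SVs onto functional labels described in the abstract can be illustrated with a small contingency table. The following is only a minimal sketch, not the authors' implementation: the function name `project_onto_labels`, the input format (one `(tu_id, sv_id, label)` triple per splice variant) and the example identifiers are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def project_onto_labels(records):
    """Build a TU-by-FL contingency table from splice-variant annotations.

    `records` is an iterable of (tu_id, sv_id, label) triples, where `label`
    is any hashable functional label. This input format is hypothetical.
    """
    table = Counter()                # (tu_id, label) -> number of SVs with that label
    labels_per_tu = defaultdict(set)  # tu_id -> distinct labels among its SVs
    for tu_id, _sv_id, label in records:
        table[(tu_id, label)] += 1
        labels_per_tu[tu_id].add(label)
    return table, dict(labels_per_tu)


# Hypothetical example: TU "tu1" has three variants, two sharing a label.
records = [
    ("tu1", "sv1", ("KW-A", "KW-B")),
    ("tu1", "sv2", ("KW-A", "KW-B")),
    ("tu1", "sv3", ("KW-C",)),
    ("tu2", "sv4", ("KW-A", "KW-B")),
]
table, labels_per_tu = project_onto_labels(records)
```

Each non-zero cell of `table` corresponds to an edge of a bipartite TU-FL graph, and the size of each set in `labels_per_tu` gives the number of distinct labels reached by the variants of one TU.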


Figure 6: Data flow in the Functional Label generation algorithm. The diagram describes a general approach to the FL generation for a given protein sequence (splice variant) via conserved domains and sequence similarity.

A dataflow diagram of functional label generation is presented in Figure 6. Functional labels were assigned to protein sequences using combinations of UniProt keyword IDs. We used InterProScan version 4.0 and the InterPro database [29] version 18.0. InterProScan reports only complete matches to InterPro entries; the result set is therefore free of fragments and partial matches. The proteins were also scanned against the SwissProt database using BLAST, and the corresponding keywords were retrieved for all exact matches with SwissProt sequences. Redundant keyword IDs were removed from each combination, leaving only unique keywords for each protein sequence.
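
The final merging step of this procedure can be sketched in a few lines. This is a hedged illustration under stated assumptions, not the published pipeline: it presumes the keyword IDs from the domain route (InterProScan) and the similarity route (BLAST against SwissProt) have already been retrieved, and the function name `functional_label` and the keyword IDs in the example are illustrative only.

```python
from typing import Iterable, Tuple


def functional_label(domain_keywords: Iterable[str],
                     similarity_keywords: Iterable[str]) -> Tuple[str, ...]:
    """Combine keyword IDs from both annotation routes into one label.

    Duplicates are removed and the result is sorted, so the same set of
    keywords always yields the same label regardless of input order.
    Sorting is a choice made for this sketch, not taken from the paper.
    """
    unique = set(domain_keywords) | set(similarity_keywords)
    return tuple(sorted(unique))


# Illustrative keyword IDs in the UniProt "KW-nnnn" format.
label = functional_label(
    ["KW-0418", "KW-0067"],   # assumed to come from conserved domains
    ["KW-0067", "KW-0808"],   # assumed to come from exact SwissProt matches
)
```

Representing the label as a sorted tuple makes it hashable, so it can be used directly as a dictionary key or graph node when counting how many splice variants share a label.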

