Functional associations of proteins in entire genomes by means of exhaustive detection of gene fusions.

Enright AJ, Ouzounis CA - Genome Biol. (2001)

Bottom Line: Component pairs were identified by virtue of their similarity to 2,365 multidomain composite proteins. On average, 9% of genes in a given genome appear to code for single-domain, component proteins predicted to be functionally associated. These proteins are detected by an additional 4% of genes that code for fused, composite proteins.

Affiliation: Computational Genomics Group, European Bioinformatics Institute, EMBL Cambridge Outstation, Cambridge CB10 1SD, UK.

ABSTRACT

Background: It has recently been shown that the detection of gene fusion events across genomes can be used for predicting functional associations of proteins, including physical interaction or complex formation. To obtain such predictions we have made an exhaustive search for gene fusion events within 24 available completely sequenced genomes.

Results: Each genome was used as a query against the remaining 23 complete genomes to detect gene fusion events. Using an improved, fully automatic protocol, a total of 7,224 single-domain proteins that are components of gene fusions in other genomes were detected, many of which were identified for the first time. The total number of predicted pairwise functional associations is 39,730 for all genomes. Component pairs were identified by virtue of their similarity to 2,365 multidomain composite proteins. We also show for the first time that gene fusion is a complex evolutionary process with a number of contributory factors, including paralogy, genome size and phylogenetic distance. On average, 9% of genes in a given genome appear to code for single-domain, component proteins predicted to be functionally associated. These proteins are detected by an additional 4% of genes that code for fused, composite proteins.

Conclusions: These results provide an exhaustive set of functionally associated genes and also delineate the power of fusion analysis for the prediction of protein interactions.
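The detection idea summarized above (two component proteins that each resemble a different part of one composite protein) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the authors' protocol: the function name `find_component_pairs`, the hit tuple layout, the `similar_pairs` filter and the `max_overlap` threshold are all hypothetical, and the similarity hits are taken as given rather than computed.

```python
from collections import defaultdict
from itertools import combinations


def find_component_pairs(hits, similar_pairs=frozenset(), max_overlap=0):
    """Return candidate fusion-linked pairs as (query_a, query_b, composite).

    hits: iterable of (query_id, composite_id, start, end), where start/end
          are inclusive coordinates of the matched region on the composite.
    similar_pairs: set of frozenset({a, b}) for queries that resemble each
          other; such pairs are skipped (assumed filter, not from the paper).
    max_overlap: largest number of shared composite residues still treated
          as "non-overlapping" (assumed parameter).
    """
    # Group all matched regions by the composite protein they fall on.
    by_composite = defaultdict(list)
    for query, composite, start, end in hits:
        by_composite[composite].append((query, start, end))

    pairs = set()
    for composite, matches in by_composite.items():
        for (qa, sa, ea), (qb, sb, eb) in combinations(matches, 2):
            if qa == qb:
                continue
            if frozenset((qa, qb)) in similar_pairs:
                continue
            # Inclusive overlap length; zero or negative means disjoint regions.
            overlap = min(ea, eb) - max(sa, sb) + 1
            if overlap <= max_overlap:
                pairs.add((min(qa, qb), max(qa, qb), composite))
    return pairs


# Hypothetical hits: A and B cover different parts of composite C; D overlaps both.
example_hits = [("A", "C", 1, 100), ("B", "C", 120, 300), ("D", "C", 90, 200)]
example_pairs = find_component_pairs(example_hits)
```

With the hypothetical hits above, only A and B match disjoint regions of C, so they form the single candidate pair. A real pipeline would also need similarity thresholds and handling of paralogous families, which this sketch leaves out.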


Figure 4: Numbers of component and composite proteins relative to genome size. Relative numbers of (a) component and (b) composites per species, as individual cases (blue bars) and protein families (green bars), normalized by total genome size (number of ORFs). Species name abbreviations as in Table 1. Average values per genome are 9% for components and 4% for composites.

Mentions: Evidently, the number of component and composite proteins detected in each species is also dependent on genome size (Figure 4). When the above numbers for unique cases and families of components (Figure 4a) and composites (Figure 4b) are normalized by the number of open reading frames (ORFs) for the species examined, the patterns of distribution are significantly altered. For instance, Aquifex aeolicus and Thermotoga maritima appear to have a large number of components involved in gene fusion (more than 12% of their genes are involved in this process) (Figure 4a), whereas the absolute numbers are low (Figure 3a). This is also the case for composites, where, for example, S. cerevisiae yields as many cases as D. melanogaster in relative terms (4% of the genome) (Figure 4b), while the absolute counts are dramatically different (Figure 3b).
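The normalization described here is a simple ratio of detected proteins to total ORFs, and a short Python sketch shows how it can reorder species. The function name `fraction_of_genome`, the species labels and all numbers below are invented for illustration and are not values from the paper.

```python
def fraction_of_genome(counts, orf_totals):
    """Divide each species' protein count by its total number of ORFs."""
    fractions = {}
    for species, count in counts.items():
        total = orf_totals[species]
        if total <= 0:
            raise ValueError(f"ORF total for {species} must be positive")
        fractions[species] = count / total
    return fractions


# Invented placeholder values, chosen only to show the effect of normalization.
component_counts = {"small_genome": 200, "large_genome": 500}
orf_totals = {"small_genome": 1500, "large_genome": 13000}

relative = fraction_of_genome(component_counts, orf_totals)

# Rank species by absolute count and by normalized fraction.
by_absolute = sorted(component_counts, key=component_counts.get, reverse=True)
by_relative = sorted(relative, key=relative.get, reverse=True)
```

In this made-up case the larger genome leads in absolute counts while the smaller genome leads once counts are divided by ORF number, which is the kind of reversal the text describes between Figure 3 and Figure 4.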

