Limits...
Optimal network alignment with graphlet degree vectors.

Milenković T, Ng WL, Hayes W, Przulj N - Cancer Inform (2010)

Bottom Line: The phylogenetic trees obtained in this way bear a striking resemblance to the ones obtained by sequence alignments.Our method detects topologically similar regions in large networks that are statistically significant.It does this independent of protein sequence or any other information external to network topology.

View Article: PubMed Central - PubMed

Affiliation: Department of Computing, Imperial College London SW7 2AZ, UK.

ABSTRACT
Important biological information is encoded in the topology of biological networks. Comparative analyses of biological networks are proving to be valuable, as they can lead to transfer of knowledge between species and give deeper insights into biological function, disease, and evolution. We introduce a new method that uses the Hungarian algorithm to produce optimal global alignment between two networks using any cost function. We design a cost function based solely on network topology and use it in our network alignment. Our method can be applied to any two networks, not just biological ones, since it is based only on network topology. We use our new method to align protein-protein interaction networks of two eukaryotic species and demonstrate that our alignment exposes large and topologically complex regions of network similarity. At the same time, our alignment is biologically valid, since many of the aligned protein pairs perform the same biological function. From the alignment, we predict function of yet unannotated proteins, many of which we validate in the literature. Also, we apply our method to find topological similarities between metabolic networks of different species and build phylogenetic trees based on our network alignment score. The phylogenetic trees obtained in this way bear a striking resemblance to the ones obtained by sequence alignments. Our method detects topologically similar regions in large networks that are statistically significant. It does this independent of protein sequence or any other information external to network topology.

No MeSH data available.


A) Comparison of the phylogenetic trees for protists obtained by genetic sequence alignments (left), H-GRAAL’s metabolic network alignments (middle), and GRAAL’s metabolic network alignments (right). The following abbreviations are used for species: CHO—Cryptosporidium hominis, DDI—Dictyostelium discoideum, CPV—Cryptosporidium parvum, PFA—Plasmodium falciparum, EHI—Entamoeba histolytica, TAN—Theileria annulata, TPV—Theileria parva; the species are grouped into “Alveolates”, “Entamoeba”, and “Cellular Slime mold” classes. B) Comparison of the phylogenetic trees for fungi obtained by genetic sequence alignments (left), H-GRAAL’s metabolic network alignments (middle), and GRAAL’s metabolic network alignments (right). The following abbreviations are used for species: AGO—Ashbya gossypii (Eremothecium gossypii), CAL—Candida albicans, CNE—Cryptococcus neoformans, ECU—Encephalitozoon cuniculi, SCE—Saccharomyces cerevisiae, SPO—Schizosaccharomyces pombe; the species are grouped into “Ascomycetes”, “Microsporidian”, and “Basidiomycetes” classes.
© Copyright Policy - open-access
Related In: Results  -  Collection


getmorefigures.php?uid=PMC2901631&req=5

f5-cin-2010-121: A) Comparison of the phylogenetic trees for protists obtained by genetic sequence alignments (left), H-GRAAL’s metabolic network alignments (middle), and GRAAL’s metabolic network alignments (right). The following abbreviations are used for species: CHO—Cryptosporidium hominis, DDI—Dictyostelium discoideum, CPV—Cryptosporidium parvum, PFA—Plasmodium falciparum, EHI—Entamoeba histolytica, TAN—Theileria annulata, TPV—Theileria parva; the species are grouped into “Alveolates”, “Entamoeba”, and “Cellular Slime mold” classes. B) Comparison of the phylogenetic trees for fungi obtained by genetic sequence alignments (left), H-GRAAL’s metabolic network alignments (middle), and GRAAL’s metabolic network alignments (right). The following abbreviations are used for species: AGO—Ashbya gossypii (Eremothecium gossypii), CAL—Candida albicans, CNE—Cryptococcus neoformans, ECU—Encephalitozoon cuniculi, SCE—Saccharomyces cerevisiae, SPO—Schizosaccharomyces pombe; the species are grouped into “Ascomycetes”, “Microsporidian”, and “Basidiomycetes” classes.

Mentions: The phylogenetic tree constructed for protists using ECs produced by our method is very similar to the tree obtained from the literature,65,66 as well as to that produced by GRAAL (see Fig. 5A). Both our and GRAAL’s tree differ from the sequence-based one in α single branch: Plasmodium falciparum (PFA) is misplaced in our tree, whereas Entamoeba histolytica (EHI) (or Dictyostelium discoideum (DDI)) is misplaced in GRAAL’s tree. We can estimate the statistical significance of our tree by measuring how it compares to trees built from random networks of the same size as the metabolic networks (see Methods section and Kuchaiev et al30 for details); we find that the p-value of our tree is less than 1.6 × 10−3. Our H-GRAAL-based phylogeny reconstruction shows that the topologies of entire metabolic networks of Cryptosporidium parvum (CPV) and Cryptosporidium hominis (CHO) are very similar, since these species are grouped together in the tree. Since these organisms are two morphologically identical species of Apicomplexan protozoa with 97% genetic sequence identity, but with strikingly different hosts68 that contribute to their divergence,69,70 this validates our approach.


Optimal network alignment with graphlet degree vectors.

Milenković T, Ng WL, Hayes W, Przulj N - Cancer Inform (2010)

A) Comparison of the phylogenetic trees for protists obtained by genetic sequence alignments (left), H-GRAAL’s metabolic network alignments (middle), and GRAAL’s metabolic network alignments (right). The following abbreviations are used for species: CHO—Cryptosporidium hominis, DDI—Dictyostelium discoideum, CPV—Cryptosporidium parvum, PFA—Plasmodium falciparum, EHI—Entamoeba histolytica, TAN—Theileria annulata, TPV—Theileria parva; the species are grouped into “Alveolates”, “Entamoeba”, and “Cellular Slime mold” classes. B) Comparison of the phylogenetic trees for fungi obtained by genetic sequence alignments (left), H-GRAAL’s metabolic network alignments (middle), and GRAAL’s metabolic network alignments (right). The following abbreviations are used for species: AGO—Ashbya gossypii (Eremothecium gossypii), CAL—Candida albicans, CNE—Cryptococcus neoformans, ECU—Encephalitozoon cuniculi, SCE—Saccharomyces cerevisiae, SPO—Schizosaccharomyces pombe; the species are grouped into “Ascomycetes”, “Microsporidian”, and “Basidiomycetes” classes.
© Copyright Policy - open-access
Related In: Results  -  Collection

Show All Figures
getmorefigures.php?uid=PMC2901631&req=5

f5-cin-2010-121: A) Comparison of the phylogenetic trees for protists obtained by genetic sequence alignments (left), H-GRAAL’s metabolic network alignments (middle), and GRAAL’s metabolic network alignments (right). The following abbreviations are used for species: CHO—Cryptosporidium hominis, DDI—Dictyostelium discoideum, CPV—Cryptosporidium parvum, PFA—Plasmodium falciparum, EHI—Entamoeba histolytica, TAN—Theileria annulata, TPV—Theileria parva; the species are grouped into “Alveolates”, “Entamoeba”, and “Cellular Slime mold” classes. B) Comparison of the phylogenetic trees for fungi obtained by genetic sequence alignments (left), H-GRAAL’s metabolic network alignments (middle), and GRAAL’s metabolic network alignments (right). The following abbreviations are used for species: AGO—Ashbya gossypii (Eremothecium gossypii), CAL—Candida albicans, CNE—Cryptococcus neoformans, ECU—Encephalitozoon cuniculi, SCE—Saccharomyces cerevisiae, SPO—Schizosaccharomyces pombe; the species are grouped into “Ascomycetes”, “Microsporidian”, and “Basidiomycetes” classes.
Mentions: The phylogenetic tree constructed for protists using ECs produced by our method is very similar to the tree obtained from the literature,65,66 as well as to that produced by GRAAL (see Fig. 5A). Both our and GRAAL’s tree differ from the sequence-based one in α single branch: Plasmodium falciparum (PFA) is misplaced in our tree, whereas Entamoeba histolytica (EHI) (or Dictyostelium discoideum (DDI)) is misplaced in GRAAL’s tree. We can estimate the statistical significance of our tree by measuring how it compares to trees built from random networks of the same size as the metabolic networks (see Methods section and Kuchaiev et al30 for details); we find that the p-value of our tree is less than 1.6 × 10−3. Our H-GRAAL-based phylogeny reconstruction shows that the topologies of entire metabolic networks of Cryptosporidium parvum (CPV) and Cryptosporidium hominis (CHO) are very similar, since these species are grouped together in the tree. Since these organisms are two morphologically identical species of Apicomplexan protozoa with 97% genetic sequence identity, but with strikingly different hosts68 that contribute to their divergence,69,70 this validates our approach.

Bottom Line: The phylogenetic trees obtained in this way bear a striking resemblance to the ones obtained by sequence alignments.Our method detects topologically similar regions in large networks that are statistically significant.It does this independent of protein sequence or any other information external to network topology.

View Article: PubMed Central - PubMed

Affiliation: Department of Computing, Imperial College London SW7 2AZ, UK.

ABSTRACT
Important biological information is encoded in the topology of biological networks. Comparative analyses of biological networks are proving to be valuable, as they can lead to transfer of knowledge between species and give deeper insights into biological function, disease, and evolution. We introduce a new method that uses the Hungarian algorithm to produce optimal global alignment between two networks using any cost function. We design a cost function based solely on network topology and use it in our network alignment. Our method can be applied to any two networks, not just biological ones, since it is based only on network topology. We use our new method to align protein-protein interaction networks of two eukaryotic species and demonstrate that our alignment exposes large and topologically complex regions of network similarity. At the same time, our alignment is biologically valid, since many of the aligned protein pairs perform the same biological function. From the alignment, we predict function of yet unannotated proteins, many of which we validate in the literature. Also, we apply our method to find topological similarities between metabolic networks of different species and build phylogenetic trees based on our network alignment score. The phylogenetic trees obtained in this way bear a striking resemblance to the ones obtained by sequence alignments. Our method detects topologically similar regions in large networks that are statistically significant. It does this independent of protein sequence or any other information external to network topology.

No MeSH data available.