Databases of homologous gene families for comparative genomics.

Penel S, Arigon AM, Dufayard JF, Sertier AS, Daubin V, Duret L, Gouy M, Perrière G - BMC Bioinformatics (2009)

Bottom Line: HOVERGEN contains sequences from vertebrates, HOGENOM is mainly devoted to completely sequenced microbial organisms, and HOMOLENS is devoted to metazoan genomes from Ensembl. Access to the databases is provided through Web query forms, a general retrieval system and a client-server graphical interface. The latter can be used to perform tree-pattern-based searches that, among other uses, make it possible to retrieve sets of orthologous genes.


Affiliation: Laboratoire de Biométrie et Biologie Evolutive, CNRS, Université Claude Bernard - Lyon 1, 43 bd du 11 Novembre 1918, 69622 Villeurbanne Cedex, France. penel@biomserv.univ-lyon1.fr

ABSTRACT

Background: Comparative genomics is a central step in many sequence analysis studies, from gene annotation and the identification of new functional regions in genomes, to the study of evolutionary processes at the molecular level (speciation, single-gene or whole-genome duplications, etc.) and phylogenetics. In that context, databases that provide users with high-quality homologous families and sequence alignments, as well as phylogenetic trees built with state-of-the-art algorithms, are becoming indispensable.

Methods: We developed an automated procedure for massive all-against-all similarity searches, gene clustering, multiple alignment computation, and phylogenetic tree construction and reconciliation. Applying this procedure to a very large set of sequences is made possible by parallel computing on a large computer cluster.
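To make the clustering step more concrete, here is a minimal sketch of one common way to turn all-against-all similarity hits into gene families: single-linkage clustering with a union-find structure. It is an illustration only, not the procedure used to build these databases; the hit format, the `cluster_families` and `find` helpers, and the E-value cutoff of 1e-4 are all hypothetical choices.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical hit record from an all-against-all search: (query, subject, E-value).
Hit = Tuple[str, str, float]


def find(parent: Dict[str, str], x: str) -> str:
    """Return the representative of x's cluster, compressing the path."""
    root = x
    while parent[root] != root:
        root = parent[root]
    while parent[x] != root:
        nxt = parent[x]
        parent[x] = root
        x = nxt
    return root


def cluster_families(seq_ids: Iterable[str], hits: Iterable[Hit],
                     max_evalue: float = 1e-4) -> List[List[str]]:
    """Single-linkage clustering: sequences joined by any significant hit share a family.

    max_evalue is a hypothetical significance cutoff chosen for illustration.
    """
    parent = {s: s for s in seq_ids}
    for query, subject, evalue in hits:
        # Ignore weak hits and hits to sequences outside the input set.
        if evalue > max_evalue or query not in parent or subject not in parent:
            continue
        rq, rs = find(parent, query), find(parent, subject)
        if rq != rs:
            parent[rs] = rq  # merge the two clusters
    families: Dict[str, List[str]] = {}
    for s in parent:
        families.setdefault(find(parent, s), []).append(s)
    return sorted(sorted(f) for f in families.values())


seqs = ["a", "b", "c", "d", "e"]
hits = [("a", "b", 1e-30), ("b", "c", 1e-10), ("d", "e", 0.5)]
print(cluster_families(seqs, hits))  # [['a', 'b', 'c'], ['d'], ['e']]
```

Note that a and c end up together through b even without a direct hit; that transitivity is what single linkage does. It is also why pure single linkage tends to chain unrelated families together through multidomain proteins, so practical pipelines usually add coverage or identity filters, which this sketch omits.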

Results: Three databases were developed using this procedure: HOVERGEN, HOGENOM and HOMOLENS. These databases share the same architecture but differ in their content. HOVERGEN contains sequences from vertebrates, HOGENOM is mainly devoted to completely sequenced microbial organisms, and HOMOLENS is devoted to metazoan genomes from Ensembl. Access to the databases is provided through Web query forms, a general retrieval system and a client-server graphical interface. The latter can be used to perform tree-pattern-based searches that, among other uses, make it possible to retrieve sets of orthologous genes. The three databases, as well as the software required to build and query them, can be used or downloaded from the PBIL (Pôle Bioinformatique Lyonnais) site at http://pbil.univ-lyon1.fr/.
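Retrieving orthologs from reconciled trees rests on a standard definition: two genes are orthologs when their last common ancestor in the gene tree is a speciation node, and paralogs when it is a duplication node. The sketch below illustrates only that definition. It is not the databases' query interface; the nested-tuple tree encoding, the "S"/"D" event labels and the gene names are hypothetical.

```python
from itertools import product
from typing import List, Set, Tuple, Union

# Hypothetical encoding: a leaf is a gene name; an internal node is
# (event, left, right) with event "S" (speciation) or "D" (duplication).
GeneTree = Union[str, Tuple[str, "GeneTree", "GeneTree"]]


def leaves(t: GeneTree) -> List[str]:
    """Return the gene names below node t, left to right."""
    if isinstance(t, str):
        return [t]
    _, left, right = t
    return leaves(left) + leaves(right)


def ortholog_pairs(t: GeneTree) -> Set[Tuple[str, str]]:
    """Return gene pairs whose last common ancestor is a speciation node."""
    if isinstance(t, str):
        return set()
    event, left, right = t
    pairs = ortholog_pairs(left) | ortholog_pairs(right)
    if event == "S":
        # Every gene on the left meets every gene on the right at this node,
        # so this node is their last common ancestor.
        for a, b in product(leaves(left), leaves(right)):
            pairs.add((min(a, b), max(a, b)))
    return pairs


# Toy reconciled tree: a human-specific duplication after the human/mouse split.
tree = ("S", ("D", "hsA1", "hsA2"), "mmA")
print(sorted(ortholog_pairs(tree)))  # [('hsA1', 'mmA'), ('hsA2', 'mmA')]
```

Both human copies are orthologous to the mouse gene, while hsA1 and hsA2 are paralogs of each other because they meet at a duplication node.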


Figure 5: Examples of trees containing anomalous patterns involving eukaryotes and bacteria. A search on the pattern shown in Figure 4 was performed on HOGENOM release 4 and returned a total of 1,304 families; two of these trees are shown here. Family HBG082165 (a) corresponds to a conserved hypothetical protein and shows an S. cerevisiae sequence among Lactobacillales species. Family HBG459980 (b) corresponds to the enzyme 3-phosphoshikimate 1-carboxyvinyltransferase and shows a G. gallus sequence among Proteobacteria species. aLRT test values are given for the internal branches; only values with P > 80% are shown.

Mentions: An example of this kind of search is summarized in Figures 4 and 5. Here, the pattern entered detects families in which a eukaryotic species is placed within a clade of bacterial species (Figure 4). When performed on HOGENOM release 4 (February 2008), this search returns 1,304 trees, two of which are shown in Figure 5. Many of these patterns probably reflect contamination rather than genuine horizontal gene transfers (HGTs); one example is the presence of Gallus gallus among Proteobacteria sequences in family HBG459980. More plausible is the case of family HBG082165, which shows a possible HGT of a gene encoding a hypothetical protein from a Lactobacillales species to the yeast Saccharomyces cerevisiae.
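The tree-pattern search described above can be approximated in a few lines of code. The sketch below uses one simplified reading of "a eukaryotic species placed within a clade of bacterial species": a eukaryotic leaf whose sister subtree and whose parent's sister subtree (its "aunt") both contain only bacteria. This is an assumption; it is not the exact pattern of Figure 4 nor the query language of the graphical client. The tree encoding, species labels and domain map are hypothetical.

```python
from typing import Dict, List, Set, Tuple, Union

# Hypothetical encoding: a leaf is a species label; an internal node is a
# pair (left, right) of a rooted binary tree.
Tree = Union[str, Tuple["Tree", "Tree"]]


def leaf_domains(t: Tree, domain: Dict[str, str]) -> Set[str]:
    """Return the set of taxonomic domains found among the leaves of t."""
    if isinstance(t, str):
        return {domain[t]}
    return leaf_domains(t[0], domain) | leaf_domains(t[1], domain)


def nested_eukaryotes(t: Tree, domain: Dict[str, str]) -> List[str]:
    """Find eukaryotic leaves whose sister and aunt subtrees are purely bacterial."""
    hits: List[str] = []

    def visit(node: Tree) -> None:
        if isinstance(node, str):
            return
        for i in (0, 1):
            parent, aunt = node[i], node[1 - i]
            if isinstance(parent, tuple):
                for j in (0, 1):
                    leaf, sister = parent[j], parent[1 - j]
                    if (isinstance(leaf, str)
                            and domain[leaf] == "Eukaryota"
                            and leaf_domains(sister, domain) == {"Bacteria"}
                            and leaf_domains(aunt, domain) == {"Bacteria"}):
                        hits.append(leaf)
            visit(parent)

    visit(t)
    return hits


dom = {"yeast": "Eukaryota", "lacto": "Bacteria", "strep": "Bacteria", "ecoli": "Bacteria"}
print(nested_eukaryotes((("lacto", ("yeast", "strep")), "ecoli"), dom))  # ['yeast']
```

Requiring a purely bacterial aunt separates a eukaryote nested inside a bacterial clade from one that is merely sister to it. The sketch ignores branch support; a real search would probably also take the aLRT values into account before flagging a candidate HGT or contamination.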
