Archaeal Clusters of Orthologous Genes (arCOGs): An Update and Application for Analysis of Shared Features between Thermococcales, Methanococcales, and Methanobacteriales.

Makarova KS, Wolf YI, Koonin EV - Life (Basel) (2015)

Bottom Line: Assessment of the current archaeal genome annotation in public databases indicates that consistent use of arCOGs can significantly improve the annotation quality. The results of phylogenomic analysis that involved both comparison of multiple phylogenetic trees and a search for putative derived shared characters by using phyletic patterns extracted from the arCOGs reveal a likely evolutionary relationship between the Thermococci, Methanococci, and Methanobacteria. The arCOGs are expected to be instrumental for a comprehensive phylogenomic study of the archaea.


Affiliation: National Center for Biotechnology Information, NLM, National Institutes of Health, Bethesda, MD 20894, USA. makarova@ncbi.nlm.nih.gov.

ABSTRACT
With the continuously accelerating genome sequencing from diverse groups of archaea and bacteria, accurate identification of gene orthology and availability of readily expandable clusters of orthologous genes are essential for the functional annotation of new genomes. We report an update of the collection of archaeal Clusters of Orthologous Genes (arCOGs) to cover, on average, 91% of the protein-coding genes in 168 archaeal genomes. The new arCOGs were constructed using refined algorithms for orthology identification combined with extensive manual curation, including incorporation of the results of several completed and ongoing research projects in archaeal genomics. A new level of classification is introduced: superclusters that unite two or more arCOGs and more completely reflect gene family evolution than individual, disconnected arCOGs. Assessment of the current archaeal genome annotation in public databases indicates that consistent use of arCOGs can significantly improve the annotation quality. In addition to their utility for genome annotation, arCOGs are also a platform for phylogenomic analysis. We explore this aspect of arCOGs by performing a phylogenomic study of the Thermococci, which are traditionally viewed as the basal branch of the Euryarchaeota. The results of phylogenomic analysis that involved both comparison of multiple phylogenetic trees and a search for putative derived shared characters by using phyletic patterns extracted from the arCOGs reveal a likely evolutionary relationship between the Thermococci, Methanococci, and Methanobacteria. The arCOGs are expected to be instrumental for a comprehensive phylogenomic study of the archaea.



Figure 5: Phylogenetic analysis of archaeal TilS/MesJ family. The tree was reconstructed as described in Figure 4; 284 sequences and 285 aligned positions were used. The complete tree is available in Supplementary file S4. Coloring scheme is the same as in Figure 2. Sequences and collapsed branches are shown as in Figure 4.


Mentions: The COG00037 supercluster consists of three arCOGs (arCOG00042, arCOG00044, and arCOG00046). All these proteins belong to the TilS/MesJ family of tRNA(Ile)-lysidine synthases, members of the N-type ATP pyrophosphatase superfamily, found in all three domains of life [93], and are involved in an essential modification of tRNA(Ile) [94]. Two of these arCOGs are shared exclusively between Thermococci and class I methanogens. The phylogenetic tree for this supercluster (Figure 5) revealed an even more complicated picture than the enolase tree. Apparently, the evolution of this family was affected by a number of events, including HGT, duplications, and accelerations of the evolutionary rate. However, some interpretations appear straightforward. Most arCOG00042 representatives belong to two major branches (branches 1 and 2) that probably evolved via an ancestral duplication, although a scenario with multiple HGT events cannot be ruled out (Figure 5). Methanococci are present in both of these branches but also form three additional branches, often together with representatives of other lineages of class I methanogens, that apparently encompass fast-evolving variants of this gene. Branch 3 and branch 4 correspond to arCOG00046 and arCOG00044, respectively. The clustering of Thermococci with Methanococci is observed in three branches (2, 3, and 4) (Figure 5). Under topology I, these observations imply that these three genes were acquired by Thermococci from class I methanogens via three independent HGT events. Alternative hypotheses compatible with topology I would involve either a long-branch attraction artifact affecting several paralogs, whose evolution would have to be assumed to have accelerated independently in both Thermococci and class I methanogens, or multiple losses of four (or more) ancestral paralogs in all other branches of archaea.
Scenarios based on topology II imply that the duplications occurred in the common ancestor of Thermococci and class I methanogens and gave rise to two or three fast-evolving TilS/MesJ family paralogs. An apparent distant homolog of the TilS/MesJ family, arCOG00045, is also shared between Thermococci and class I methanogens, to the exclusion of other archaea. This arCOG belongs to a different COG (COG01365), and, in addition to Thermococci and class I methanogens, is found only in a few bacterial genomes [38]. The functions of these proteins are not known but most of them are fused to a KH RNA-binding domain [95], implicating these enzymes in RNA modification. This family is likely to represent yet another divergent paralog resulting from duplications in the common ancestor of the same clade (Figure 5).
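
The idea of a cluster being "shared exclusively" between two groups can be illustrated with a small phyletic-pattern filter. The following is only a minimal sketch and not the authors' pipeline: the function name `exclusively_shared`, the cluster IDs (`cluster_1` to `cluster_4`), and the toy presence table are all hypothetical, and the lineage labels are simplified to one label per group. It assumes each cluster is described by the set of archaeal lineages in which it occurs.

```python
from typing import Dict, List, Set


def exclusively_shared(
    patterns: Dict[str, Set[str]], group_a: str, group_b: str
) -> List[str]:
    """Return IDs of clusters present in both groups and in no other lineage.

    `patterns` maps a cluster ID to the set of lineages where it is found
    (a simplified phyletic pattern; real patterns are per-genome).
    """
    allowed = {group_a, group_b}
    return sorted(
        cluster_id
        for cluster_id, lineages in patterns.items()
        # Present in both target groups, and absent everywhere else.
        if allowed <= lineages and lineages <= allowed
    )


# Toy, made-up presence table (hypothetical IDs, not real arCOG data).
toy_patterns = {
    "cluster_1": {"Thermococci", "class I methanogens"},
    "cluster_2": {"Thermococci", "class I methanogens", "Halobacteria"},
    "cluster_3": {"Thermococci"},
    "cluster_4": {"class I methanogens", "Thermococci"},
}

shared = exclusively_shared(toy_patterns, "Thermococci", "class I methanogens")
print(shared)
```

In this toy table, only `cluster_1` and `cluster_4` pass the filter; `cluster_2` is excluded because it also occurs in a third lineage, and `cluster_3` because it is missing from one of the two groups. The table covers archaeal lineages only, so a cluster that passes could still occur in bacteria, as the text notes for arCOG00045.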

