Eukaryotic large nucleo-cytoplasmic DNA viruses: clusters of orthologous genes and reconstruction of viral genome evolution.

Yutin N, Wolf YI, Raoult D, Koonin EV - Virol. J. (2009)

Bottom Line: A comprehensive comparison of the protein sequences encoded in the genomes of 45 NCLDV belonging to 6 families was performed in order to delineate clusters of orthologous viral genes. The NCVOGs were manually curated and annotated and can be used as a computational platform for functional annotation and evolutionary analysis of new NCLDV genomes. The NCVOGs are a flexible and expandable platform for genome analysis and functional annotation of newly characterized NCLDV.


Affiliation: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA. yutin@ncbi.nlm.nih.gov

ABSTRACT

Background: The Nucleo-Cytoplasmic Large DNA Viruses (NCLDV) comprise an apparently monophyletic class of viruses that infect a broad variety of eukaryotic hosts. Recent progress in the isolation of new viruses and in genome sequencing has substantially expanded the known NCLDV diversity, creating additional opportunities for comparative genomic analysis and a demand for a comprehensive classification of viral genes.

Results: A comprehensive comparison of the protein sequences encoded in the genomes of 45 NCLDV belonging to 6 families was performed in order to delineate clusters of orthologous viral genes. Using previously developed computational methods for orthology identification, 1,445 Nucleo-Cytoplasmic Virus Orthologous Groups (NCVOGs) were identified, of which 177 are represented in more than one NCLDV family. The NCVOGs were manually curated and annotated and can be used as a computational platform for functional annotation and evolutionary analysis of new NCLDV genomes. A maximum-likelihood reconstruction of NCLDV evolution yielded a set of 47 conserved genes that were probably present in the genome of the common ancestor of this class of eukaryotic viruses. This reconstructed ancestral gene set is robust to the parameters of the reconstruction procedure and so is likely to accurately reflect the gene core of the ancestral NCLDV, indicating that this virus encoded a complex machinery of replication, expression and morphogenesis that made it relatively independent of host cell functions.

Conclusions: The NCVOGs are a flexible and expandable platform for genome analysis and functional annotation of newly characterized NCLDV. Evolutionary reconstructions employing NCVOGs point to complex ancestral viruses.



Figure 1: Distribution of the number of NCLDV families represented in NCVOGs.

Mentions: In this work, we analyzed the annotated proteins encoded in 45 NCLDV proteomes from 6 viral families (Table 1 and Additional file 1). These viral proteins were partitioned into clusters of likely orthologs using a modified COG procedure (Ref. [30]; see Methods for details). All clusters were manually edited and annotated using the results of RPS-BLAST and PSI-BLAST searches for the constituent proteins. Of the 11,468 (predicted) proteins encoded in the 45 NCLDV genomes, 9,261 were included in 1,445 clusters of probable orthologs (NCVOGs). The overwhelming majority of the NCVOGs (1,268) are family-specific (that is, they include proteins from viruses of only one family), whereas the remaining 177 NCVOGs include proteins from two or more NCLDV families (Figure 1). The distribution of the NCVOGs by the number of viral species showed a qualitatively similar pattern, in which the most abundant class included two species (thanks to closely related pairs of viruses with very large genomes, such as the mimivirus and the mamavirus) and additional peaks corresponded to large viral families such as Poxviridae or Phycodnaviridae with 6 (selected) representatives (Figure 2).
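The counts quoted above can be cross-checked, and the kind of tally behind Figure 1 can be sketched, in a few lines of Python. This is only an illustrative sketch, not the authors' pipeline: the function name `family_count_distribution`, the toy virus and cluster names, and the input layout (a cluster-to-members mapping plus a virus-to-family mapping) are all assumptions introduced here. Only the five totals at the top are taken from the text.

```python
from collections import Counter

# Totals quoted in the text above.
TOTAL_PROTEINS = 11_468      # predicted proteins in the 45 NCLDV genomes
CLUSTERED_PROTEINS = 9_261   # proteins assigned to NCVOGs
TOTAL_NCVOGS = 1_445
FAMILY_SPECIFIC = 1_268      # NCVOGs with members from a single family
MULTI_FAMILY = 177           # NCVOGs with members from two or more families

# The two classes should partition the full set of NCVOGs.
assert FAMILY_SPECIFIC + MULTI_FAMILY == TOTAL_NCVOGS

fraction_clustered = CLUSTERED_PROTEINS / TOTAL_PROTEINS
fraction_family_specific = FAMILY_SPECIFIC / TOTAL_NCVOGS
mean_cluster_size = CLUSTERED_PROTEINS / TOTAL_NCVOGS


def family_count_distribution(clusters, family_of):
    """Return {number of families represented: number of clusters}.

    clusters:  hypothetical mapping of cluster id -> list of virus names
    family_of: hypothetical mapping of virus name -> family name
    """
    distribution = Counter()
    for members in clusters.values():
        families = {family_of[virus] for virus in members}
        distribution[len(families)] += 1
    return dict(distribution)


# Toy, made-up input (not real NCVOG data), just to exercise the tally.
toy_family_of = {
    "virusA": "Family1",
    "virusB": "Family1",
    "virusC": "Family2",
    "virusD": "Family3",
}
toy_clusters = {
    "cluster1": ["virusA", "virusB"],            # one family
    "cluster2": ["virusA", "virusC"],            # two families
    "cluster3": ["virusA", "virusC", "virusD"],  # three families
    "cluster4": ["virusC"],                      # one family
}
toy_distribution = family_count_distribution(toy_clusters, toy_family_of)

print(f"clustered proteins: {fraction_clustered:.1%}")
print(f"family-specific NCVOGs: {fraction_family_specific:.1%}")
print(f"mean proteins per NCVOG: {mean_cluster_size:.2f}")
print(toy_distribution)
```

On these totals, roughly four fifths of the predicted proteins fall into an NCVOG, and close to nine tenths of the NCVOGs are family-specific, which matches the "overwhelming majority" wording in the text. Applied to real cluster membership data, the same tally would give the bar heights of a plot like Figure 1.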

