Eukaryotic large nucleo-cytoplasmic DNA viruses: clusters of orthologous genes and reconstruction of viral genome evolution.

Yutin N, Wolf YI, Raoult D, Koonin EV - Virol. J. (2009)

Bottom Line: A comprehensive comparison of the protein sequences encoded in the genomes of 45 NCLDV belonging to 6 families was performed to delineate clusters of orthologous viral genes. The resulting Nucleo-Cytoplasmic Virus Orthologous Groups (NCVOGs) were manually curated and annotated and can be used as a computational platform for functional annotation and evolutionary analysis of new NCLDV genomes. The NCVOGs are a flexible and expandable platform for genome analysis and functional annotation of newly characterized NCLDV.


Affiliation: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA. yutin@ncbi.nlm.nih.gov

ABSTRACT

Background: The Nucleo-Cytoplasmic Large DNA Viruses (NCLDV) comprise an apparently monophyletic class of viruses that infect a broad variety of eukaryotic hosts. Recent progress in the isolation of new viruses and in genome sequencing has substantially expanded the known NCLDV diversity, creating additional opportunities for comparative genomic analysis and a demand for a comprehensive classification of viral genes.

Results: A comprehensive comparison of the protein sequences encoded in the genomes of 45 NCLDV belonging to 6 families was performed to delineate clusters of orthologous viral genes. Using previously developed computational methods for orthology identification, 1445 Nucleo-Cytoplasmic Virus Orthologous Groups (NCVOGs) were identified, of which 177 are represented in more than one NCLDV family. The NCVOGs were manually curated and annotated and can be used as a computational platform for functional annotation and evolutionary analysis of new NCLDV genomes. A maximum-likelihood reconstruction of NCLDV evolution yielded a set of 47 conserved genes that were probably present in the genome of the common ancestor of this class of eukaryotic viruses. This reconstructed ancestral gene set is robust to the parameters of the reconstruction procedure and so is likely to accurately reflect the gene core of the ancestral NCLDV, indicating that this virus encoded a complex machinery of replication, expression and morphogenesis that made it relatively independent of host cell functions.

Conclusions: The NCVOGs are a flexible and expandable platform for genome analysis and functional annotation of newly characterized NCLDV. Evolutionary reconstructions employing NCVOGs point to complex ancestral viruses.
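
The Results report how many NCVOGs are represented in more than one NCLDV family. As a rough illustration of what such a count involves, the following Python sketch tallies clusters whose member genomes span at least two families. It is not the authors' pipeline: the function name, the cluster and genome identifiers, and the toy family assignments are all hypothetical and chosen only to make the example runnable.

```python
def count_multifamily_clusters(cluster_members, genome_family):
    """Count clusters whose member genomes belong to more than one family.

    cluster_members: dict mapping a cluster ID to an iterable of genome IDs.
    genome_family:   dict mapping a genome ID to its family name.
    """
    count = 0
    for members in cluster_members.values():
        # Collect the distinct families represented in this cluster.
        families = {genome_family[genome] for genome in members}
        if len(families) > 1:
            count += 1
    return count


# Hypothetical toy data (not taken from the NCVOG collection).
toy_genome_family = {
    "genome_1": "Poxviridae",
    "genome_2": "Poxviridae",
    "genome_3": "Mimiviridae",
    "genome_4": "Iridoviridae",
}
toy_clusters = {
    "cluster_A": ["genome_1", "genome_2"],              # one family only
    "cluster_B": ["genome_1", "genome_3"],              # two families
    "cluster_C": ["genome_2", "genome_3", "genome_4"],  # three families
}

multifamily = count_multifamily_clusters(toy_clusters, toy_genome_family)
print(multifamily)
```

In this toy input, two of the three clusters span more than one family; in the study, the analogous count over 1445 NCVOGs is 177.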



Figure 7: The consensus phylogenetic tree of the NCLDV. The Expected Likelihood Weights (ELW; 1,000 replications) are indicated for each ancestral node as percentage points. The topology of the tree was derived as the consensus of the tree topologies for the following 10 (nearly) universal NCVOGs: Superfamily II helicase (NCVOG0076), A2L-like transcription factor (NCVOG0262), RNA polymerase α subunit (NCVOG0274), RNA polymerase β subunit (NCVOG0271), mRNA capping enzyme, A32-like packaging ATPase (NCVOG0249), small subunit of ribonucleotide reductase (NCVOG0276), myristylated envelope protein (NCVOG0211), primase-helicase (NCVOG0023), and DNA polymerase (NCVOG0038) (see Additional File 2). The branch lengths and ELW values are from a tree that was constructed from a concatenated alignment of 4 universal proteins (primase-helicase, DNA polymerase, packaging ATPase, and A2L-like transcription factor).

Mentions: In the best supported consensus tree topology, the recently discovered Marseillevirus clustered with irido- and ascoviruses (the latter were confidently placed inside the Iridoviridae), albeit with low confidence; mimiviruses clustered with phycodnaviruses; and poxviruses grouped with asfarviruses (Figure 7). Of the 10 trees that contributed to the consensus tree, 5 displayed the same topology, at the level of major branches (viral families), as the consensus tree, and 3 were compatible with the consensus topology (Approximately Unbiased (AU) test [34] p-value > 0.05). The remaining 2 trees, those of the DNA polymerase and primase-helicase, differed significantly (p < 0.05) from the consensus according to the AU test (see Additional File 2). In the DNA polymerase tree, phycodnaviruses confidently grouped with the Irido-Marseillevirus branch, in contrast to the phycodna-mimi clade in the consensus tree. The primase-helicase tree was the "worst" in terms of conformity to the consensus, with an unusual but strongly supported Mimi-Irido-Marseille clade and a moderately supported joining of asfarviruses to that branch (compare the trees in Figure 7 and Additional File 2). Given the propagation of mimiviruses and Marseillevirus in the host (Acanthamoeba) [19], the recent isolation of an asfarvirus from a dinoflagellate [35], and indications from metagenomics that iridoviruses might infect marine unicellular eukaryotes as well [21,23], horizontal exchange of these essential genes among viruses from different families cannot be ruled out. Further investigation of this intriguing possibility requires deeper genomic sampling of NCLDV and a comprehensive phylogenetic analysis (see also below).
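
The node support in Figure 7 is given as Expected Likelihood Weights. As a minimal sketch of the general idea, and not of the software or settings used in the study (which this excerpt does not specify), an expected likelihood weight can be computed by normalizing the likelihoods of the candidate trees within each bootstrap replicate and then averaging those normalized weights over replicates. The function name and the log-likelihood values below are hypothetical.

```python
import math


def expected_likelihood_weights(loglik_by_replicate):
    """Average per-replicate normalized likelihoods over bootstrap replicates.

    loglik_by_replicate: list of rows, one row per bootstrap replicate;
    each row holds the log-likelihood of every candidate tree.
    Returns one weight per tree; the weights sum to 1.
    """
    n_replicates = len(loglik_by_replicate)
    n_trees = len(loglik_by_replicate[0])
    totals = [0.0] * n_trees
    for row in loglik_by_replicate:
        # Subtract the row maximum before exponentiating to avoid underflow.
        best = max(row)
        relative = [math.exp(value - best) for value in row]
        norm = sum(relative)
        for j, rel in enumerate(relative):
            totals[j] += rel / norm
    return [total / n_replicates for total in totals]


# Hypothetical log-likelihoods: 3 replicates (rows) x 3 candidate trees (columns).
toy_logliks = [
    [-1000.0, -1002.0, -1010.0],
    [-998.5, -999.0, -1007.0],
    [-1001.0, -1000.5, -1012.0],
]
elw = expected_likelihood_weights(toy_logliks)
print([round(100 * w, 1) for w in elw])  # weights as percentage points
```

A real analysis would use many more replicates (1,000 in Figure 7) and likelihoods re-estimated on resampled alignments; the toy matrix here only demonstrates the averaging step.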

