A phylogenomic data-driven exploration of viral origins and evolution.

Nasir A, Caetano-Anollés G - Sci Adv (2015)

Bottom Line: Although numerous hypotheses have attempted to explain viral origins, none is backed by substantive data. Despite the extremely reduced nature of viral proteomes, we established an ancient origin of the "viral supergroup" and the existence of widespread episodes of horizontal transfer of genetic information. Phylogenomic analysis uncovered a universal tree of life and revealed that modern viruses reduced from multiple ancient cells that harbored segmented RNA genomes and coexisted with the ancestors of modern cells.


Affiliation: Evolutionary Bioinformatics Laboratory, Department of Crop Sciences and Illinois Informatics Institute, University of Illinois, Urbana, IL 61801, USA.

ABSTRACT
The origin of viruses remains mysterious because of their diverse and patchy molecular and functional makeup. Although numerous hypotheses have attempted to explain viral origins, none is backed by substantive data. We take full advantage of the wealth of available protein structural and functional data to explore the evolution of the proteomic makeup of thousands of cells and viruses. Despite the extremely reduced nature of viral proteomes, we established an ancient origin of the "viral supergroup" and the existence of widespread episodes of horizontal transfer of genetic information. Viruses harboring different replicon types and infecting distantly related hosts shared many metabolic and informational protein structural domains of ancient origin that were also widespread in cellular proteomes. Phylogenomic analysis uncovered a universal tree of life and revealed that modern viruses reduced from multiple ancient cells that harbored segmented RNA genomes and coexisted with the ancestors of modern cells. The model for the origin and evolution of viruses and cells is backed by strong genomic and structural evidence and can be reconciled with existing models of viral evolution if one considers viruses to have originated from ancient cells and not from modern counterparts.



Figure 3: Virus-host preferences and FSF distribution in viruses infecting different hosts. (A) The abundance of each viral replicon type that is capable of infecting Archaea, Bacteria, and Eukarya and major divisions in Eukarya. Virus-host information was retrieved from the National Center for Biotechnology Information Viral Genomes Project (119). Hosts were classified into Archaea, Bacteria, Protista (animal-like protists), Fungi, Plants (all plants, blue-green algae, and diatoms), Invertebrates and Plants (IP), and Metazoa (vertebrates, invertebrates, and humans). Host information was available for 3440 of the 3660 viruses that were sampled in this study. Two additional ssDNA archaeoviruses were added from the literature (129, 130). Numbers on bars indicate the total virus count in each host group. (B) Venn diagram shows the distribution of 715 (of 716) FSFs that were detected in archaeoviruses, bacterioviruses, and eukaryoviruses. Host information on the Circovirus-like genome RW_B virus encoding the “Satellite viruses” FSF (b.121.7) was not available. (C) Mean f values for FSFs corresponding to each of the seven Venn groups defined in (B) in archaeal, bacterial, and eukaryal proteomes. Values were averaged for all FSFs in each of the seven Venn groups. Text above bars indicates how many different viral subgroups encoded those FSFs.

Mentions: We calculated the “virus count” for each replicon type in major host groups to determine the virus-host relationships of viruses in our data set (Fig. 3A). The exercise revealed that most RNA viral subgroups were exclusive to eukaryotes (for example, minus-ssRNA and retrotranscribing viruses) (Fig. 3A). In turn, a large number of DNA viruses (mostly Caudovirales) infected prokaryotic hosts. The bias in the distribution of replicon types in superkingdoms (that is, DNA viruses in prokaryotes and RNA viruses in eukaryotes) leads to an interesting possibility about the early origin of RNA viruses and their loss in prokaryotes [see Discussion (43)]. Virus-host relationships have been described in detail previously (43–45). Here, the more relevant question was asked: Do viruses infecting distantly related hosts share common protein folds? To answer, we generated a Venn diagram describing viral FSF repertoires. FSFs that were shared by archaeoviruses (a), bacterioviruses (b), and eukaryoviruses (e) were pooled into the abe Venn group; those shared by viruses infecting two different superkingdoms were pooled into the ab, ae, or be group; and those unique to viruses infecting a single superkingdom were pooled into the a, b, and e groups (Venn group nomenclature avoids ambiguity with that of Fig. 1A) (Fig. 3B). We stress that FSFs in the abe group do not mean that these were present in a virus capable of infecting Archaea, Bacteria, and Eukarya. To date, no virus is known to infect organisms in more than one superkingdom. Instead, it simply refers to the count of FSFs that were shared between archaeoviruses, bacterioviruses, and eukaryoviruses.
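The pooling of FSFs into the seven Venn groups described above can be illustrated with a minimal sketch. This is not the authors' code: the function name `venn_groups`, the input layout, and the FSF identifiers (`fsf_1` and so on) are hypothetical placeholders chosen for illustration, and the toy data do not reflect the real FSF counts in Fig. 3B. The sketch assumes only that, for each host superkingdom, the set of FSFs detected in the viruses infecting it is already known.

```python
from collections import defaultdict


def venn_groups(fsfs_by_host):
    """Pool FSFs into Venn groups named after the host superkingdoms
    whose viruses encode them.

    fsfs_by_host maps a one-letter host label ('a' = archaeoviruses,
    'b' = bacterioviruses, 'e' = eukaryoviruses) to the set of FSF ids
    detected in viruses infecting that superkingdom.
    """
    # Invert the mapping: for each FSF, record which host labels it appears under.
    hosts_by_fsf = defaultdict(set)
    for host, fsfs in fsfs_by_host.items():
        for fsf in fsfs:
            hosts_by_fsf[fsf].add(host)

    # The group name is the sorted concatenation of labels, e.g. 'abe' or 'be'.
    groups = defaultdict(set)
    for fsf, hosts in hosts_by_fsf.items():
        groups["".join(sorted(hosts))].add(fsf)
    return dict(groups)


# Toy input with made-up FSF ids (hypothetical, for illustration only).
toy = {
    "a": {"fsf_1", "fsf_2", "fsf_4"},
    "b": {"fsf_1", "fsf_2", "fsf_3", "fsf_5"},
    "e": {"fsf_1", "fsf_3", "fsf_6", "fsf_7"},
}

groups = venn_groups(toy)
for name in ("abe", "ab", "ae", "be", "a", "b", "e"):
    print(name, sorted(groups.get(name, set())))
```

Note that membership in the abe group is decided per FSF across the three viral pools, not per virus, which matches the caveat in the text: an FSF lands in abe when archaeoviruses, bacterioviruses, and eukaryoviruses each encode it, even though no single virus infects all three superkingdoms.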

