A phylogenomic data-driven exploration of viral origins and evolution.

Nasir A, Caetano-Anollés G - Sci Adv (2015)

Bottom Line: Although numerous hypotheses have attempted to explain viral origins, none is backed by substantive data. Despite the extremely reduced nature of viral proteomes, we established an ancient origin of the "viral supergroup" and the existence of widespread episodes of horizontal transfer of genetic information. Phylogenomic analysis uncovered a universal tree of life and revealed that modern viruses reduced from multiple ancient cells that harbored segmented RNA genomes and coexisted with the ancestors of modern cells.


Affiliation: Evolutionary Bioinformatics Laboratory, Department of Crop Sciences and Illinois Informatics Institute, University of Illinois, Urbana, IL 61801, USA.

ABSTRACT
The origin of viruses remains mysterious because of their diverse and patchy molecular and functional makeup. Although numerous hypotheses have attempted to explain viral origins, none is backed by substantive data. We take full advantage of the wealth of available protein structural and functional data to explore the evolution of the proteomic makeup of thousands of cells and viruses. Despite the extremely reduced nature of viral proteomes, we established an ancient origin of the "viral supergroup" and the existence of widespread episodes of horizontal transfer of genetic information. Viruses harboring different replicon types and infecting distantly related hosts shared many metabolic and informational protein structural domains of ancient origin that were also widespread in cellular proteomes. Phylogenomic analysis uncovered a universal tree of life and revealed that modern viruses reduced from multiple ancient cells that harbored segmented RNA genomes and coexisted with the ancestors of modern cells. The model for the origin and evolution of viruses and cells is backed by strong genomic and structural evidence and can be reconciled with existing models of viral evolution if one considers viruses to have originated from ancient cells and not from modern counterparts.


Figure 5: Phylogenomic analysis of FSF domains. (A) ToD describe the evolution of 1995 FSF domains (taxa) in 5080 proteomes (characters) (tree length = 1,882,554; retention index = 0.74; g1 = −0.18). The bar on top of ToD is a simple representation of how FSFs appeared in its branches, which correlates with their age (nd). FSFs were labeled blue for cell-only and red for those either shared with or unique to viruses. The boxplots identify the most ancient and derived Venn groups. Two major phases in the evolution of viruses are indicated in different background colors. Patterned area highlights the appearances of AV, BV, and EV soon after A, B, and E, respectively. FSFs are identified by SCOP css. (B) Viral FSFs plotted against their spread in viral proteomes (f value) and evolutionary time (nd). FSFs identified by SCOP css. (C) Distribution of ABEV FSFs in each viral subgroup along evolutionary time (nd). Numbers in parentheses indicate the total number of ABEV FSFs in each viral subgroup. White circles indicate group medians. Density trace is plotted symmetrically around the boxplots.

Mentions: The reconstruction of phylogenomic trees of domains (ToD), which describe the evolution of the 1995 FSF domains (taxa) that were surveyed in the 5080 sampled proteomes (characters) (see Materials and Methods for the tree reconstruction protocol), showed that most viral FSFs originated very early in evolution (see the legend bar on top of ToD in Fig. 5A). Because of its highly unbalanced nature, ToD enabled the calculation of a “proxy” for the relative age of each FSF domain, which was defined as the node distance (nd) value. This value was derived simply by counting the number of nodes from a terminal taxon to the root node of the tree and by expressing the phylogenetic distance on a relative scale from 0 (most ancient) to 1 (most recent) [methodology discussed elsewhere (18)]. We have previously shown that nd is a reliable proxy for the evolutionary age of FSFs and describes a clock-like behavior of FSF evolution that is remarkably consistent with geological records (56). To uncover likely evolutionary scenarios, we plotted FSFs in each of the 15 Venn groups in Fig. 1A against their FSF ages (that is, nd values) (boxplots in Fig. 5A).
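
The node distance described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the nested-tuple tree encoding, the function name `node_distances`, and the exact normalization (shallowest possible leaf mapped to 0, deepest leaf mapped to 1) are assumptions made for illustration only. The sketch counts how many nodes lie on the path from each terminal taxon to the root and rescales that count to the 0 (most ancient) to 1 (most recent) range.

```python
from typing import Dict, Union

# Hypothetical encoding: a leaf is a string label (e.g. an FSF identifier),
# an internal node is a tuple of subtrees. The root is the outermost tuple.
Tree = Union[str, tuple]


def node_distances(tree: Tree) -> Dict[str, float]:
    """Return a relative node distance (nd) in [0, 1] for every leaf.

    Assumption: nd = (depth - 1) / (deepest - 1), where depth is the number
    of nodes passed on the way from the leaf up to and including the root.
    """
    depths: Dict[str, int] = {}
    stack = [(tree, 0)]  # (node, number of edges from the root)
    while stack:
        node, depth = stack.pop()
        if isinstance(node, str):
            depths[node] = depth
        else:
            for child in node:
                stack.append((child, depth + 1))

    deepest = max(depths.values())
    if deepest <= 1:
        # Every leaf hangs directly off the root: no relative ordering exists.
        return {leaf: 0.0 for leaf in depths}
    return {leaf: (d - 1) / (deepest - 1) for leaf, d in depths.items()}


# A small, highly unbalanced (comb-like) tree with made-up leaf labels.
toy_tree = ("a", ("b", ("c", "d")))
nd = node_distances(toy_tree)
print(nd)
```

On a comb-like tree such as `toy_tree`, the leaf that branches off closest to the root receives nd = 0 and the leaves in the most nested clade receive nd = 1, which is why an unbalanced topology lends itself to this kind of relative-age proxy.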

