The Gypsy Database (GyDB) of mobile genetic elements.

Lloréns C, Futami R, Bezemer D, Moya A - Nucleic Acids Res. (2007)

Bottom Line: This first version covers eukaryotic Ty3/Gypsy and Retroviridae long terminal repeat (LTR) retroelements. Phylogenetic analyses based on the gag-pro-pol internal region shared by these two groups strongly support a number of previously described Ty3/Gypsy lineages originally reported from reverse-transcriptase (RT) analyses. Vertebrate retroviruses (Retroviridae) also form several monophyletic groups consistent with the genera proposed by the ICTV nomenclature, as well as with the current tendency to classify both endogenous and exogenous retroviruses into three major classes (I, II and III).


Affiliation: Biotech Vana, Valencia, Institut Cavanilles de Biodiversitat i Biología Evolutiva, Universitat de València, Spain.

ABSTRACT
In this article, we introduce the Gypsy Database (GyDB) of mobile genetic elements, an in-progress database devoted to the non-redundant analysis and evolutionary-based classification of mobile genetic elements. This first version covers eukaryotic Ty3/Gypsy and Retroviridae long terminal repeat (LTR) retroelements. Phylogenetic analyses based on the gag-pro-pol internal region shared by these two groups strongly support a number of previously described Ty3/Gypsy lineages originally reported from reverse-transcriptase (RT) analyses. Vertebrate retroviruses (Retroviridae) also form several monophyletic groups consistent with the genera proposed by the ICTV nomenclature, as well as with the current tendency to classify both endogenous and exogenous retroviruses into three major classes (I, II and III). Our inference indicates that all protein domains encoded by the gag-pro-pol internal region of these two groups collectively reflect a particular evolutionary history, which may be used as a main criterion to differentiate their molecular diversity in a comprehensive collection of phylogenies and non-redundant molecular profiles useful for identifying new Ty3/Gypsy and Retroviridae species. The GyDB project is available at http://gydb.uv.es.


Figure 2: MRC tree inferred for Ty3/Gypsy and Retroviridae LTR retroelements using the parsimony method, based on a concatenated gag-pro-pol multiple alignment. Host organisms and monophyletic clusters are detailed at left. MRC trees usually consist of all groups that occur more than 50% of the time; we take consensus values higher than 55 as a bootstrap-equivalent reference.
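The majority-rule criterion in the caption can be illustrated with a minimal Python sketch. It is not the Phylip procedure used by the authors. It assumes that each input tree has already been reduced to a set of clusters (each cluster a frozenset of taxon names) and that the consensus values are percentages. The function name, the toy taxa and the data layout are hypothetical.

```python
from collections import Counter


def majority_rule_clusters(trees, threshold=0.5):
    """Return {cluster: support in %} for clusters found in more than
    `threshold` of the input trees.

    Assumption: each tree is given as a set of clusters, and each
    cluster is a frozenset of taxon names.
    """
    counts = Counter()
    for clusters in trees:
        counts.update(set(clusters))  # count each cluster once per tree
    n = len(trees)
    return {c: 100.0 * k / n for c, k in counts.items() if k / n > threshold}


# Toy input: three trees over four hypothetical taxa A-D.
trees = [
    {frozenset({"A", "B"}), frozenset({"C", "D"})},
    {frozenset({"A", "B"}), frozenset({"B", "C"})},
    {frozenset({"A", "B"}), frozenset({"C", "D"})},
]
support = majority_rule_clusters(trees)

# Keep only clusters whose consensus value exceeds 55, the reference
# level mentioned in the caption (assumed here to be a percentage).
reference = {c: s for c, s in support.items() if s > 55}
```

In this toy case the cluster {A, B} appears in all three trees and {C, D} in two of them, so both are retained, while {B, C} appears only once and is dropped.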

Mentions: The first version of the GyDB focuses on the exhaustive analysis of 120 non-redundant Ty3/Gypsy and Retroviridae full-length genomes collected from the National Center for Biotechnology Information (NCBI, http://www.ncbi.nlm.nih.gov/). The most conserved part (core) of each protein domain was aligned using CLUSTALX (28) and refined with the GENEDOC editor (http://www.psc.edu/biomed/genedoc). Although the Retroviridae display the same gag-pro-pol-env structure as Ty3/Gypsy retroviruses (29), not all Ty3/Gypsy LTR retroelements are retroviruses, and it is well supported that the different lineages of retroviruses described in invertebrates probably acquired their env genes through independent gene recruitment events (see Ref. (29) and references therein). Consequently, the most valuable relationships between Ty3/Gypsy and Retroviridae LTR retroelements should be sought in the internal region that codes for the gag and pol polyproteins. The criteria for LTR retroelement classification at the GyDB are thus based on the clusters reported by a majority-rule consensus (MRC) tree inferred from a concatenated gag-pro-pol multiple alignment containing the most conserved part of the CA, NC, PR, RT, RNAseH and INT domains. Nevertheless, we have also inferred, and provide online, independent phylogenies based on the gag polyprotein, the pol polyprotein and all pol protein domains, and the env polyprotein. The gag-pro-pol alignment therefore has two components: the gag polyprotein and the pol polyprotein. For the gag polyprotein, we consider only the CA–NC region, because MA is absent in many Ty3/Gypsy sequences and in others cannot be exhaustively aligned due to extreme divergence. For the pol polyprotein, we consider the PR-RT-RNAseH-INT region from the catalytic DTG PR motif (30) to the GPY/F INT module (31).
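As a rough illustration of how per-domain core alignments might be joined into a single gag-pro-pol alignment, the following Python sketch concatenates aligned blocks in the domain order given above. It is only a sketch under stated assumptions: the data layout, the function name and the gap-padding of elements missing from a domain are hypothetical and are not described in the article, and the toy sequences are invented.

```python
# Domain order taken from the text; the data layout below is an assumption.
DOMAIN_ORDER = ["CA", "NC", "PR", "RT", "RNAseH", "INT"]


def concatenate_alignments(domain_alignments, order=DOMAIN_ORDER):
    """Join per-domain aligned sequences into one row per element.

    `domain_alignments` maps a domain name to {element_name: aligned_sequence}.
    An element absent from a domain is padded with gaps (an assumption of
    this sketch) so that all concatenated rows keep the same length.
    """
    elements = sorted({name for d in order for name in domain_alignments[d]})
    concatenated = {name: [] for name in elements}
    for domain in order:
        block = domain_alignments[domain]
        lengths = {len(seq) for seq in block.values()}
        if len(lengths) != 1:
            raise ValueError(f"{domain}: aligned sequences differ in length")
        width = lengths.pop()
        for name in elements:
            concatenated[name].append(block.get(name, "-" * width))
    return {name: "".join(parts) for name, parts in concatenated.items()}


# Toy example with invented sequences and only three domains.
toy = {
    "CA": {"elem1": "MK-L", "elem2": "MKAL"},
    "NC": {"elem1": "CC", "elem2": "C-"},
    "PR": {"elem1": "DTG"},
}
joined = concatenate_alignments(toy, order=["CA", "NC", "PR"])
```

Here `joined["elem2"]` is padded with three gap characters for the PR block it lacks, so both rows have nine columns.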
The PR domain is taken as another pol component because it has a low phylogenetic signal, but one similar to that of the other pol protein domains (see the PR MRC tree in the ‘Section Phylogenies’ at GyDB). As shown in Figure 2, the gag-pro-pol tree agrees with, and improves on, all clades and genera previously inferred from the RT, RNAseH or INT pol-like domains (22–24,26,31–45). This indicates that, despite their different rates of evolution (which the parsimony method does not take into account), all protein domains encoded by the gag-pro-pol internal region (except MA) have a similar phylogenetic signal that may be used as a main criterion for phylogenetically classifying and profiling the currently known Ty3/Gypsy and Retroviridae diversity. In an attempt to identify the most satisfactory method of phylogenetic inference, we tested the distance-based neighbour-joining (NJ) method (46) and the minimum-change-based parsimony method (47,48) using Phylip 3.6 (http://evolution.gs.washington.edu/phylip.html) to infer MRC trees (49). The two methods reported identical clusters of operational taxonomic units (OTUs) (see Llorens and Moya, the Three Kings Hypothesis, manuscript in preparation). This has allowed us to define the monophyletic clusters of protein families taxonomically and realistically, independently of the method used. However, the parsimony method proved much more consistent with comparative analyses than the NJ method when inferring phylogenies based on non-conserved protein domains such as the gag polyprotein and the protease domain. Although these two proteins are extremely divergent (less than 20% overall identity), all sequences belonging to a particular lineage share a common amino acid architecture that is similar to, but distinct from, that displayed in other lineages.
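To make the "less than 20% overall identity" figure concrete, the sketch below computes a pairwise percent identity between two rows of an alignment. The article does not state how identity was calculated, so the convention used here (identical residues divided by the number of columns in which neither sequence has a gap) is an assumption, as are the function name and the toy sequences.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity between two aligned sequences of equal length.

    Assumed convention: columns with a gap in either sequence are
    ignored, and identity is matches / compared columns * 100.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Toy example with invented sequences: 3 matches over 4 compared columns.
identity = percent_identity("MK-LV", "MKALI")
```

Other denominators (full alignment length, or the length of the shorter sequence) are also in common use and would give different values for the same pair.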
The point is that, when inferring phylogenies involving these two proteins, the parsimony method always yielded in our analyses an MRC tree more consistent with comparative analyses than NJ, and also supported the overall clustering with better statistical values. We have thus chosen parsimony MRC trees as the principal phylogenetic reference at GyDB. Phylogeny web pages are presented as HTML files in which clicking on the name of any retroelement opens a descriptive file that in turn links to the NCBI GenBank accession of the requested element and provides a short discussion, taxonomy information, genomic structure and a bibliography concerning the element described. If the selected element has no file, the link takes the user directly to the sequence's GenBank accession at the NCBI.

