Modular organization and reticulate evolution of the ORF1 of Jockey superfamily transposable elements.

Metcalfe CJ, Casane D - Mob DNA (2014)

Bottom Line: ORF1 type variations involving the PHD domain were found in many subgroups of the L2 and CR1 lineages. A Jockey lineage-like ORF1 with a PHD domain was found in both lineages. In conclusion, while the structure of the ORF2 appears to be highly constrained and its evolution tree-like, the structure of the ORF1 within the CR1 and L2 lineages is much more variable and its evolution reticulate.


Affiliation: Universidade de São Paulo, Instituto de Biociências, Rua do Matão 277, Cidade Universitária, São Paulo 05508-090 SP, Brazil.

ABSTRACT

Background: Long interspersed nuclear elements (LINEs) are the most common transposable elements (TEs) in almost all metazoan genomes examined. In most LINE superfamilies there are two open reading frames (ORFs), and both are required for transposition. The ORF2 is well characterized, while the structure and function of the ORF1 are less well understood. ORF1s have been classified into five types based on structural organization and the domains identified. Here we perform a large-scale analysis of ORF1 domains of 448 elements from the Jockey superfamily using multiple alignments and Hidden Markov Model (HMM)-HMM comparisons.

Results: Three major lineages, Chicken repeat 1 (CR1), LINE2 (L2) and Jockey, were identified. All Jockey lineage elements have the same type of ORF1. In contrast, all five ORF1 types are found in the L2 and CR1 lineage elements, with no single type predominating. The plant homeodomain (PHD) is much more prevalent than previously suspected. ORF1 type variations involving the PHD domain were found in many subgroups of the L2 and CR1 lineages. A Jockey lineage-like ORF1 with a PHD domain was found in both lineages. A phylogenetic analysis of this ORF1 suggests that it has been horizontally transferred. Likewise, an esterase-containing ORF1 type was found only in two exclusively vertebrate L2 and CR1 groups, indicating that it may have been acquired in a vertebrate common ancestor and then transferred between the lineages.

Conclusions: The ORF1 of the CR1 and L2 lineages is very structurally diverse. The presence of a PHD domain in many ORF1s of the L2 and CR1 lineages is suggestive of domain shuffling. There is also evidence of possible horizontal transfer of entire ORF1s between lineages. In conclusion, while the structure of the ORF2 appears to be highly constrained and its evolution tree-like, the structure of the ORF1 within the CR1 and L2 lineages is much more variable and its evolution reticulate.


Figure 2: ORF1 types identified in the Jockey, CR1 and L2 lineages. Subtypes (A, B, C) are used to show the diversity of ORF1 structures within types identified in this paper. Subtype titles within a circle denote those previously described by Khazina and Weichenrieder [11] and Kapitonov et al. [16]. Lineages and subgroups were identified by ORF1 structure and phylogenetic structuring based on the apurinic endonuclease (APE) and reverse transcriptase (RT) domains (see Figures 3, 4). Clades within lineages were identified by the RTclass1 tool [9]. The phylum and species are taken from the Repbase sequence title [17]. The ORF1 structure schematic shows the coding domains 5’ to the endonuclease that were identified in this publication; the domains are drawn to scale. Domains not always present are shown with a dashed outline. Red: CCHC, gag-like Cys2HisCys zinc-knuckle; green: CTD, C-terminal domain; yellow: coiled-coil domain; purple: esterase; pink: PHD, plant homeodomain; blue: RRM, RNA recognition motif; lilac: zf/lz, zinc finger/leucine zipper. The hatched CC, RRM + CTD domains indicate transposase 22, the RCSB Protein Data Bank entry 2yko and Pfam entry PF02994. A key to all the domains is shown in Figure 6.
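
The domain key in the Figure 2 legend can be read as a small lookup table, which may help when comparing ORF1 architectures by hand. The following is a minimal Python sketch, not the authors' method: the abbreviations, colours and the transposase 22 grouping (CC, RRM + CTD) come from the legend, while the function names, the assumption that the three transposase 22 domains appear contiguously in that order, and the two example architectures are hypothetical and are not taken from the paper.

```python
# Domain key transcribed from the Figure 2 legend: abbreviation -> (colour, description).
DOMAIN_KEY = {
    "CCHC": ("red", "gag-like Cys2HisCys zinc-knuckle"),
    "CTD": ("green", "C-terminal domain"),
    "CC": ("yellow", "coiled-coil domain"),
    "esterase": ("purple", "esterase"),
    "PHD": ("pink", "plant homeodomain"),
    "RRM": ("blue", "RNA recognition motif"),
    "zf/lz": ("lilac", "zinc finger/leucine zipper"),
}

# The legend states that the hatched CC, RRM + CTD domains indicate transposase 22.
# Assumption: they occur as one contiguous run in this 5'-to-3' order.
TRANSPOSASE_22 = ("CC", "RRM", "CTD")


def has_transposase_22(architecture):
    """Return True if CC, RRM, CTD occur as a contiguous run, in that order."""
    n = len(TRANSPOSASE_22)
    return any(
        tuple(architecture[i:i + n]) == TRANSPOSASE_22
        for i in range(len(architecture) - n + 1)
    )


def describe(architecture):
    """Render a 5'-to-3' domain list as text, rejecting names absent from the legend."""
    unknown = [d for d in architecture if d not in DOMAIN_KEY]
    if unknown:
        raise ValueError(f"domains not in the legend: {unknown}")
    return " - ".join(f"{d} ({DOMAIN_KEY[d][0]})" for d in architecture)


# Hypothetical architectures for illustration only; not taken from the paper.
example_a = ["PHD", "CC", "RRM", "CTD"]
example_b = ["PHD", "RRM", "CCHC"]

print(describe(example_a))
print(has_transposase_22(example_a))
print(has_transposase_22(example_b))
```

Keeping each architecture as an ordered list, rather than a set, preserves the 5’-to-3’ order that the schematic conveys, so that modules such as transposase 22 can be detected as runs of adjacent domains.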

Mentions: LINE superfamilies. Relationships between LINE superfamilies/groups and assignment of clades to superfamilies/groups based on reverse transcriptase (RT) phylogeny [7,9,10]. LINE clades were first assigned to five groups (R2, L1, RTE, I and Jockey) by Eickbush and Malik [8]. Groups are called superfamilies in the TE classification paper by Wicker et al. [7]. The ORF1 is not present in some RTE clades (shown with a dashed outline). In this paper, all Jockey superfamily/group full-length sequences from the Repbase database were assigned to three lineages based on an APE-RT phylogeny (see Figure 3). Subgroups were identified within the three lineages and ORF1 structures (see Figure 2) mapped onto the phylogeny (see Figures 4, 5 and 6).

