Contribution of type W human endogenous retroviruses to the human genome: characterization of HERV-W proviral insertions and processed pseudogenes

ABSTRACT

Background: Human endogenous retroviruses (HERVs) are ancient sequences that integrated into germ line cells and have been vertically transmitted to the offspring; they constitute about 8 % of our genome. Over time, HERVs accumulated mutations that compromised their coding capacity. A prominent exception is the HERV-W locus at 7q21.2, which produces a functional Env protein (Syncytin-1) co-opted for placental syncytiotrophoblast formation. While the expression of HERV-W sequences has been investigated for its correlation with disease, an exhaustive description of the group's composition and characteristics is still not available, and current information on the HERV-W group derives from studies published a few years ago that relied on the rough human genome assemblies available at that time. This hampers comparison and correlation with current human genome assemblies.

Results: In the present work we identified and described in detail the distribution and genetic composition of 213 HERV-W elements. The bioinformatics analysis led to the characterization of several previously unreported features and provided a phylogenetic classification into two main subgroups with different ages and structural characteristics. New findings on the genomic context of HERV-W insertions and their co-localization with sequences putatively involved in disease development are also reported.

Conclusions: The present work is a detailed overview of the HERV-W contribution to the human genome and provides a robust genetic background useful for clarifying the role of HERV-W in pathologies with poorly understood etiology. It represents, to our knowledge, the most complete and exhaustive HERV-W dataset to date.

Electronic supplementary material: The online version of this article (doi:10.1186/s12977-016-0301-x) contains supplementary material, which is available to authorized users.

Fig. 5: Boxplot representations of the estimated period of integration of the HERV-W subgroups, based on divergence. The approximate age (in million years) was calculated from the divergence values (i) between the 5′ and 3′ LTRs of the same provirus (proviral sequences only); (ii) between each LTR and a consensus generated for each subgroup; and (iii) between a 150–300 nucleotide region of each HERV-W internal element (gag, pro, pol RT, pol IN and env genes) and a generated consensus (proviruses and pseudogenes). a Averaged age values obtained for each sequence, with the sequences divided into proviruses and pseudogenes for each subgroup. b Single-method estimates for the two HERV-W subgroups. c Highlight of the heterogeneous effect of divergence on different genic regions

Mentions: The obtained divergence values were used to calculate the age of the HERV-W sequences. For all three approaches, the calculation is based on the relation T = D/0.13, where T is the estimated time of integration (in million years), D is the divergence (in %), and 0.13 % per million years is the applied genomic substitution rate. For the divergence between the 5′ and 3′ LTRs of the same sequence, the obtained T value was divided by 2, considering that each LTR evolved and accumulated mutations independently. The reported time of integration (Additional file 1: Table S1) was calculated as the average of the values obtained with the methods used (Fig. 5). In particular, the estimated time of integration of proviral and pseudogenic sequences for both subgroups 1 and 2 (Fig. 5a) describes for the first time the dynamics of HERV-W insertion into the human genome, suggesting that: (1) the first HERV-W integrations involved subgroup 2 and occurred more than 40 million years ago, with a diffusion of proviral and pseudogenic sequences until about 30 million years ago; (2) HERV-W subgroup 1 sequences are significantly younger than subgroup 2 members (p < 0.0005) and were acquired mostly between 35 and 25 million years ago, on average about 8 million years later than subgroup 2; (3) for both subgroups, the dissemination of proviruses and processed pseudogenes took place virtually simultaneously. Moreover, although the proviruses of both subgroups were processed by the LINE machinery to generate processed pseudogenes, the mechanism was more frequent for subgroup 1 proviruses (1:2.5 ratio with respect to the number of related pseudogenes) than for subgroup 2 integrated elements (1:1 ratio). The reason for this is currently unclear.
We attempted to connect the individual pseudogenic sequences to the proviruses that originally generated them by a phylogenetic analysis of LTRs and major genes, expecting that the pseudogene elements would cluster with their respective HERV-W locus of origin. However, the great majority of pseudogenes clustered with different proviral loci depending on the sequence portion considered (data not shown). Hence, this result, together with the estimated time of diffusion of the pseudogenic elements, suggests that the HERV-W processed pseudogenes have acquired a comparable amount of heterogeneity since their mobilization by LINE elements, and it is thus not possible to unambiguously assign each one to a single proviral locus.
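
The age relation described above (T = D/0.13, halved for the 5′ vs 3′ LTR comparison, then averaged across methods) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the argument layout and the divergence values in the usage line are hypothetical and are not taken from the article or its supplementary data.

```python
from statistics import mean

# Substitution rate stated in the text: 0.13 % divergence per million years.
SUBSTITUTION_RATE_PCT_PER_MYR = 0.13


def age_from_divergence(divergence_pct, ltr_vs_ltr=False):
    """Return the estimated time of integration T (million years) for a divergence D (%).

    T = D / 0.13. For the 5'-vs-3' LTR comparison of the same provirus,
    T is divided by 2 because both LTRs accumulated mutations independently.
    """
    if divergence_pct < 0:
        raise ValueError("divergence must be non-negative")
    age = divergence_pct / SUBSTITUTION_RATE_PCT_PER_MYR
    return age / 2 if ltr_vs_ltr else age


def average_age(consensus_divergences_pct, ltr_vs_ltr_pct=None):
    """Average the single-method estimates for one element.

    consensus_divergences_pct: divergences (%) measured against a consensus
        (LTR vs consensus, gene region vs consensus).
    ltr_vs_ltr_pct: optional 5'-vs-3' LTR divergence (%), proviruses only.
    """
    estimates = [age_from_divergence(d) for d in consensus_divergences_pct]
    if ltr_vs_ltr_pct is not None:
        estimates.append(age_from_divergence(ltr_vs_ltr_pct, ltr_vs_ltr=True))
    return mean(estimates)


# Hypothetical divergence values, for illustration only.
print(round(average_age([3.9, 4.55], ltr_vs_ltr_pct=7.8), 1))
```

Passing `ltr_vs_ltr_pct=None` mirrors the case of processed pseudogenes, for which the LTR vs LTR method is stated to apply to proviral sequences only.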

