Evolutionary conservation of orthoretroviral long terminal repeats (LTRs) and ab initio detection of single LTRs in genomic data.

Benachenhou F, Jern P, Oja M, Sperber G, Blikstad V, Somervuo P, Kaski S, Blomberg J - PLoS ONE (2009)

Bottom Line: When all HMMs were combined with a low cutoff for screening, they detected 71% of all LTRs that RepeatMasker found in chromosome 19. The modular, conserved and redundant orthoretroviral LTR structure with three A-rich regions is reminiscent of structurally relaxed Giardia promoters. The five HMMs provided novel, broad-range, repeat-independent, ab initio LTR detection, with prospects for greater generalisation, and insight into LTR structure, which may aid development of LTR-targeted pharmaceuticals.

Affiliation: Department of Medical Sciences, Section of Virology, Uppsala University, Uppsala, Sweden.

ABSTRACT

Background: Retroviral LTRs, paired or single, influence the transcription of both retroviral and non-retroviral genomic sequences. Vertebrate genomes contain many thousands of endogenous retroviruses (ERVs) and their LTRs. Single LTRs are difficult to detect in genomic sequences without recourse to repetitiveness or to their presence in a proviral structure. A better understanding of LTR structure improves the understanding of LTR function and of functional genomics. Here we develop models of orthoretroviral LTRs that are useful for detection in genomes and for structural analysis.

Principal findings: Although mutated, ERV LTRs are more numerous and diverse than exogenous retroviral (XRV) LTRs. Hidden Markov models (HMMs), and alignments based on them, were created for HML- (human MMTV-like), general-beta-, gamma- and lentiretrovirus-like LTRs, plus a general-vertebrate LTR model. Training sets were XRV LTRs and RepBase LTR consensuses. The HML HMM was the most sensitive and detected 87% of the HML LTRs in human chromosome 19 at 96% specificity. When all HMMs were combined with a low cutoff for screening, they detected 71% of all LTRs that RepeatMasker found in chromosome 19. HMM consensus sequences had a conserved modular LTR structure. Target site duplications (TG-CA), TATA (occasionally absent), an AATAAA box and a T-rich region were prominent features. Most of the conservation was located in, or adjacent to, R and U5, with evidence for stem loops. Several of the long HML LTRs contained long ORFs inserted after the second A-rich module. HMM consensus alignment allowed comparison of functional features such as transcriptional start sites (sense and antisense) between XRVs and ERVs.
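The combined screening figure (71% of RepeatMasker-annotated LTRs recovered when all HMMs were pooled at a low cutoff) amounts to taking the union of hits from several models and counting how many reference LTRs they touch. The sketch below illustrates only that bookkeeping. It assumes each model's hits are already available as hypothetical `(start, end, score)` tuples on one chromosome, and that a reference LTR counts as found if any pooled hit overlaps it by at least one nucleotide. The function names, the overlap rule and the example coordinates are illustrative assumptions, not the authors' pipeline.

```python
# Half-open interval [start, end) on a single chromosome.
Interval = tuple[int, int]


def merge_hits(hits_per_model: dict[str, list[tuple[int, int, float]]],
               cutoff: float) -> list[Interval]:
    """Pool hits scoring >= cutoff from every model and merge overlapping ones."""
    kept = sorted((start, end)
                  for hits in hits_per_model.values()
                  for start, end, score in hits
                  if score >= cutoff)
    merged: list[Interval] = []
    for start, end in kept:
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous merged hit: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def fraction_detected(reference: list[Interval], hits: list[Interval]) -> float:
    """Fraction of reference LTRs overlapped by >= 1 nt of at least one hit."""
    if not reference:
        return 0.0
    found = sum(
        any(h_start < r_end and r_start < h_end for h_start, h_end in hits)
        for r_start, r_end in reference
    )
    return found / len(reference)


# Hypothetical per-model hits (start, end, score) and reference annotation.
hits = {
    "HML": [(100, 600, 12.0), (5000, 5400, 3.1)],
    "gamma": [(550, 900, 8.5), (9000, 9500, 1.0)],
}
reference = [(120, 580), (5050, 5350), (20000, 20500)]
pooled = merge_hits(hits, cutoff=3.0)
print(pooled, fraction_detected(reference, pooled))
```

Lowering `cutoff` admits more weak hits into the pool, which raises the detected fraction at the cost of more false positives; this sketch does not estimate specificity.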

Conclusion: The modular, conserved and redundant orthoretroviral LTR structure with three A-rich regions is reminiscent of structurally relaxed Giardia promoters. The five HMMs provided novel, broad-range, repeat-independent, ab initio LTR detection, with prospects for greater generalisation, and insight into LTR structure, which may aid development of LTR-targeted pharmaceuticals.

Plot of the longest ORF detected in LTRs selected from all seven retroviral genera. ORFs longer than 97 amino acids were detected in MMTV sag, primate lentiviral nef, Bovine Foamy virus (BFV), and HML2, 3, 4 and 8 consensus LTRs. One standard deviation of the longest ORF occurring in 100 random sequences of increasing length is also shown. A. Longest ORF in RepBase consensus HML LTRs. B. Longest ORF in Clustal consensuses of alignable HML LTR groups and other LTRs (45 in total). Sequence identities and other details are given in Excel S2.
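The random-sequence envelope in the figure can be approximated along the following lines. This is a minimal sketch under stated assumptions: bases are drawn uniformly and independently, an ORF is measured as the longest stop-codon-free run of codons in any of the six reading frames (no start methionine required), and the envelope is taken as the mean plus one standard deviation of the longest ORF per length. These details are not specified in the text, so `longest_orf_aa`, `random_fringe` and `outside_fringe` are hypothetical helpers rather than the authors' code.

```python
import random
import statistics

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def longest_orf_aa(seq: str) -> int:
    """Longest stop-free run of codons (in amino acids) over all six frames.

    Assumption: an ORF is counted stop-to-stop, without requiring a start
    methionine; seq is uppercase DNA.
    """
    reverse_complement = seq.translate(COMPLEMENT)[::-1]
    best = 0
    for strand in (seq, reverse_complement):
        for frame in range(3):
            run = 0
            for i in range(frame, len(strand) - 2, 3):
                if strand[i:i + 3] in STOP_CODONS:
                    run = 0
                else:
                    run += 1
                    best = max(best, run)
    return best


def random_fringe(lengths, n_per_length=100, seed=0):
    """Map each length to (mean, mean + 1 SD) of the longest ORF in random sequences.

    Assumption: uniform base composition (25% each of A, C, G, T).
    """
    rng = random.Random(seed)
    fringe = {}
    for length in lengths:
        longest = [
            longest_orf_aa("".join(rng.choice("ACGT") for _ in range(length)))
            for _ in range(n_per_length)
        ]
        mean = statistics.mean(longest)
        fringe[length] = (mean, mean + statistics.stdev(longest))
    return fringe


def outside_fringe(seq: str, fringe: dict) -> bool:
    """True if seq's longest ORF exceeds mean + 1 SD at the nearest simulated length."""
    nearest = min(fringe, key=lambda length: abs(length - len(seq)))
    return longest_orf_aa(seq) > fringe[nearest][1]
```

Calling `random_fringe(range(200, 1501, 100))` mirrors the grid described in the text (100 sequences per length, 200 to 1500 nt in 100-nt steps). Because stop codons are A/T-rich, base composition changes the chance of a long stop-free run, so a composition-matched generator may give a fairer baseline than uniform sampling.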

Mentions: The occurrence of open reading frames (ORFs) longer than 100 amino acids in LTRs of 500–1000 nt is at the fringe of likelihood. A likelihood fringe of one standard deviation was calculated from a simulated set of random sequences of different lengths (100 random sequences for each length, from 200 nt to 1500 nt in increments of 100 nt). ORFs outside this likelihood fringe occurred in some of the LTRs (Fig 8; Excel S2). The 5′ third of primate lentiviral LTRs had a long ORF encoding the nef protein, and MMTV had the long sag ORF in the same position. Several of the long HML group consensus LTRs (HML2/LTR5, HML3/LTR9, HML4/LTR13 and HML8/MER11B/MER11C) harboured antisense ORFs (see Excel S2) whose length exceeded one standard deviation of the longest ORF length per sequence in the random sequence set. MMTV sag, HIV/SIV nef and the HML8 ORF were clearly outside the random zone. The remaining HML ORFs were close to the 1 SD border; however, compared with other LTRs, HMLs were more often outside the 1 SD border (Fig 8). None of the HML ORFs started with a methionine, although this would have been expected. Nevertheless, most of them were situated at the interface between the second A-rich block and the intermediate module, and an HML3 ORF also overlapped with the intermediate module. If these ORFs were nonfunctional and had arisen by chance, they would be expected to occur at random positions in the Viterbi alignments. The HML4 consensus sequence was remarkable in that it contained three ORFs longer than 100 amino acids, all situated at the abovementioned interface.
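The positional argument (chance ORFs should be scattered across the Viterbi alignment rather than clustered at one module interface) can be made quantitative with a binomial tail. The sketch below assumes each chance ORF start falls independently and uniformly on one of the alignment columns and ignores ORF length; `prob_at_least_k_in_window`, the window width and the alignment length in the example are hypothetical, and this is not a test reported by the authors.

```python
from math import comb


def prob_at_least_k_in_window(n: int, k: int, window: int, length: int) -> float:
    """Binomial tail P(X >= k) with X ~ Binomial(n, window / length).

    Assumption: each of the n chance ORF starts lands independently and
    uniformly on one of `length` alignment columns; ORF length is ignored.
    """
    if not 0 < window <= length:
        raise ValueError("window must be between 1 and length")
    p = window / length
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


# Hypothetical numbers: 3 ORFs, all within a 100-column window of a
# 1000-column alignment.
print(prob_at_least_k_in_window(n=3, k=3, window=100, length=1000))
```

With these illustrative numbers the probability is about 0.001, i.e. such clustering would be unlikely under uniform placement; real conclusions would depend on the actual window, alignment length and the correlation between ORF start and ORF length.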