RetroTector online, a rational tool for analysis of retroviral elements in small and medium size vertebrate genomic sequences.

Sperber G, Lövgren A, Eriksson NE, Benachenhou F, Blomberg J - BMC Bioinformatics (2009)

Bottom Line: A better understanding of the structure and function of these sequences can have profound biological and medical consequences. A "light" version of ReTe (RetroTector online; ROL), which does not require specific installation procedures, is provided via the World Wide Web. A detailed analysis of any retroviral sequences found in the submitted sequence is presented graphically and can be exported in standard formats.


Affiliation: Physiology unit, Department of Neuroscience, Box 593, Uppsala, Sweden. goran.sperber@neuro.uu.se

ABSTRACT

Background: The rapid accumulation of genomic information in databases necessitates rapid and specific algorithms for extracting biologically meaningful information. More or less complete retroviral sequences, also called proviral or endogenous retroviral sequences (ERVs), constitute at least 5% of vertebrate genomes. After infecting the host, these retroviruses integrated into germ line cells and have since been carried in genomes for at least several hundred million years. A better understanding of the structure and function of these sequences can have profound biological and medical consequences.

Methods: RetroTector (ReTe) is a platform-independent Java program for identification and characterization of proviral sequences in vertebrate genomes. The full ReTe requires a local installation with a MySQL database. Although not overly complicated, the installation may take some time. A "light" version of ReTe (RetroTector online; ROL), which does not require specific installation procedures, is provided via the World Wide Web.

Result: ROL (http://www.fysiologi.neuro.uu.se/jbgs/) was implemented under the Batchelor web interface (A Lövgren et al). It accepts sequences of 5 to 10,000 kilobases, submitted by GenBank accession number, as a file, or by FASTA cut-and-paste. Up to ten submissions can be made simultaneously, allowing batch analysis of

Discussion: Proviral sequences can be hard to recognize, especially if the integration occurred many millions of years ago. Precise delineation of LTR, gag, pro, pol and env can be difficult and often requires manual work. ROL simplifies these tasks.

Conclusion: ROL provides (1) annotation and presentation of known retroviral sequences and (2) detection of proviral chains in unknown genomic sequences, with up to 100 Mbase per submission.
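
The size limits quoted above (5 to 10,000 kilobases per sequence, up to ten sequences at once, hence up to 100 Mbase in total) can be checked locally before submitting. The following is a minimal sketch, not part of ROL: the function names are hypothetical, and it assumes that the limits apply per sequence and that 1 kilobase is counted as 1,000 bases.

```python
# Limits taken from the abstract; how the server actually enforces them is an assumption.
MIN_BASES = 5_000          # 5 kilobases
MAX_BASES = 10_000_000     # 10,000 kilobases
MAX_SEQUENCES = 10         # up to ten simultaneous submissions


def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_batch(records):
    """Return a list of problems; an empty list means the batch fits the stated limits."""
    problems = []
    if len(records) > MAX_SEQUENCES:
        problems.append(f"{len(records)} sequences exceeds the limit of {MAX_SEQUENCES}")
    for header, seq in records:
        n = len(seq)
        if n < MIN_BASES:
            problems.append(f"{header}: {n} bases is below {MIN_BASES}")
        elif n > MAX_BASES:
            problems.append(f"{header}: {n} bases is above {MAX_BASES}")
    return problems
```

Calling `check_batch(parse_fasta(text))` on the contents of a FASTA file returns one message per sequence that falls outside the stated range.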

© Copyright Policy - open-access

Figure 2: Result from a run of the chimpanzee genome (panTro2). The chimpanzee version of the human HERVFc1 retroviral sequence [3] is shown. The components LTR (5 LT & 3 LT), primer binding site (PBS), gag (internal structural proteins; motifs named CA and NC), pro (protease, motifs named PR), pol (pol gene, motifs named RT, RH and IN), env (envelope gene, motifs named SU and TM), and polypurine tract (PPT) are shown. Figures above each motif denote its score (0–100). The following features are also predicted: Red bars denote stop codons, blue bars start codons. Triple lines denote putative protein encoding sequences. Green bars denote putative asparagine glycosylation sites, "/" splice donor, "\ " splice acceptor, "S" "Slippery" sequence (possible frameshift sites), and "8" pseudoknot sequences (which are also possible frameshift sites).
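
To make the start codon, stop codon, and putative coding-stretch annotations in the caption concrete, here is a minimal sketch of a forward-strand scan in three reading frames. It is not RetroTector's algorithm: it assumes the standard genetic code (ATG as start; TAA, TAG, TGA as stops), ignores the reverse strand, and uses hypothetical function names.

```python
START = {"ATG"}
STOPS = {"TAA", "TAG", "TGA"}


def scan_codons(seq):
    """Return {frame: {"start": [...], "stop": [...]}} with 0-based codon positions."""
    seq = seq.upper()
    result = {}
    for frame in range(3):
        starts, stops = [], []
        # Step through complete codons only.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in START:
                starts.append(i)
            elif codon in STOPS:
                stops.append(i)
        result[frame] = {"start": starts, "stop": stops}
    return result


def longest_stop_free_run(seq, frame):
    """Return (start, end) of the longest stop-free stretch in one frame (end exclusive)."""
    seq = seq.upper()
    best = (frame, frame)
    run_start = frame
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOPS:
            if i - run_start > best[1] - best[0]:
                best = (run_start, i)
            run_start = i + 3
    # Close the final run at the last complete codon of the frame.
    end = frame + ((len(seq) - frame) // 3) * 3
    if end - run_start > best[1] - best[0]:
        best = (run_start, end)
    return best
```

A stop-free stretch is only a crude proxy for a putative protein-encoding sequence; a real detector would also weigh motif scores and frameshifts such as those marked in the figure.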

Mentions: An example (Figure 2) is the analysis of ERVFc1, a gammaretrovirus-like sequence which has several ORFs and near-ORFs [1,3]. This low-copy-number sequence was present in previous assemblies of the human genome, but has been edited out of the hg18 assembly. The provirus shown is from the panTro2 chimpanzee genome assembly. This illustrates the importance of independent retrovirus sequence detection systems, like ReTe and ROL.

