TE-Tracker: systematic identification of transposition events through whole-genome resequencing.

Gilly A, Etcheverry M, Madoui MA, Guy J, Quadrana L, Alberti A, Martin A, Heitkam T, Engelen S, Labadie K, Le Pen J, Wincker P, Colot V, Aury JM - BMC Bioinformatics (2014)

Bottom Line: Transposable elements (TEs) are increasingly recognized as impacting all aspects of genome function. We show that TE-Tracker, while working independently of any prior annotation, bridges the gap between specialized TE-detection tools and generic structural variant (SV) detection tools in terms of detection power. Indeed, its positive predictive value (PPV) is comparable to that of dedicated TE software while its sensitivity is typical of a generic SV detection tool.


Affiliation: Commissariat à l'Energie Atomique (CEA), Institut de Génomique (IG), Genoscope, 2 rue Gaston Crémieux, BP5706, 91057, Evry, France. ag15@sanger.ac.uk.

ABSTRACT

Background: Transposable elements (TEs) are DNA sequences that are able to move from their location in the genome by cutting or copying themselves to another locus. As such, they are increasingly recognized as impacting all aspects of genome function. With the dramatic reduction in the cost of DNA sequencing, it is now possible to resequence whole genomes in order to systematically characterize novel TE mobilization in a particular individual. However, this task is made difficult by the inherently repetitive nature of TE sequences, which in some eukaryotes make up over half of the genome sequence. Currently, only a few software tools dedicated to the detection of TE mobilization using next-generation sequencing are described in the literature. They often target specific TEs for which annotation is available, and are only able to identify families of closely related TEs, rather than individual elements.

Results: We present TE-Tracker, a general and accurate computational method for the de novo detection of germline TE mobilization from resequenced genomes, as well as the identification of both the source and destination sequences of each event. We compare our method with the two classes of existing software: specialized TE-detection tools and generic structural variant (SV) detection tools. We show that TE-Tracker, while working independently of any prior annotation, bridges the gap between these two approaches in terms of detection power. Indeed, its positive predictive value (PPV) is comparable to that of dedicated TE software while its sensitivity is typical of a generic SV detection tool. TE-Tracker demonstrates the benefit of adopting an annotation-independent, de novo approach for the detection of TE mobilization events. We use TE-Tracker to provide a comprehensive view of transposition events induced by loss of DNA methylation in Arabidopsis. TE-Tracker is freely available at http://www.genoscope.cns.fr/TE-Tracker .

Conclusions: We show that TE-Tracker accurately detects both the source and destination of novel transposition events in re-sequenced genomes. Moreover, TE-Tracker is able to detect all potential donor sequences for a given insertion, and can identify the correct one among them. Furthermore, TE-Tracker produces significantly fewer false positives than common SV detection programs, thus greatly facilitating the detection and analysis of TE mobilization events.
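The abstract compares tools by positive predictive value (PPV) and sensitivity. As a reminder of what these two measures capture, here is a minimal sketch using their standard definitions. The function names and the example counts of 8 true positives, 2 false positives, and 8 false negatives are illustrative assumptions, not figures from the paper; only the 69-out-of-69 PCR confirmation is taken from the text.

```python
def ppv(true_pos: int, false_pos: int) -> float:
    """Positive predictive value: fraction of reported calls that are real events."""
    total_calls = true_pos + false_pos
    return true_pos / total_calls if total_calls else float("nan")


def sensitivity(true_pos: int, false_neg: int) -> float:
    """Sensitivity (recall): fraction of real events that were reported."""
    total_events = true_pos + false_neg
    return true_pos / total_events if total_events else float("nan")


# Hypothetical benchmark counts, for illustration only.
example_ppv = ppv(true_pos=8, false_pos=2)                   # 8 of 10 calls are real
example_sensitivity = sensitivity(true_pos=8, false_neg=8)   # 8 of 16 events found

# From the text: all 69 PCR-tested insertions were confirmed,
# so the PPV on that tested subset is 69 / (69 + 0).
validated_subset_ppv = ppv(true_pos=69, false_pos=0)
```

A tool with high PPV but low sensitivity reports few false insertions yet misses many real ones; the reverse profile is typical of generic SV callers, which is the gap the abstract describes.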



Figure 5: Gbrowse view of composite elements detected by TE-Tracker. Red dotted lines indicate the boundaries of the mobile sequence as detected by TE-Tracker. a. Element composed of two sequences annotated as ATENSPM3 and one annotated as HELITRON2. b. Element composed of two sequences annotated as ATENSPM3 and one annotated as ATLANTYS1.

We applied TE-Tracker to the identification of novel TE insertions in a set of four Arabidopsis epiRILs derived from a cross between a wild type (wt) plant and a plant mutant for the gene DECREASE IN DNA METHYLATION 1 (DDM1) [29]. DNA methylation as well as transcriptional silencing of TEs is severely compromised in ddm1 mutant plants [30], thus potentially leading to TE re-mobilization [31-34]. The four epiRILs together with one wt line were sequenced using Illumina mate-pair libraries (5.5 kb mean length), in order to enable the detection of new insertions for almost all of the TEs that are potentially active in the genome, as over 90% of all full-length annotated Arabidopsis TEs are less than 11 kb long [35,36]. Effective mean sequencing coverage (after alignment) ranged from 11X to 25X (Table 4). Results are illustrated in Figure 4 and summarized in Additional file 3: Table S3, Additional file 4: Table S4, Additional file 5: Table S5, Additional file 6: Table S6, Additional file 7: Table S7. Partial results obtained for several other epiRILs using a beta version of TE-Tracker were reported elsewhere [37,38]. For the four epiRILs analyzed here, TE-Tracker detected a total of 125 distinct insertions that match annotated TE sequences (Additional file 3: Table S3, Additional file 4: Table S4, Additional file 5: Table S5, Additional file 6: Table S6, column “Donor annotation”). The vast majority (119) of these insertions were not detected in the wt parental line, as expected if most transposition events occurred in the ddm1 parental line or during the propagation of the epiRILs (Additional file 7: Table S7). To validate these results, a random set of 68 potentially novel insertions as well as one insertion shared with the wt parent were tested by PCR. In all 69 cases, the presence of the insertion could be confirmed (Additional file 8: Table S8), which provides further evidence of the high specificity of TE-Tracker.
Furthermore, sequencing of 26 PCR products corresponding to new insertions was used to evaluate the performance of TE-Tracker in identifying donor TEs. In all but one case, the donor-scoring module was able to identify the correct TE donor sequence. Also, sequencing of both ends of 12 new insertions confirmed the presence of a target site duplication in each case, as expected for true transposition events (Additional file 9: Figure S1, Additional file 8: Table S8). Among these, we validated several insertions involving composite sequences that were not previously annotated as full-length TE units (Figure 5). These results confirm that TE-Tracker is able to detect transposition events involving sequences not explicitly annotated as a single TE, which is currently impossible with annotation-based methods such as RetroSeq [17].
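The comparison against the wt parental line described above can be pictured as a filter over insertion calls. The following is only a rough sketch of that idea, not TE-Tracker's actual procedure: the `Insertion` record, the `novel_insertions` function, the 500 bp matching tolerance, and all coordinates are hypothetical, and the donor names are borrowed from Figure 5 purely as labels.

```python
from typing import List, NamedTuple


class Insertion(NamedTuple):
    chrom: str   # chromosome name
    pos: int     # approximate insertion coordinate (bp)
    donor: str   # donor annotation label


def novel_insertions(sample: List[Insertion],
                     parent: List[Insertion],
                     tolerance: int = 500) -> List[Insertion]:
    """Keep sample calls that have no parent call with the same donor
    on the same chromosome within `tolerance` bp (an assumed threshold)."""
    def shared_with_parent(call: Insertion) -> bool:
        return any(
            p.chrom == call.chrom
            and p.donor == call.donor
            and abs(p.pos - call.pos) <= tolerance
            for p in parent
        )
    return [call for call in sample if not shared_with_parent(call)]


# Hypothetical calls, for illustration only.
epiril_calls = [
    Insertion("Chr1", 1_200_000, "ATENSPM3"),
    Insertion("Chr3", 5_400_300, "ATLANTYS1"),
]
wt_calls = [
    Insertion("Chr3", 5_400_100, "ATLANTYS1"),  # 200 bp away: treated as shared
]

novel = novel_insertions(epiril_calls, wt_calls)
```

A positional tolerance is used here because breakpoints inferred from mate-pair data are approximate; the appropriate value would depend on library insert size and is left as an assumption.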

