TE-Tracker: systematic identification of transposition events through whole-genome resequencing.

Gilly A, Etcheverry M, Madoui MA, Guy J, Quadrana L, Alberti A, Martin A, Heitkam T, Engelen S, Labadie K, Le Pen J, Wincker P, Colot V, Aury JM - BMC Bioinformatics (2014)

Bottom Line: Transposable elements (TEs) are increasingly recognized as impacting all aspects of genome function. We show that TE-Tracker, while working independently of any prior annotation, bridges the gap between specialized TE-detection tools and generic structural variant (SV) detection tools in terms of detection power. Indeed, its positive predictive value (PPV) is comparable to that of dedicated TE software while its sensitivity is typical of a generic SV detection tool.


Affiliation: Commissariat a l'Energie Atomique (CEA), Institut de Genomique (IG), Genoscope, 2 rue Gaston Crémieux, BP5706, 91057, Evry, France. ag15@sanger.ac.uk.

ABSTRACT

Background: Transposable elements (TEs) are DNA sequences that are able to move from their location in the genome by cutting or copying themselves to another locus. As such, they are increasingly recognized as impacting all aspects of genome function. With the dramatic reduction in the cost of DNA sequencing, it is now possible to resequence whole genomes in order to systematically characterize novel TE mobilization in a particular individual. However, this task is made difficult by the inherently repetitive nature of TE sequences, which in some eukaryotes make up over half of the genome sequence. Currently, only a few software tools dedicated to the detection of TE mobilization using next-generation sequencing are described in the literature. They often target specific TEs for which annotation is available, and are only able to identify families of closely related TEs, rather than individual elements.

Results: We present TE-Tracker, a general and accurate computational method for the de novo detection of germline TE mobilization from resequenced genomes, as well as the identification of both their source and destination sequences. We compare our method with the two classes of existing software: specialized TE-detection tools and generic structural variant (SV) detection tools. We show that TE-Tracker, while working independently of any prior annotation, bridges the gap between these two approaches in terms of detection power. Indeed, its positive predictive value (PPV) is comparable to that of dedicated TE software while its sensitivity is typical of a generic SV detection tool. TE-Tracker demonstrates the benefit of adopting an annotation-independent, de novo approach for the detection of TE mobilization events. We use TE-Tracker to provide a comprehensive view of transposition events induced by loss of DNA methylation in Arabidopsis. TE-Tracker is freely available at http://www.genoscope.cns.fr/TE-Tracker.

Conclusions: We show that TE-Tracker accurately detects both the source and destination of novel transposition events in re-sequenced genomes. Moreover, TE-Tracker is able to detect all potential donor sequences for a given insertion, and can identify the correct one among them. Furthermore, TE-Tracker produces significantly fewer false positives than common SV detection programs, thus greatly facilitating the detection and analysis of TE mobilization events.


Figure 3: Illustration of the donor-scoring algorithm. In this example we describe an event involving a TE copy that differs by only one base pair from another TE in the same family. Because multiple mappings are considered, most of the discordant reads anchored around the insertion locus will map on both candidate donors equally well (plain blue and plain red reads), which will result in TE-Tracker reporting both of them. However, a fraction of the discordant reads (blue reads with red mark) will span the one divergent position that distinguishes the two copies. These reads will map on both locations as well, but their mapping quality score will be significantly higher on the true donor copy. Counting such reads for each donor allows TE-Tracker to quickly determine a “specificity score” for each candidate, thereby helping to determine the probable true origin of the transposition event. For simplicity, only the multiple mappings of discordant pairs are shown in this figure.


Mentions: Finally, the output file can be annotated with various data using the Metis module. If annotation data are available, both the acceptor and donor regions can be annotated; this is performed using the readily available BEDTools software suite [25]. Metis can also read a discordant BAM file, such as the one produced by the Eris module, to perform donor-scoring. Since TE-Tracker analyzes all multiple mappings of discordant pairs, it is able to report all potential donor sites for a given transposition event. However, TE families typically contain mostly defective copies that cannot be mobilized because of truncations or other mutations in their coding or regulatory sequences. Nonetheless, potentially mobile copies are difficult to predict on the basis of sequence integrity alone, and there are no programs to date that attempt to identify, among potential candidates, those that actually transpose. Given that TE families may contain several mobile copies that differ from each other by a few sequence polymorphisms, we have included in TE-Tracker a donor-scoring feature, which selects within clusters only those reads that contain discriminating polymorphisms (Figure 3). Discordant reads anchoring at the acceptor site on one side, and at every potential donor on the other, are extracted from the input alignment file. Reads that map equally well to all the donors are discarded, while those that map significantly better on one donor than on all the others are assigned to that donor and subsequently counted. A better mapping score on one donor location indicates coverage of a polymorphism specific to that particular TE sequence; hence the count of those specific reads for each donor represents a “specificity” or “certainty” score for that particular acceptor/donor pair. This feature aims to provide evidence for identifying the “real” donor when several candidates are available. A higher score for a donor generally indicates higher specificity for that particular copy, whereas when all of the candidate TEs have highly similar sequences, their scores will be uniformly low.
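
The donor-scoring idea described in this paragraph can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code of the Metis module: the function name score_donors, the input layout (one dictionary of per-donor mapping scores for each discordant read), and the min_margin threshold standing in for "significantly better" are all hypothetical.

```python
from typing import Dict, Iterable, Mapping


def score_donors(
    read_scores: Iterable[Mapping[str, float]],
    candidates: Iterable[str],
    min_margin: float = 10.0,  # assumed threshold for "significantly better"
) -> Dict[str, int]:
    """Count, for each candidate donor, the reads that map clearly better on it.

    Each element of read_scores describes one discordant read anchored at the
    acceptor site and maps a candidate donor name to that read's mapping score
    on the donor (hypothetical layout; a missing donor counts as score 0).
    """
    counts = {donor: 0 for donor in candidates}
    for scores in read_scores:
        # Rank the candidate donors for this read, best mapping score first.
        ranked = sorted(((scores.get(d, 0.0), d) for d in counts), reverse=True)
        if len(ranked) < 2:
            continue  # a single candidate needs no discrimination
        (best, donor), (runner_up, _) = ranked[0], ranked[1]
        # Reads that map about equally well everywhere are discarded; a read is
        # assigned only when one donor beats all the others by the margin.
        if best - runner_up >= min_margin:
            counts[donor] += 1
    return counts


# Toy data: two reads are ambiguous, two span a position specific to TE_A.
reads = [
    {"TE_A": 60, "TE_B": 60},
    {"TE_A": 60, "TE_B": 60},
    {"TE_A": 60, "TE_B": 20},
    {"TE_A": 55, "TE_B": 30},
]
scores = score_donors(reads, ["TE_A", "TE_B"])
print(scores)
```

In this toy input, TE_A collects the two discriminating reads and TE_B none, which mirrors the situation drawn in Figure 3. If every candidate had an identical sequence over the covered region, all reads would be discarded and every score would stay at zero.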

