Limits...
proTRAC--a software for probabilistic piRNA cluster detection, visualization and analysis.

Rosenkranz D, Zischler H - BMC Bioinformatics (2012)

Bottom Line: We further show, that proTRAC detects clusters that are passed over if a minimum number of single-copy piRNA loci are required and that proTRAC assigns more sequence reads per cluster since it does not preclude frequently mapped reads from the analysis.With proTRAC we provide a reliable tool for detection, visualization and analysis of piRNA clusters.Detected clusters are well supported by comprehensible probabilistic parameters and retain a maximum amount of information, thus overcoming the present conflict of sensitivity and specificity in piRNA cluster detection.

View Article: PubMed Central - HTML - PubMed

Affiliation: Institute of Anthropology, Johannes Gutenberg-University Mainz, Colonel-Kleinmann-Weg 2, 55099 Mainz, Germany. rosenkrd@uni-mainz.de

ABSTRACT

Background: Throughout the metazoan lineage, typically gonadal expressed Piwi proteins and their guiding piRNAs (~26-32nt in length) form a protective mechanism of RNA interference directed against the propagation of transposable elements (TEs). Most piRNAs are generated from genomic piRNA clusters. Annotation of experimentally obtained piRNAs from small RNA/cDNA-libraries and detection of genomic piRNA clusters are crucial for a thorough understanding of the still enigmatic piRNA pathway, especially in an evolutionary context. Currently, detection of piRNA clusters relies on bioinformatics rather than detection and sequencing of primary piRNA cluster transcripts and the stringency of the methods applied in different studies differs considerably. Additionally, not all important piRNA cluster characteristics were taken into account during bioinformatic processing. Depending on the applied method this can lead to: i) an accidentally underrepresentation of TE related piRNAs, ii) overlook duplicated clusters harboring few or no single-copy loci and iii) false positive annotation of clusters that are in fact just accumulations of multi-copy loci corresponding to frequently mapped reads, but are not transcribed to piRNA precursors.

Results: We developed a software which detects and analyses piRNA clusters (proTRAC, probabilistic TRacking and Analysis of Clusters) based on quantifiable deviations from a hypothetical uniform distribution regarding the decisive piRNA cluster characteristics. We used piRNA sequences from human, macaque, mouse and rat to identify piRNA clusters in the respective species with proTRAC and compared the obtained results with piRNA cluster annotation from piRNABank and the results generated by different hitherto applied methods.proTRAC identified clusters not annotated at piRNABank and rejected annotated clusters based on the absence of important features like strand asymmetry. We further show, that proTRAC detects clusters that are passed over if a minimum number of single-copy piRNA loci are required and that proTRAC assigns more sequence reads per cluster since it does not preclude frequently mapped reads from the analysis.

Conclusions: With proTRAC we provide a reliable tool for detection, visualization and analysis of piRNA clusters. Detected clusters are well supported by comprehensible probabilistic parameters and retain a maximum amount of information, thus overcoming the present conflict of sensitivity and specificity in piRNA cluster detection.

Show MeSH

Related in: MedlinePlus

Determining the minimum number of piRNA loci per kb. Pn (blue points) refers to the probability to observe n hits (n∈ℕ) per kb. f(x) (yellow line) defines the function which extends the probabilities to positive rational numbers. P(x≥k) is the probability to observe at least k hits per kb (k∈ℝ). r = number of putative piRNA loci/size of piRNA encoding region [nt].
© Copyright Policy - open-access
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC3293768&req=5

Figure 1: Determining the minimum number of piRNA loci per kb. Pn (blue points) refers to the probability to observe n hits (n∈ℕ) per kb. f(x) (yellow line) defines the function which extends the probabilities to positive rational numbers. P(x≥k) is the probability to observe at least k hits per kb (k∈ℝ). r = number of putative piRNA loci/size of piRNA encoding region [nt].

Mentions: Accepted input files are specially formatted SeqMap output files (ELAND3 format) which can be obtained by running SeqMap with the option/output_all_matches. ELAND3 files contain a list of mapped sequences and associated coordinates in genome coordinate order, each line corresponding to one mapped sequence read. proTRAC basically operates with a sliding window while reading the input file. The window size in lines is given by the required minimum number of mapped sequence reads per cluster. If the region encompassed by the first and last coordinate of the sliding window is small enough to exceed the minimum density of mapped reads (hit density), the respective loci are tagged. Contiguously tagged loci are assembled to cluster candidates being subject to a subsequent verification process. The required minimum number of putative piRNA loci (= mapped reads) and minimum hit density depend upon the provided dataset and the stated minimum score values, which are explained in more detail below. In order to determine the minimum hit density, proTRAC examines the distribution of mapped sequence reads for each chromosome or scaffold and specifically calculates a significant hit density by stepwise computation of the probability to observe an increasing number of hits within a 1kb window on the given chromosome or scaffold assuming a uniform distribution of mapped sequences (figure 1).


proTRAC--a software for probabilistic piRNA cluster detection, visualization and analysis.

Rosenkranz D, Zischler H - BMC Bioinformatics (2012)

Determining the minimum number of piRNA loci per kb. Pn (blue points) refers to the probability to observe n hits (n∈ℕ) per kb. f(x) (yellow line) defines the function which extends the probabilities to positive rational numbers. P(x≥k) is the probability to observe at least k hits per kb (k∈ℝ). r = number of putative piRNA loci/size of piRNA encoding region [nt].
© Copyright Policy - open-access
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC3293768&req=5

Figure 1: Determining the minimum number of piRNA loci per kb. Pn (blue points) refers to the probability to observe n hits (n∈ℕ) per kb. f(x) (yellow line) defines the function which extends the probabilities to positive rational numbers. P(x≥k) is the probability to observe at least k hits per kb (k∈ℝ). r = number of putative piRNA loci/size of piRNA encoding region [nt].
Mentions: Accepted input files are specially formatted SeqMap output files (ELAND3 format) which can be obtained by running SeqMap with the option/output_all_matches. ELAND3 files contain a list of mapped sequences and associated coordinates in genome coordinate order, each line corresponding to one mapped sequence read. proTRAC basically operates with a sliding window while reading the input file. The window size in lines is given by the required minimum number of mapped sequence reads per cluster. If the region encompassed by the first and last coordinate of the sliding window is small enough to exceed the minimum density of mapped reads (hit density), the respective loci are tagged. Contiguously tagged loci are assembled to cluster candidates being subject to a subsequent verification process. The required minimum number of putative piRNA loci (= mapped reads) and minimum hit density depend upon the provided dataset and the stated minimum score values, which are explained in more detail below. In order to determine the minimum hit density, proTRAC examines the distribution of mapped sequence reads for each chromosome or scaffold and specifically calculates a significant hit density by stepwise computation of the probability to observe an increasing number of hits within a 1kb window on the given chromosome or scaffold assuming a uniform distribution of mapped sequences (figure 1).

Bottom Line: We further show, that proTRAC detects clusters that are passed over if a minimum number of single-copy piRNA loci are required and that proTRAC assigns more sequence reads per cluster since it does not preclude frequently mapped reads from the analysis.With proTRAC we provide a reliable tool for detection, visualization and analysis of piRNA clusters.Detected clusters are well supported by comprehensible probabilistic parameters and retain a maximum amount of information, thus overcoming the present conflict of sensitivity and specificity in piRNA cluster detection.

View Article: PubMed Central - HTML - PubMed

Affiliation: Institute of Anthropology, Johannes Gutenberg-University Mainz, Colonel-Kleinmann-Weg 2, 55099 Mainz, Germany. rosenkrd@uni-mainz.de

ABSTRACT

Background: Throughout the metazoan lineage, typically gonadal expressed Piwi proteins and their guiding piRNAs (~26-32nt in length) form a protective mechanism of RNA interference directed against the propagation of transposable elements (TEs). Most piRNAs are generated from genomic piRNA clusters. Annotation of experimentally obtained piRNAs from small RNA/cDNA-libraries and detection of genomic piRNA clusters are crucial for a thorough understanding of the still enigmatic piRNA pathway, especially in an evolutionary context. Currently, detection of piRNA clusters relies on bioinformatics rather than detection and sequencing of primary piRNA cluster transcripts and the stringency of the methods applied in different studies differs considerably. Additionally, not all important piRNA cluster characteristics were taken into account during bioinformatic processing. Depending on the applied method this can lead to: i) an accidentally underrepresentation of TE related piRNAs, ii) overlook duplicated clusters harboring few or no single-copy loci and iii) false positive annotation of clusters that are in fact just accumulations of multi-copy loci corresponding to frequently mapped reads, but are not transcribed to piRNA precursors.

Results: We developed a software which detects and analyses piRNA clusters (proTRAC, probabilistic TRacking and Analysis of Clusters) based on quantifiable deviations from a hypothetical uniform distribution regarding the decisive piRNA cluster characteristics. We used piRNA sequences from human, macaque, mouse and rat to identify piRNA clusters in the respective species with proTRAC and compared the obtained results with piRNA cluster annotation from piRNABank and the results generated by different hitherto applied methods.proTRAC identified clusters not annotated at piRNABank and rejected annotated clusters based on the absence of important features like strand asymmetry. We further show, that proTRAC detects clusters that are passed over if a minimum number of single-copy piRNA loci are required and that proTRAC assigns more sequence reads per cluster since it does not preclude frequently mapped reads from the analysis.

Conclusions: With proTRAC we provide a reliable tool for detection, visualization and analysis of piRNA clusters. Detected clusters are well supported by comprehensible probabilistic parameters and retain a maximum amount of information, thus overcoming the present conflict of sensitivity and specificity in piRNA cluster detection.

Show MeSH
Related in: MedlinePlus