Sensitive and highly resolved identification of RNA-protein interaction sites in PAR-CLIP data.

Comoglio F, Sievers C, Paro R - BMC Bioinformatics (2015)

Bottom Line: We use published PAR-CLIP data to demonstrate the advantages of our approach, which compares favorably with alternative algorithms. Lastly, by integrating RNA-Seq data we compute conservative experimentally-based false discovery rates of our method and demonstrate the high precision of our strategy. Our method is implemented in the R package wavClusteR 2.0.


Affiliation: Department of Biosystems Science and Engineering, Swiss Federal Institute of Technology Zurich, Mattenstrasse 26, Basel, 4058, Switzerland. federico.comoglio@bsse.ethz.ch.

ABSTRACT

Background: PAR-CLIP is a recently developed Next Generation Sequencing-based method enabling transcriptome-wide identification of interaction sites between RNA and RNA-binding proteins. The PAR-CLIP procedure induces specific base transitions that originate from sites of RNA-protein interactions and can therefore guide the identification of binding sites. However, additional sources of transitions, such as cell type-specific SNPs and sequencing errors, challenge the inference of binding sites, and suitable statistical approaches are crucial to control false discovery rates. In addition, a highly resolved delineation of binding sites followed by an extensive downstream analysis is necessary for a comprehensive characterization of the protein binding preferences and the subsequent design of validation experiments.

Results: We present a statistical and computational framework for PAR-CLIP data analysis. We developed a sensitive transition-centered algorithm specifically designed to resolve protein binding sites at high resolution in PAR-CLIP data. Our method employs a Bayesian network approach to associate posterior log-odds with the observed transitions, providing an overall quantification of the confidence in RNA-protein interaction. We use published PAR-CLIP data to demonstrate the advantages of our approach, which compares favorably with alternative algorithms. Lastly, by integrating RNA-Seq data we compute conservative experimentally-based false discovery rates of our method and demonstrate the high precision of our strategy.

Conclusions: Our method is implemented in the R package wavClusteR 2.0. The package is distributed under the GPL-2 license and is available from BioConductor at http://www.bioconductor.org/packages/devel/bioc/html/wavClusteR.html .


Figure 3: Post-processing of binding sites identified in the MOV10 and QKI data sets. (A) Annotation of MOV10 and QKI clusters with respect to the sense and antisense strand, respectively (top). The distribution of different transcript features in the human transcriptome (hg19, bottom left) is used to compute the normalized annotation profile for clusters mapping on the sense strand (bottom right). (B) Corresponding metagene profiles of MOV10 and QKI clusters.

Mentions: For illustration, we provide examples of cluster annotations and metagene profiles obtained from PAR-CLIP data sets of MOV10 and QKI, which are characterized by different binding preferences. Annotation of MOV10 clusters shows that MOV10 preferentially binds to 3’-UTRs of transcripts [11] (Figure 3A), whereas binding sites of QKI, which regulates pre-mRNA splicing, mRNA export and stability, and protein translation [19], are enriched in 3’-UTRs, coding sequences and introns (Figure 3A). Notably, the distinct binding preferences of the two proteins are neatly reflected in their metagene profiles (Figure 3B).
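
As a rough illustration of how a normalized annotation profile of the kind described for Figure 3A might be computed, the Python sketch below divides the observed fraction of clusters in each transcript feature by the fraction of the transcriptome that feature occupies, then rescales the result to sum to one. This is an assumption about the normalization, not the wavClusteR implementation: the function name, the feature labels, and the toy numbers are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def normalized_annotation_profile(cluster_features, feature_fractions):
    """Return a per-feature profile corrected for feature abundance.

    cluster_features  : iterable of feature labels, one per cluster
                        (hypothetical input format).
    feature_fractions : dict mapping each feature label to the fraction
                        of the transcriptome it covers (assumed known).
    """
    counts = Counter(cluster_features)
    # Only clusters falling in a listed feature contribute to the total.
    total = sum(counts[f] for f in feature_fractions)
    if total == 0:
        raise ValueError("no clusters overlap the listed features")

    # Observed cluster fraction divided by the expected (background) fraction.
    enrichment = {
        f: (counts[f] / total) / frac
        for f, frac in feature_fractions.items()
        if frac > 0
    }

    # Rescale so the profile sums to 1 and is comparable across proteins.
    norm = sum(enrichment.values())
    if norm == 0:
        raise ValueError("all enrichments are zero")
    return {f: v / norm for f, v in enrichment.items()}


# Toy data (made up for illustration only).
toy_clusters = ["3'UTR"] * 6 + ["CDS"] * 3 + ["intron"] * 1
toy_background = {"3'UTR": 0.1, "CDS": 0.1, "intron": 0.8}
toy_profile = normalized_annotation_profile(toy_clusters, toy_background)
```

In this toy case the intron category is heavily down-weighted because introns are assumed to make up most of the transcriptome, so even a few 3’-UTR clusters dominate the normalized profile.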

