Computational prediction of intronic microRNA targets using host gene expression reveals novel regulatory mechanisms.

Radfar MH, Wong W, Morris Q - PLoS ONE (2011)

Bottom Line: Host genes that InMiR predicts to be bad surrogates contain significantly more miRNA target sites in their 3' UTRs and are significantly more likely to have predicted Pol II and Pol III promoters in their introns. We provide a dataset of 1,935 predicted mRNA targets for 22 intronic miRNAs. These predictions are supported both by sequence features and by expression. By combining our results with previous reports, we distinguish three classes of intronic miRNAs: those that are tightly regulated with their host gene; those that are likely to be expressed from the same promoter but whose host gene is highly regulated by miRNAs; and those likely to have independent promoters.


Affiliation: Department of Electrical and Computer Engineering, University of Toronto, Toronto, Ontario, Canada. h.radfar@utoronto.ca

ABSTRACT
Approximately half of known human miRNAs are located in the introns of protein-coding genes. Some of these intronic miRNAs are expressed only when their host gene is and, as such, their steady-state expression levels are highly correlated with those of the host gene's mRNA. Recently, host gene expression levels have been used to predict the targets of intronic miRNAs by identifying other mRNAs with which they show consistent negative correlation. This is a potentially powerful approach because it allows a large number of expression profiling studies to be used, but it needs refinement because mRNAs can be targeted by multiple miRNAs and not all intronic miRNAs are co-expressed with their host genes. Here we introduce InMiR, a new computational method that uses a linear-Gaussian model to predict the targets of intronic miRNAs based on the expression profiles of their host genes across a large number of datasets. Our method recovers nearly twice as many true positives at the same fixed false positive rate as a comparable method that only considers correlations. Through an analysis of 140 Affymetrix datasets from Gene Expression Omnibus, we build a network of 19,926 interactions among 57 intronic miRNAs and 3,864 targets. InMiR can also predict which host genes have expression profiles that are good surrogates for those of their intronic miRNAs. Host genes that InMiR predicts to be bad surrogates contain significantly more miRNA target sites in their 3' UTRs and are significantly more likely to have predicted Pol II and Pol III promoters in their introns. We provide a dataset of 1,935 predicted mRNA targets for 22 intronic miRNAs. These predictions are supported both by sequence features and by expression. By combining our results with previous reports, we distinguish three classes of intronic miRNAs: those that are tightly regulated with their host gene; those that are likely to be expressed from the same promoter but whose host gene is highly regulated by miRNAs; and those likely to have independent promoters.
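One way to see why a correlation-only score can need refinement is a small synthetic example. The sketch below is not the InMiR model: it swaps in an ordinary least-squares fit with two regulators as a stand-in for a linear model, and every name in it (`pearson`, `fit_two_regulators`, `host_a`, `host_b`, `target`) is hypothetical. The data are simulated under the assumption that only `host_a` represses the target while `host_b` is merely co-expressed with `host_a`.

```python
import random


def mean(values):
    return sum(values) / len(values)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def fit_two_regulators(h1, h2, target):
    """Least-squares weights (w1, w2) for target ~ w1*h1 + w2*h2 + intercept.

    The variables are centered, then the 2x2 normal equations are solved.
    """
    m1, m2, mt = mean(h1), mean(h2), mean(target)
    a = [v - m1 for v in h1]
    b = [v - m2 for v in h2]
    t = [v - mt for v in target]
    s11 = sum(x * x for x in a)
    s22 = sum(x * x for x in b)
    s12 = sum(x * y for x, y in zip(a, b))
    s1t = sum(x * y for x, y in zip(a, t))
    s2t = sum(x * y for x, y in zip(b, t))
    det = s11 * s22 - s12 * s12
    w1 = (s22 * s1t - s12 * s2t) / det
    w2 = (s11 * s2t - s12 * s1t) / det
    return w1, w2


# Synthetic expression profiles (assumption: 300 samples, Gaussian noise).
rng = random.Random(1)
n_samples = 300
host_a = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
# host_b is co-expressed with host_a but has no direct effect on the target.
host_b = [0.8 * a + 0.6 * rng.gauss(0.0, 1.0) for a in host_a]
# Only host_a (standing in for its intronic miRNA) represses the target.
target = [-1.0 * a + 0.3 * rng.gauss(0.0, 1.0) for a in host_a]

corr_b = pearson(host_b, target)
w_a, w_b = fit_two_regulators(host_a, host_b, target)
print(f"correlation(host_b, target) = {corr_b:.2f}")
print(f"joint weights: host_a = {w_a:.2f}, host_b = {w_b:.2f}")
```

In this simulated setting, `host_b` is strongly anti-correlated with the target even though it has no direct effect, whereas the joint fit assigns it a weight near zero. This illustrates only one possible failure mode of correlation-based scoring, under the stated assumptions.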

Figure 3 (pone-0019312-g003): CDF plots for weights. Plots a–d: the CDFs of the weights for seven host genes obtained from ULM (a and b) and CORR (c and d), with the actual (a and c) and permuted (b and d) setups. The thick gray line in each plot is the CDF obtained from the pooled permutation data for each method. The table lists the p-values (Wilcoxon rank-sum test) giving the probability that the weight or correlation data are drawn from the pooled permuted data (see (4) and (5) for details). P-values marked in red are predicted to be significant. Note that the host gene MIRHG1 was excluded from the analysis because expression data for this host gene were not present in the retrieved dataset.


Mentions: Fig. 3a–d show the CDFs of the weights for all host genes whose intronic miRNAs have potential target sites in LSM12. The CDF of the pooled weights obtained from the permuted data (the thick gray line) is also shown. These weights were obtained from two methods: ULM (Fig. 3a–b) and a method that sets weights by correlation (Fig. 3c–d; the CORR method, see Materials for details). The recently introduced HOCTAR method uses inverse correlation with host genes to detect intronic miRNA targets [16]; here we use the CORR method to demonstrate how well inverse correlation performs within our framework. From Fig. 3c–d, we see that the distributions obtained by CORR from the actual and permuted data are almost indistinguishable, suggesting that CORR is underpowered and/or prone to misclassification compared to ULM. Moreover, these observations also confirm the cooperative impact of miRNAs on target genes. By contrast, the distributions of three host genes, namely CTDSP1, CTDSP2, and CTDSPL, obtained from ULM (and also from the constrained linear model, CLM; Fig. S4) are significantly different from their permuted counterparts and from the pooled distribution. The table at the bottom of Fig. 3 lists the p-values for each interaction. In the next subsection we specify a cutoff in order to determine the significant interactions that we will use to make predictions about targets.
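The comparison of a host gene's weights against the pooled permutation weights can be sketched as a two-sample rank-sum test. The code below is a minimal illustration, not the authors' implementation: the function name `rank_sum_pvalue` is hypothetical, the weights are simulated Gaussian draws rather than ULM or CORR output, and the p-value uses the normal approximation with midranks and no tie correction.

```python
import random
from statistics import NormalDist


def rank_sum_pvalue(sample, reference):
    """Two-sided Wilcoxon rank-sum p-value via the normal approximation.

    Ties receive midranks; no tie correction is applied to the variance.
    """
    n1, n2 = len(sample), len(reference)
    # Label each value with its group (0 = sample, 1 = reference) and sort.
    pooled = sorted([(v, 0) for v in sample] + [(v, 1) for v in reference])
    sample_rank_sum = 0.0
    i = 0
    while i < len(pooled):
        # Find the run of tied values starting at position i.
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        # Ranks i+1 .. j share their average rank.
        midrank = (i + 1 + j) / 2.0
        in_sample = sum(1 for k in range(i, j) if pooled[k][1] == 0)
        sample_rank_sum += midrank * in_sample
        i = j
    expected = n1 * (n1 + n2 + 1) / 2.0
    sd = (n1 * n2 * (n1 + n2 + 1) / 12.0) ** 0.5
    z = (sample_rank_sum - expected) / sd
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Simulated weights (assumption: Gaussian, for illustration only).
rng = random.Random(0)
pooled_permuted = [rng.gauss(0.0, 1.0) for _ in range(2000)]
shifted_weights = [rng.gauss(-1.0, 1.0) for _ in range(100)]
null_like_weights = [rng.gauss(0.0, 1.0) for _ in range(100)]

p_shifted = rank_sum_pvalue(shifted_weights, pooled_permuted)
p_null_like = rank_sum_pvalue(null_like_weights, pooled_permuted)
print(f"shifted: p = {p_shifted:.3g}; null-like: p = {p_null_like:.3g}")
```

A host gene whose weights are systematically shifted away from the permutation distribution yields a very small p-value, which is the pattern the text reports for CTDSP1, CTDSP2, and CTDSPL under ULM. Weights drawn from the same distribution as the permuted data are not expected to do so consistently.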

