Employing machine learning for reliable miRNA target identification in plants.

Jha A, Shankar R - BMC Genomics (2011)

Bottom Line: p-TAREF was rigorously compared with other prediction tools for plants and was found to perform better in several respects. Using p-TAREF, targets were identified across the complete rice transcriptome, supported by expression and degradome data. miR156 was found to be an important component of the rice regulatory system, predominantly controlling genes associated with growth and transcription. Comprehensive testing and benchmarking suggest reliable performance and general usability for transcriptome-wide plant miRNA target identification.


Affiliation: Studio of Computational Biology & Bioinformatics, Biotechnology Division, Institute of Himalayan Bioresource Technology, Council of Scientific & Industrial Research, Palampur 176061 (HP), India.

ABSTRACT

Background: miRNAs are small noncoding RNA molecules, ~21 nucleotides long, formed endogenously in most eukaryotes. They mainly control their target genes post-transcriptionally by interacting with and silencing them. While many tools have been developed for animal miRNA target identification, plant miRNA target identification has seen limited development. Most plant tools are centered on exact complementarity matching. Very few consider other factors such as multiple target sites and the role of flanking regions.

Result: In the present work, a Support Vector Regression (SVR) approach has been implemented for plant miRNA target identification. It uses position-specific dinucleotide density variation around the target sites to yield highly reliable results, and has been named p-TAREF (plant-Target Refiner). p-TAREF was rigorously compared with other prediction tools for plants and was found to perform better in several respects. Further, p-TAREF was run over experimentally validated miRNA targets from species including Arabidopsis, Medicago, rice and tomato, and detected them accurately, suggesting general usability across plant species. Using p-TAREF, targets were identified across the complete rice transcriptome, supported by expression and degradome data. miR156 was found to be an important component of the rice regulatory system, predominantly controlling genes associated with growth and transcription. The entire methodology has been implemented in Java in a multi-threaded parallel architecture, enabling fast processing in both the web-server and standalone versions. This also allows it to run in concurrent mode even on a simple desktop computer. In its web-server version, it additionally provides a facility to gather experimental support for its predictions through on-the-spot expression data analysis.

Conclusion: A machine learning tool based on multivariate features has been implemented in a parallel, locally installable form for plant miRNA target identification. Its performance was assessed and compared through comprehensive testing and benchmarking, which suggest reliable performance and general usability for transcriptome-wide plant miRNA target identification.
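
To make the idea of dinucleotide density around a target site concrete, the following is a minimal Python sketch. It is not the authors' feature definition: the split into upstream flank, site and downstream flank, the default flank length of 10 and the names region_density and site_features are all assumptions made for illustration. It computes the 16 dinucleotide frequencies separately in each of the three regions and concatenates them into a 48-value vector of the kind a regression model could take as input.

```python
from collections import Counter
from itertools import product

# The 16 RNA dinucleotides in a fixed order: AA, AC, AG, AU, CA, ...
DINUCLEOTIDES = ["".join(pair) for pair in product("ACGU", repeat=2)]


def region_density(region):
    """Return the frequency of each dinucleotide in one region.

    Regions shorter than two bases yield all zeros. Dinucleotides that
    contain characters other than A, C, G, U count toward the total but
    are not reported.
    """
    total = len(region) - 1
    if total < 1:
        return [0.0] * len(DINUCLEOTIDES)
    counts = Counter(region[i:i + 2] for i in range(total))
    return [counts[d] / total for d in DINUCLEOTIDES]


def site_features(seq, start, end, flank=10):
    """Hypothetical feature vector for a candidate site seq[start:end].

    Concatenates dinucleotide densities of the upstream flank, the site
    itself and the downstream flank (3 x 16 = 48 values). The region
    layout and flank length are illustrative assumptions.
    """
    seq = seq.upper().replace("T", "U")
    upstream = seq[max(0, start - flank):start]
    site = seq[start:end]
    downstream = seq[end:min(len(seq), end + flank)]
    return (region_density(upstream)
            + region_density(site)
            + region_density(downstream))
```

Calling site_features("ACGUACGUACGU", 4, 8, flank=4) returns 48 values, 16 per region. Keeping the regions separate, rather than pooling one window, is what lets such a vector reflect how composition varies with position relative to the site.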


Figure 4: Impact of concurrency on execution speed. p-TAREF was run over a set of genes for target identification, with different numbers of processors added through concurrency. Concurrency drastically reduced processing time, which is highly beneficial for accurate transcriptome-wide analysis.

Mentions: As discussed in the introduction, plant miRNA target identification tools, unlike their animal counterparts, saw limited growth until recently. Many of them revolve around complementarity search, using either heuristics such as BLAST and FASTA or the Smith-Waterman algorithm at their core. Most are web-server based and, barring psRNAtarget, none offers concurrency to enable analysis of large amounts of sequence data. Given the advances brought by next-generation sequencing and systems biology approaches, it has become imperative to analyze transcriptome- and genome-level data in one go, with high accuracy as well as speed. BLAST- and FASTA-dependent methods do not require concurrency because BLAST and FASTA are inherently much faster, though at the cost of accuracy and reliability. To address this, some authors used Smith-Waterman local alignment to detect complementarity, which slows sharply as the number and length of the sequences to be searched increase, and more so if an all-against-all search has to be performed without prior knowledge of the miRNA. We compared one such tool, Target-align, with p-TAREF for time performance, because Target-align is a recently published, widely used program and one of the very few tools available as a standalone version. We executed Target-align and p-TAREF on 205 plant genes from Arabidopsis and recorded the time taken to finish the job. Although the p-TAREF run could be accelerated through concurrency and the use of more processors, Target-align offered no such facility, so we ran it on a single processor and compared the time taken. Table 1 summarizes the time performance and the impact of introducing concurrency. Figure 4 plots the reduction in execution time on introducing concurrency when p-TAREF was run over 790 mRNA sequences associated with plant secondary metabolite pathways.
The processing speed of p-TAREF rose sharply with the inclusion of more processors, making it a better choice for whole-transcriptome scanning. Besides p-TAREF, only psRNAtarget offers concurrency. However, their execution speeds could not be compared, as psRNAtarget is available only as a web server and its concurrency is implemented on cluster computers with several processors and large volumes of memory. Unlike psRNAtarget, p-TAREF is easily deployable on machines of any level and can run concurrently even on a simple desktop machine.
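
The pattern of spreading a per-sequence scan over several workers can be sketched as follows. This is an illustrative Python sketch and not p-TAREF's Java implementation: the ungapped complementarity count in best_site is a deliberately simple stand-in for the tool's actual scoring, and the names best_site and scan, the default worker count and the toy sequences in the note below are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial

# Watson-Crick complement table for RNA bases.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def best_site(mirna, mrna):
    """Return (matches, position) of the best ungapped window in mrna.

    The miRNA's reverse complement is slid along the mRNA and matching
    bases are counted at each offset. Returns (0, -1) if the mRNA is
    shorter than the miRNA. A toy scoring scheme, for illustration only.
    """
    probe = mirna.upper().replace("T", "U").translate(COMPLEMENT)[::-1]
    mrna = mrna.upper().replace("T", "U")
    best = (0, -1)
    for pos in range(len(mrna) - len(probe) + 1):
        window = mrna[pos:pos + len(probe)]
        matches = sum(a == b for a, b in zip(probe, window))
        if matches > best[0]:
            best = (matches, pos)
    return best


def scan(mirna, mrnas, workers=4):
    """Scan every mRNA for the miRNA, one task per mRNA.

    Each sequence is independent, so the work divides cleanly across
    workers; results come back in input order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(partial(best_site, mirna), mrnas))
```

For example, scan("ACGUAC", ["AAGUACGUAA", "CCCCCCCC", "ACG"], workers=2) returns [(6, 2), (1, 0), (0, -1)], the same list that a serial loop produces. In CPython, threads will not speed up this CPU-bound loop; replacing ThreadPoolExecutor with ProcessPoolExecutor from the same module, launched under an `if __name__ == "__main__":` guard, is the closer analogue of adding processors.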

