Employing machine learning for reliable miRNA target identification in plants.

Jha A, Shankar R - BMC Genomics (2011)

Bottom Line: p-TAREF was compared rigorously with other plant miRNA target prediction tools and performed better in several respects. Using p-TAREF, targets were identified across the complete rice transcriptome, with support from expression and degradome data. miR156 emerged as an important component of the rice regulatory system, predominantly controlling genes associated with growth and transcription. Comprehensive testing and benchmarking suggest reliable performance and broad usability for transcriptome-wide plant miRNA target identification.


Affiliation: Studio of Computational Biology & Bioinformatics, Biotechnology Division, Institute of Himalayan Bioresource Technology, Council of Scientific & Industrial Research, Palampur 176061 (HP), India.

ABSTRACT

Background: miRNAs are small noncoding RNA molecules, ~21 nucleotides long, formed endogenously in most eukaryotes, which mainly control their target genes post-transcriptionally by interacting with and silencing them. While many tools have been developed for animal miRNA target identification, tools for plant miRNA target identification have seen limited development. Most of them are centered on exact complementarity matching. Very few consider other factors such as multiple target sites and the role of flanking regions.

Result: In the present work, a Support Vector Regression (SVR) approach has been implemented for plant miRNA target identification, using position-specific dinucleotide density variation around the target sites to yield highly reliable results. The tool is named p-TAREF (plant-Target Refiner). p-TAREF was compared rigorously with other prediction tools for plants and performed better in several respects. Further, p-TAREF was run on experimentally validated miRNA targets from species such as Arabidopsis, Medicago, rice and tomato, and detected them accurately, suggesting broad usability of p-TAREF across plant species. Using p-TAREF, targets were identified for the complete rice transcriptome, with support from expression and degradome data. miR156 was found to be an important component of the rice regulatory system, predominantly controlling genes associated with growth and transcription. The entire methodology has been implemented in Java with a multi-threaded parallel architecture, enabling fast processing in both the web-server and standalone versions. This also allows it to run in concurrent mode even on a simple desktop computer. In its web-server version, it also provides a facility to gather experimental support for predictions through on-the-spot expression data analysis.

Conclusion: A machine learning tool based on multivariate features has been implemented in a parallel, locally installable form for plant miRNA target identification. Comprehensive testing and benchmarking suggest reliable performance and broad usability for transcriptome-wide plant miRNA target identification.
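
The abstract describes the SVR features only as position-specific dinucleotide density variation around the target sites. The following is a minimal sketch of what such a feature could look like, not the authors' implementation: the function names, the three-region split (upstream flank, site, downstream flank), the flank length of 10 and the "site minus mean flank" definition of variation are all assumptions made for illustration.

```python
from itertools import product

# All 16 RNA dinucleotides in a fixed order, so every profile has the same keys.
DINUCLEOTIDES = ["".join(p) for p in product("ACGU", repeat=2)]


def dinucleotide_density(region):
    """Fraction of overlapping dinucleotide positions taken by each dinucleotide."""
    region = region.upper().replace("T", "U")  # accept DNA-style input
    counts = {d: 0 for d in DINUCLEOTIDES}
    total = len(region) - 1  # number of overlapping dinucleotide positions
    if total <= 0:
        return {d: 0.0 for d in DINUCLEOTIDES}
    for i in range(total):
        pair = region[i:i + 2]
        if pair in counts:  # pairs with ambiguous bases (e.g. N) are ignored
            counts[pair] += 1
    return {d: c / total for d, c in counts.items()}


def flank_site_profiles(sequence, site_start, site_end, flank=10):
    """Density profiles for the upstream flank, the site, and the downstream flank.

    Coordinates are 0-based and end-exclusive (an assumption of this sketch).
    """
    regions = {
        "upstream": sequence[max(0, site_start - flank):site_start],
        "site": sequence[site_start:site_end],
        "downstream": sequence[site_end:site_end + flank],
    }
    return {name: dinucleotide_density(seq) for name, seq in regions.items()}


def density_variation(profiles):
    """Hypothetical variation measure: site density minus mean flank density."""
    return {
        d: profiles["site"][d]
        - (profiles["upstream"][d] + profiles["downstream"][d]) / 2
        for d in DINUCLEOTIDES
    }
```

A vector built from such values could then serve as input to a regression model; how p-TAREF actually encodes positions is not specified in this excerpt.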


Figure 2: The p-TAREF web-server. The web-server provides a friendly interface for loading query sequences, with parameter settings that include the energy cut-off, the mismatch level allowed, the SVR kernel and the number of processors to use. Its performance tab details all performance measures used in benchmarking p-TAREF and comparing it with other tools.

Mentions: p-TAREF is available as a web-server and as a standalone version. The web-server accepts single as well as batch submission of query sequences. However, because it depends on network connectivity, the web-server version is advisable only for a single sequence or a small number of sequences. Sequences must be entered in FASTA format: the first line starts with ">" followed by "AT" and an accession ID or numeric digits identifying the sequence, without any gap, and the next line holds the sequence. Queries can be pasted directly or uploaded as a text file. Users are given three choices: 1) Type-I: submit the query sequence and run the tool from the beginning, starting with the RNAhybrid step. 2) Type-II: submit the target mRNA sequence along with the predicted target sequence. 3) Type-III: choose a miRNA from a drop-down menu to identify its targets in the submitted query sequences. Type-I performs all tasks on the given query sequence, whereas Type-II serves to confirm and validate a target already predicted by some other method, applying the support vector regression module directly. Type-I is more computationally intensive than Type-II, as it involves the time-consuming RNAhybrid step, a dynamic programming based alignment step, pattern encoding and search, and a large amount of parsing. For this reason, Type-I offers a concurrency option: the user can choose the number of processors on which the server runs concurrently to get results faster. Type-I also lets the user select the number of mismatches allowed when estimating similarity between the predicted and the experimentally validated encoded patterns of miRNA-target interaction. At most four mismatches are allowed. The higher the mismatch cut-off, the more targets may be reported.
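
The input rule above (a ">" header followed by "AT" and an identifier with no gap, then the sequence on the next line) can be checked before submission. The sketch below is one loose reading of that rule, not the server's own validator: the regular expressions, the accepted alphabet (A, C, G, T, U, N), the function name `parse_queries` and the example identifier `AT12345` are assumptions.

```python
import re

# Assumed reading of the rule: ">AT" followed by at least one non-space character.
HEADER_RE = re.compile(r"^>AT\S+$")
# Assumed nucleotide alphabet; the source does not list the accepted symbols.
SEQUENCE_RE = re.compile(r"^[ACGTUN]+$", re.IGNORECASE)


def parse_queries(text):
    """Parse two-line FASTA records and return a list of (identifier, sequence).

    Raises ValueError if a header or sequence line does not match the assumed format.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) % 2 != 0:
        raise ValueError("each header line must be followed by one sequence line")
    records = []
    for header, sequence in zip(lines[0::2], lines[1::2]):
        if not HEADER_RE.match(header):
            raise ValueError(f"malformed header: {header!r}")
        if not SEQUENCE_RE.match(sequence):
            raise ValueError(f"malformed sequence for {header!r}")
        records.append((header[1:], sequence.upper()))
    return records


if __name__ == "__main__":
    # 'AT12345' is a made-up identifier used only for illustration.
    print(parse_queries(">AT12345\nACGUACGUACGU\n"))
```

Rejecting malformed records locally avoids a round trip to the server for a batch that would fail on its first header.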
The threshold energy cut-off for the RNAhybrid run can be set; it is -10 kcal/mol by default. A decisive step in parameter selection is the choice of plant model according to the kernel (Choice of Kernel). Here, p-TAREF provides three options: Linear kernel, Radial Basis Function (Gaussian) and Polynomial function. The linear function is the least accommodating, the Gaussian is moderately stringent, and the polynomial tries to cover more widely spread and deviating instances correctly. Figure 2 shows the Type-I form of the server. The Type-II option is meant for validation, when a user wishes to confirm a target predicted by some other tool or method by applying the SVR approach. In this case, the user pastes the predicted target region sequence as well as the mRNA sequence in which the target was predicted. SVR scores are then generated for the query, based on the dinucleotide density profile variation method for refinement. The Type-III option provides a list of miRNAs to choose from for analysis of the user-submitted query sequences.
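
For readers unfamiliar with the three kernel choices, the sketch below writes out their standard textbook forms in plain Python. It illustrates the general functions only; the parameter values (`gamma`, `degree`, `coef0`) and the `KERNELS` lookup table are assumptions, and the settings p-TAREF uses internally are not given in this excerpt.

```python
import math


def dot(x, y):
    """Inner product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    """K(x, y) = x . y"""
    return dot(x, y)


def rbf_kernel(x, y, gamma=0.1):
    """Gaussian kernel: K(x, y) = exp(-gamma * ||x - y||^2). gamma is an assumed value."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def polynomial_kernel(x, y, degree=3, gamma=1.0, coef0=1.0):
    """K(x, y) = (gamma * x . y + coef0) ** degree. Parameter values are assumed."""
    return (gamma * dot(x, y) + coef0) ** degree


# Hypothetical lookup mirroring the three options offered on the form.
KERNELS = {
    "linear": linear_kernel,
    "rbf": rbf_kernel,
    "polynomial": polynomial_kernel,
}

if __name__ == "__main__":
    a, b = [1.0, 2.0], [3.0, 4.0]
    for name, kernel in KERNELS.items():
        print(name, kernel(a, b))
```

The linear kernel can only express a straight-line relationship in feature space, while the Gaussian and polynomial kernels allow curved fits, which is consistent with the text's remark that the linear option has the least accommodative power.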

