A bioinformatics knowledge discovery in text application for grid computing.

Castellano M, Mastronardi G, Bellotti R, Tarricone G - BMC Bioinformatics (2009)

Bottom Line: The middleware included a graphical user interface giving access to a node search system, a load balancing system and a transfer optimizer to reduce communication costs. It was written in Java on Globus Toolkit 4, which provided the grid infrastructure on GNU/Linux nodes. As an example, a Knowledge Discovery in Database computation was applied to the output produced by the KDT user module to extract new knowledge about symptom and pathology bio-entities.

Affiliation: DEE Dipartimento di Elettrotecnica ed Elettronica, Politecnico di Bari, via Orabona, 4, 70125, Bari, Italy. castellano@poliba.it

ABSTRACT

Background: A fundamental activity in biomedical research is Knowledge Discovery, the search through large amounts of biomedical information such as documents and data. High-performance computational infrastructures, such as Grid technologies, are emerging as a way to meet the intensive demand for Information and Communication Technology (ICT) resources in life science. The goal of this work was to develop a software middleware solution that lets knowledge discovery applications run on scalable, distributed computing systems and so make intensive use of ICT resources.

Methods: The development of a grid application for Knowledge Discovery in Text (KDT), using a methodology based on a middleware solution, is presented. The system must be able to run a user application model and to process jobs so as to create many parallel jobs that are distributed across the computational nodes. The system must also be aware of the available computational resources and their status, and must be able to monitor the execution of the parallel jobs. These operational requirements led to the design of a middleware that is specialized through user application modules. It included a graphical user interface giving access to a node search system, a load balancing system and a transfer optimizer to reduce communication costs.
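
The splitting of one job into many parallel jobs can be illustrated with a minimal Java sketch. It is not the authors' implementation: the class name DocumentPartitioner, the method partition and the per-node capacity weights are all hypothetical, and the abstract does not say which load balancing policy the middleware uses. The sketch assumes a simple proportional policy in which each node receives a share of the documents matching its relative capacity.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not from the paper): one document batch per grid node. */
final class DocumentPartitioner {

    /**
     * Splits the documents into contiguous batches, one per node.
     * weights[i] is an assumed capacity score for node i (for example a CPU count);
     * a node with twice the weight receives roughly twice as many documents.
     */
    static List<List<String>> partition(List<String> documents, int[] weights) {
        int total = 0;
        for (int w : weights) {
            if (w <= 0) {
                throw new IllegalArgumentException("weights must be positive");
            }
            total += w;
        }
        List<List<String>> batches = new ArrayList<>();
        int start = 0;
        int cumulative = 0;
        for (int w : weights) {
            cumulative += w;
            // The end index comes from the cumulative weight, so rounding
            // never drops or duplicates a document.
            int end = (int) ((long) documents.size() * cumulative / total);
            batches.add(new ArrayList<>(documents.subList(start, end)));
            start = end;
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 5000; i++) {
            docs.add("doc-" + i + ".pdf"); // placeholder file names
        }
        List<List<String>> batches = partition(docs, new int[] {1, 1, 2});
        for (int i = 0; i < batches.size(); i++) {
            System.out.println("node " + i + ": " + batches.get(i).size() + " documents");
        }
    }
}
```

A real grid scheduler would also take node status and data transfer cost into account, which this sketch deliberately ignores.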

Results: A prototype of the middleware solution is presented, together with its performance evaluation in terms of the speed-up factor. The prototype was written in Java on Globus Toolkit 4, which was used to build the grid infrastructure on GNU/Linux grid nodes. A test was carried out, and results are shown for the named entity recognition of symptoms and pathologies. The search was applied to a collection of 5,000 scientific documents taken from PubMed.
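
The abstract does not define the speed-up factor. Assuming the conventional definition from parallel computing, with T(1) the execution time on a single node and T(p) the execution time on p nodes (both symbols are introduced here, not taken from the paper), speed-up and the related efficiency can be written as:

```latex
S(p) = \frac{T(1)}{T(p)}, \qquad E(p) = \frac{S(p)}{p}
```

As a purely illustrative calculation with made-up numbers, a job taking 100 minutes on one node and 25 minutes on five nodes would have S(5) = 4 and E(5) = 0.8. These figures are not results from the paper.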

Conclusion: In this paper we discuss the development of a grid application based on a middleware solution. It was tested on a knowledge discovery in text process that extracts new and useful information about symptoms and pathologies from a large collection of unstructured scientific documents. As an example, a Knowledge Discovery in Database computation was applied to the output produced by the KDT user module to extract new knowledge about symptom and pathology bio-entities.


Figure 4: Bioinformatics architecture. This figure shows the architecture of a bioinformatics knowledge discovery application. It presents GATE, the integrated development environment used for the text mining process. GATE operated on a collection of scientific publications. The text mining process starts from a set of full-text scientific publications available on MEDLINE/PubMed (in PDF format). The figure also shows the layered architecture, which consists of the GATE 4.0 toolkit for text mining, our middleware solution written with the Java API, the grid infrastructure middleware and, finally, a physical layer running a GNU/Linux operating system.

Mentions: The results of the feasibility study concern both the technological choices adopted for the construction of the prototype and their demonstration through bioinformatics applications. The demonstration is a study of the knowledge discovery of bio-entities, namely symptoms and pathologies, contained in a collection of 5,000 documents. Figure 4 shows the architecture of the system prototype with reference to the technological choices adopted. In particular, the GATE software platform was used for knowledge discovery in text. The computational grid infrastructure was built with the Globus Toolkit. Finally, the middleware solution was developed in Java and runs on the Java Virtual Machine. It issues the actual grid requests through Linux shell script components and implements its internal job management services through calls to the services provided by GRAM, GridFTP and the Condor system. GRAM provides remote execution management with reliable operation, stateful monitoring, credential management and file staging. GridFTP provides high-performance, secure and reliable data transfer, while Condor is a specialized workload management system for compute-intensive jobs that provides a job queueing mechanism and a scheduling policy. Moreover, the middleware solution is able to manage different User SIMD Application Modules (UAMs). These modules give a codified description of the program to be performed, expressed through the knowledge discovery operations made available by the environment in use. The UAMs are built from a template. Figure 5 shows the GUI of the prototype, through which a user submits a job to the system.
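
The SIMD idea behind the application modules, one operation applied to many data partitions, can be sketched in Java as follows. This is an illustration under stated assumptions, not the paper's code: a local thread pool stands in for the grid nodes that the real system reaches through GRAM, GridFTP and Condor, and a plain substring count stands in for the named entity recognition that the real system performs with GATE. The names SimdJobSketch, countTerm and countAcrossBatches are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: the same task runs on every batch, then results are merged. */
final class SimdJobSketch {

    /** Counts non-overlapping, case-insensitive occurrences of term in text. */
    static int countTerm(String text, String term) {
        String haystack = text.toLowerCase();
        String needle = term.toLowerCase();
        if (needle.isEmpty()) {
            return 0;
        }
        int count = 0;
        int idx = haystack.indexOf(needle);
        while (idx >= 0) {
            count++;
            idx = haystack.indexOf(needle, idx + needle.length());
        }
        return count;
    }

    /** Submits one identical counting task per batch and sums the partial counts. */
    static int countAcrossBatches(List<List<String>> batches, String term) {
        // One worker thread per batch; each thread plays the role of a grid node.
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, batches.size()));
        try {
            List<Future<Integer>> partials = new ArrayList<>();
            for (List<String> batch : batches) {
                Callable<Integer> task = () -> {
                    int sum = 0;
                    for (String doc : batch) {
                        sum += countTerm(doc, term);
                    }
                    return sum;
                };
                partials.add(pool.submit(task));
            }
            int total = 0;
            for (Future<Integer> partial : partials) {
                total += partial.get(); // blocks until that batch has finished
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a batch", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a batch task failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Because every batch runs the same task on its own data, the merge step is a simple sum; on a grid the partial results would first have to be transferred back from the nodes.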

