Limits...
Feature selection environment for genomic applications.

Lopes FM, Martins DC, Cesar RM - BMC Bioinformatics (2008)

Bottom Line: Feature selection is a pattern recognition approach to choose important variables according to some criteria in order to distinguish or explain certain phenomena (i.e., for dimensionality reduction).Our experiments have shown the software effectiveness in two distinct types of biological problems.Besides, the environment can be used in different pattern recognition applications, although the main concern regards bioinformatics tasks.

View Article: PubMed Central - HTML - PubMed

Affiliation: Instituto de Matemática e Estatística, Universidade de São Paulo, São Paulo, SP, Brazil. fabricio@utfpr.edu.br

ABSTRACT

Background: Feature selection is a pattern recognition approach to choose important variables according to some criteria in order to distinguish or explain certain phenomena (i.e., for dimensionality reduction). There are many genomic and proteomic applications that rely on feature selection to answer questions such as selecting signature genes which are informative about some biological state, e.g., normal tissues and several types of cancer; or inferring a prediction network among elements such as genes, proteins and external stimuli. In these applications, a recurrent problem is the lack of samples to perform an adequate estimate of the joint probabilities between element states. A myriad of feature selection algorithms and criterion functions have been proposed, although it is difficult to point the best solution for each application.

Results: The intent of this work is to provide an open-source multiplatform graphical environment for bioinformatics problems, which supports many feature selection algorithms, criterion functions and graphic visualization tools such as scatterplots, parallel coordinates and graphs. A feature selection approach for growing genetic networks from seed genes (targets or predictors) is also implemented in the system.

Conclusion: The proposed feature selection environment allows data analysis using several algorithms, criterion functions and graphic visualization tools. Our experiments have shown the software effectiveness in two distinct types of biological problems. Besides, the environment can be used in different pattern recognition applications, although the main concern regards bioinformatics tasks.

Show MeSH

Related in: MedlinePlus

Single feature selection execution. The software panel to apply single feature selection execution and feature selection approach for growing of genetic networks from target genes.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC2655091&req=5

Figure 4: Single feature selection execution. The software panel to apply single feature selection execution and feature selection approach for growing of genetic networks from target genes.

Mentions: The software is implemented in Java in order to be executable in different platforms. It is open source and intended to be continuously developed in a world-wide collaboration. There are four main panels: the first one allows the user to load the data set (Figure 2). The second is optional for the user to define a quantization degree to the data set. The quantized data may be visualized (Figure 3). It is worth noting that the mean conditional entropy and the CoD require data quantization to discrete values. This fact explains the quantization step available in the software. Please refer to the Section Data Preprocessing for further information on the normalization and quantization methods applied. The reason to keep the quantization optional is because other criterion functions, such as the distance-based criteria, do not require quantization and the software is intended to implement as many criterion functions as possible. The third panel is dedicated to perform single tests (Figure 4). With this panel, the user can execute feature selection methods (see Section Implemented feature selection algorithms) along with the criterion function (see Section Implemented criterion functions). The user has two options: the first one is to apply the simple resubstitution method (i.e., to use the same input data as training and test set); the second option consists in to apply the classifier on a different data set given by the user specified at input test set text box (Holdout cross-validation with training and test sets partitioning defined by the user).


Feature selection environment for genomic applications.

Lopes FM, Martins DC, Cesar RM - BMC Bioinformatics (2008)

Single feature selection execution. The software panel to apply single feature selection execution and feature selection approach for growing of genetic networks from target genes.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC2655091&req=5

Figure 4: Single feature selection execution. The software panel to apply single feature selection execution and feature selection approach for growing of genetic networks from target genes.
Mentions: The software is implemented in Java in order to be executable in different platforms. It is open source and intended to be continuously developed in a world-wide collaboration. There are four main panels: the first one allows the user to load the data set (Figure 2). The second is optional for the user to define a quantization degree to the data set. The quantized data may be visualized (Figure 3). It is worth noting that the mean conditional entropy and the CoD require data quantization to discrete values. This fact explains the quantization step available in the software. Please refer to the Section Data Preprocessing for further information on the normalization and quantization methods applied. The reason to keep the quantization optional is because other criterion functions, such as the distance-based criteria, do not require quantization and the software is intended to implement as many criterion functions as possible. The third panel is dedicated to perform single tests (Figure 4). With this panel, the user can execute feature selection methods (see Section Implemented feature selection algorithms) along with the criterion function (see Section Implemented criterion functions). The user has two options: the first one is to apply the simple resubstitution method (i.e., to use the same input data as training and test set); the second option consists in to apply the classifier on a different data set given by the user specified at input test set text box (Holdout cross-validation with training and test sets partitioning defined by the user).

Bottom Line: Feature selection is a pattern recognition approach to choose important variables according to some criteria in order to distinguish or explain certain phenomena (i.e., for dimensionality reduction).Our experiments have shown the software effectiveness in two distinct types of biological problems.Besides, the environment can be used in different pattern recognition applications, although the main concern regards bioinformatics tasks.

View Article: PubMed Central - HTML - PubMed

Affiliation: Instituto de Matemática e Estatística, Universidade de São Paulo, São Paulo, SP, Brazil. fabricio@utfpr.edu.br

ABSTRACT

Background: Feature selection is a pattern recognition approach to choose important variables according to some criteria in order to distinguish or explain certain phenomena (i.e., for dimensionality reduction). There are many genomic and proteomic applications that rely on feature selection to answer questions such as selecting signature genes which are informative about some biological state, e.g., normal tissues and several types of cancer; or inferring a prediction network among elements such as genes, proteins and external stimuli. In these applications, a recurrent problem is the lack of samples to perform an adequate estimate of the joint probabilities between element states. A myriad of feature selection algorithms and criterion functions have been proposed, although it is difficult to point the best solution for each application.

Results: The intent of this work is to provide an open-source multiplatform graphical environment for bioinformatics problems, which supports many feature selection algorithms, criterion functions and graphic visualization tools such as scatterplots, parallel coordinates and graphs. A feature selection approach for growing genetic networks from seed genes (targets or predictors) is also implemented in the system.

Conclusion: The proposed feature selection environment allows data analysis using several algorithms, criterion functions and graphic visualization tools. Our experiments have shown the software effectiveness in two distinct types of biological problems. Besides, the environment can be used in different pattern recognition applications, although the main concern regards bioinformatics tasks.

Show MeSH
Related in: MedlinePlus