Fizzy: feature subset selection for metagenomics.

Ditzler G, Morrison JC, Lan Y, Rosen GL - BMC Bioinformatics (2015)

Bottom Line: For example, in a previous study we used information-theoretic feature selection to understand the differences between protein family abundances that best discriminate between age groups in the human gut microbiome. We demonstrate the software tool's capabilities on publicly available datasets. We have made the software implementation of Fizzy available to the public under the GNU GPL license.


Affiliation: Department of Electrical & Computer Engineering, The University of Arizona, 1230 E Speedway Blvd., ECE Bldg., Tucson, 85721, AZ, USA. gregory.ditzler@gmail.com.

ABSTRACT

Background: Some of the current software tools for comparative metagenomics provide ecologists with the ability to investigate and explore bacterial communities using α- and β-diversity. Feature subset selection, a sub-field of machine learning, can also provide unique insight into the differences between metagenomic or 16S phenotypes. In particular, feature subset selection methods can identify the operational taxonomic units (OTUs), or functional features, that have a high level of influence on the condition being studied. For example, in a previous study we used information-theoretic feature selection to understand the differences between protein family abundances that best discriminate between age groups in the human gut microbiome.

Results: We have developed a new Python command-line tool for microbial ecologists that implements information-theoretic subset selection methods for biological data formats and is compatible with the widely adopted BIOM format. We demonstrate the software tool's capabilities on publicly available datasets.

Conclusions: We have made the software implementation of Fizzy available to the public under the GNU GPL license. The standalone implementation can be found at http://github.com/EESI/Fizzy.


Fig. 1: Pseudo code for a greedy forward search that selects features by attempting to maximize the objective function $\mathcal{J}$

Mentions: A simple algorithm for feature selection with a filter is a greedy forward selection search that seeks to maximize a feature scoring function $\mathcal{J}$, which is shown in Fig. 1. The search initializes the relevant feature set $\mathcal{F}$ to be empty; then, for k iterations, an objective function is maximized over the features not yet selected and the maximizing feature is added to $\mathcal{F}$. For example, this objective function could be written as

$$\mathcal{J}(X, Y, \mathcal{F}) = \textsf{I}(X;Y) - \alpha \sum_{X'\in\mathcal{F}} \textsf{I}(X;X') + \beta \sum_{X'\in\mathcal{F}} \textsf{I}(X;X' \mid Y) \qquad (2)$$
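The greedy search described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Fizzy's actual implementation: the function names (`mutual_information`, `conditional_mutual_information`, `greedy_select`), the toy OTU names, and the data are all hypothetical. It assumes the features are already discretized, estimates mutual information from empirical counts, and reads the last term of the objective as the conditional mutual information between X and X' given Y.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Empirical mutual information I(X;Y), in nats, of two discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    # sum over observed pairs of p(a,b) * log( p(a,b) / (p(a) p(b)) )
    return sum(
        (c / n) * math.log(c * n / (px[a] * py[b]))
        for (a, b), c in joint.items()
    )


def conditional_mutual_information(x, z, y):
    """Empirical I(X;Z|Y): average of I(X;Z) within each class of Y, weighted by p(y)."""
    n = len(y)
    total = 0.0
    for label, count in Counter(y).items():
        idx = [i for i, v in enumerate(y) if v == label]
        total += (count / n) * mutual_information(
            [x[i] for i in idx], [z[i] for i in idx]
        )
    return total


def greedy_select(features, y, k, alpha=1.0, beta=1.0):
    """Greedy forward selection: start from an empty set and, for k iterations,
    add the remaining feature with the largest objective value.

    features: dict mapping a feature name to a list of discrete values
    (hypothetical input layout chosen for this sketch).
    """
    selected = []
    remaining = list(features)  # a list keeps tie-breaking deterministic
    for _ in range(min(k, len(remaining))):

        def score(name):
            x = features[name]
            redundancy = sum(mutual_information(x, features[s]) for s in selected)
            conditional = sum(
                conditional_mutual_information(x, features[s], y) for s in selected
            )
            return mutual_information(x, y) - alpha * redundancy + beta * conditional

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (made up): four phenotype classes, two samples each.
y = [0, 0, 1, 1, 2, 2, 3, 3]
features = {
    "otu_a": [0, 0, 0, 0, 1, 1, 1, 1],  # separates classes {0,1} from {2,3}
    "otu_b": [0, 0, 0, 0, 1, 1, 1, 1],  # exact copy of otu_a (fully redundant)
    "otu_c": [0, 0, 1, 1, 0, 0, 1, 1],  # independent of otu_a, equally informative
}

print(greedy_select(features, y, k=2, alpha=1.0, beta=0.0))  # ['otu_a', 'otu_c']
```

With α = 1 the redundancy penalty cancels the relevance of the duplicate `otu_b`, so the complementary `otu_c` is chosen second. Setting α = β = 0 reduces the objective to I(X;Y) alone, in which case the duplicate is no longer penalized.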

