Willows: a memory efficient tree and forest construction package.

Zhang H, Wang M, Chen X - BMC Bioinformatics (2009)

Bottom Line: Existing tree and forest methods cannot deal with the data generated by recent genotyping platforms for single nucleotide polymorphisms (SNPs) because of the massive size of the data and its excessive memory demand. The Willows package can easily set different options (e.g., algorithms and specifications) and predict the class of test samples. We developed Willows with a user-friendly interface and with the goal of making the best use of memory, which is critical for analysis of genomic data.


Affiliation: Department of Epidemiology and Public Health, Yale University School of Medicine, New Haven, CT 06520-8034, USA. heping.zhang@yale.edu

ABSTRACT

Background: Existing tree and forest methods are powerful bioinformatics tools for exploring high-dimensional data, including high-throughput genomic data. However, they cannot deal with the data generated by recent genotyping platforms for single nucleotide polymorphisms (SNPs) because of the massive size of the data and its excessive memory demand.

Results: Using the recursive partitioning technique, we developed a new software package, Willows, to make the most of available computer memory and make it feasible to analyze massive genotype data. The package includes three tree-based methods (classification tree, random forest, and deterministic forest) and can efficiently handle massive amounts of SNP data. In addition, the package makes it easy to set different options (e.g., algorithms and specifications) and to predict the class of test samples.
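
The abstract does not say how Willows stores genotypes internally, so the following is only an illustrative sketch of why compact encoding matters for SNP data, not the package's actual method. It assumes each genotype is coded as 0, 1, or 2 (copies of the minor allele) with 3 for missing, and packs four such codes into one byte. The function names `pack_genotypes` and `get_genotype` are hypothetical.

```python
def pack_genotypes(genotypes):
    """Pack genotype codes (assumed 0, 1, 2, or 3 for missing) at 2 bits each.

    Four genotypes share one byte, so storage is about a quarter of what
    one byte per genotype would need.
    """
    packed = bytearray((len(genotypes) + 3) // 4)
    for i, code in enumerate(genotypes):
        if not 0 <= code <= 3:
            raise ValueError(f"genotype code out of range at index {i}: {code}")
        # Genotype i occupies bits 2*(i % 4) and 2*(i % 4) + 1 of byte i // 4.
        packed[i // 4] |= code << (2 * (i % 4))
    return bytes(packed)


def get_genotype(packed, i):
    """Read back the genotype code stored at position i."""
    return (packed[i // 4] >> (2 * (i % 4))) & 3


if __name__ == "__main__":
    snp = [0, 1, 2, 3, 2, 1]  # six samples at one hypothetical SNP
    packed = pack_genotypes(snp)
    print(len(snp), "genotypes stored in", len(packed), "bytes")
    print([get_genotype(packed, i) for i in range(len(snp))])
```

With this layout, one SNP typed on 4,000 samples takes 1,000 bytes instead of 4,000, and a split on that SNP can be evaluated by reading codes directly from the packed bytes without expanding the whole data set.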

Conclusion: We developed Willows with a user-friendly interface and with the goal of making the best use of memory, which is critical for the analysis of genomic data. The Willows package is well documented and publicly available at http://c2s2.yale.edu/software/Willows.

© Copyright Policy - open-access

Figure 2: Importance score results in the random forest.

Mentions: Depending on the user's needs, other outputs can be viewed, including the importance score of each variable and the predicted classes for a test sample. For example, Figure 2 and Figure 3 show the importance scores and prediction results for the two simulated data sets. All results are also saved in local files for later viewing. Detailed instructions are provided on our website.

