Integrative random forest for gene regulatory network inference.
Bottom Line: Gene regulatory network (GRN) inference based on genomic data is one of the most actively pursued problems in computational biology. Because different types of biological data usually provide complementary information about the underlying GRN, a model that integrates large datasets of diverse types is expected to increase both the power and the accuracy of GRN inference. We apply iRafNet, an integrative random forest method, to construct a GRN in Saccharomyces cerevisiae and demonstrate that it improves performance in predicting TF-target gene regulations and provides additional functional insights into the predicted regulations.
Affiliation: Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.
In this article, we introduce a weighted sampling scheme within the random forest framework that allows heterogeneous data types to be integrated. As shown in Figure 1, one data source serves as the main input for random forest inference, while the other D datasets (supporting data) are used to derive prior information. iRafNet first processes the supporting data to derive prior beliefs about regulatory relationships among genes, and then integrates this prior information with the main dataset via random forest to construct the final GRN. We consider different genomic data, including gene expression data from steady-state, time-series and knockout experiments, as well as other biological data such as protein–protein interactions. iRafNet can be summarized in the following major steps, and detailed information regarding each step is provided in later sections:
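The excerpt describes the weighted sampling scheme only at a high level. The following is a minimal Python sketch under stated assumptions. It assumes that, as in a per-target-gene regression forest, each tree node draws a subset of candidate regulators. It further assumes that the weighting replaces the usual uniform draw with one proportional to prior weights derived from supporting data. The functions `sample_candidates` and `best_split`, the `pseudo_count` parameter, and the toy data are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def sample_candidates(prior_weights, mtry, rng, pseudo_count=1e-3):
    """Draw `mtry` distinct regulator indices, each chosen with probability
    proportional to its (hypothetical) prior weight plus a small pseudo-count."""
    w = np.asarray(prior_weights, dtype=float)
    if np.any(w < 0):
        raise ValueError("prior weights must be non-negative")
    w = w + pseudo_count  # hypothetical: keeps regulators with no prior support sampleable
    return rng.choice(w.size, size=mtry, replace=False, p=w / w.sum())


def best_split(X, y, candidates):
    """Among the candidate regulators (columns of X), find the threshold split
    of target expression y that most reduces the sum of squared errors.
    Returns (regulator index, threshold, SSE reduction)."""
    total_sse = np.sum((y - y.mean()) ** 2)
    best = (None, None, 0.0)
    for j in candidates:
        order = np.argsort(X[:, j])
        xs, ys = X[order, j], y[order]
        for i in range(1, len(ys)):
            if xs[i] == xs[i - 1]:
                continue  # no threshold separates equal values
            left, right = ys[:i], ys[i:]
            sse = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
            if total_sse - sse > best[2]:
                best = (int(j), (xs[i - 1] + xs[i]) / 2, total_sse - sse)
    return best


# Toy data: 10 candidate regulators, one target gene driven by regulator 3.
rng = np.random.default_rng(0)
n_samples, n_regs = 50, 10
X = rng.normal(size=(n_samples, n_regs))
y = 3.0 * (X[:, 3] > 0) + 0.1 * rng.normal(size=n_samples)

# Hypothetical prior: supporting data (e.g. a PPI edge) favours regulator 3.
prior = np.zeros(n_regs)
prior[3] = 1.0

# How often each regulator is offered as a split candidate over 1000 nodes.
counts = np.zeros(n_regs, dtype=int)
for _ in range(1000):
    counts[sample_candidates(prior, mtry=3, rng=rng)] += 1
print("candidate counts:", counts)

# One node split using a weighted candidate draw.
reg, thr, gain = best_split(X, y, sample_candidates(prior, mtry=3, rng=rng))
print(f"split on regulator {reg} at {thr:.3f} (SSE reduction {gain:.2f})")
```

The pseudo-count is a choice made only for this sketch, so that regulators without prior support can still be drawn. This excerpt does not state whether or how iRafNet handles zero weights.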