Supervised learning methods in modeling of CD4+ T cell heterogeneity.

Lu P, Abedi V, Mei Y, Hontecillas R, Hoops S, Carbo A, Bassaganya-Riera J - BioData Min (2015)

Bottom Line: Our results demonstrate that ANN and RF outperform the other two methods. Finally, the running time of different methods was compared, which confirms that ANN is considerably faster than RF. Using machine learning as opposed to ODE-based methods reduces the computational complexity of the system and allows one to gain a deeper understanding of the complex interplay between the different related entities.


Affiliation: The Center for Modeling Immunity to Enteric Pathogens, Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, VA 24061 USA; Nutritional Immunology and Molecular Medicine Laboratory, Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, VA 24061 USA.

ABSTRACT

Background: Modeling of the immune system - a highly non-linear and complex system - requires practical and efficient data analytic approaches. The immune system is composed of heterogeneous cell populations and hundreds of cell types, such as neutrophils, eosinophils, macrophages, dendritic cells, T cells, and B cells. Each cell type is highly diverse and can be further differentiated into subsets with unique and overlapping functions. For example, CD4+ T cells can be differentiated into Th1, Th2, Th17, Th9, Th22, Treg, Tfh, as well as Tr1. Each subset plays a different role in the immune system. To study molecular mechanisms of cell differentiation, computational systems biology approaches can be used to represent these processes; however, such approaches often require building complex intracellular signaling models with a large number of equations to accurately represent intracellular pathways and biochemical reactions. Furthermore, studying the immune system entails integration of complex processes which occur at different time and space scales.

Methods: This study presents and compares four supervised learning methods for modeling CD4+ T cell differentiation: Artificial Neural Networks (ANN), Random Forest (RF), Support Vector Machines (SVM), and Linear Regression (LR). Application of supervised learning methods could reduce the complexity of Ordinary Differential Equations (ODEs)-based intracellular models by only focusing on the input and output cytokine concentrations. In addition, this modeling framework can be efficiently integrated into multiscale models.

Results: Our results demonstrate that ANN and RF outperform the other two methods. Furthermore, ANN and RF have comparable performance when applied to in silico data with and without added noise. The trained models were also able to reproduce dynamic behavior when applied to experimental data; in four out of five cases, model predictions based on ANN and RF correctly predicted the outcome of the system. Finally, the running time of different methods was compared, which confirms that ANN is considerably faster than RF.

Conclusions: Using machine learning as opposed to ODE-based methods reduces the computational complexity of the system and allows one to gain a deeper understanding of the complex interplay between the different related entities.


Fig. 6: Performance optimization of Random Forest (RF) model. The RF model was created using the randomForest package in R. To optimize the performance of the RF model, two main parameters – mtry (number of variables randomly sampled as candidates at each split) and ntree (number of trees to grow) – were optimized. The RF model with 1000 trees and 4 variables randomly sampled as candidates at each split was identified to perform best.

Mentions: An RF model was created using the randomForest package in R [23]. The randomForest function builds the trees and lets the user define the number of trees to grow and the number of variables randomly sampled as candidates at each split. A separate Random Forest model was built for each output; that is, five Random Forest models were created for the five outputs IL17, RORγt, IFNγ, Tbet, and FOXP3. To optimize the performance of the RF model, two main parameters – mtry and ntree – were optimized (see Fig. 6). By comparing the average absolute difference between the model predictions and real outputs from the test data, the random forest model with 1000 trees and 4 variables randomly sampled as candidates at each split was identified to perform best.
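
The selection procedure described above (try combinations of mtry and ntree, score each by the average absolute difference on the test data, keep the best, and repeat for every output) can be sketched as follows. This is a minimal, library-agnostic Python illustration and not the authors' R code: the names grid_search, tune_per_output and the fit_predict callback are hypothetical, and fit_predict stands in for whatever routine trains a random forest with the given settings and returns predictions for the test inputs.

```python
from itertools import product
from statistics import mean


def mean_absolute_error(predicted, observed):
    """Average absolute difference between predictions and observed outputs."""
    return mean(abs(p - o) for p, o in zip(predicted, observed))


def grid_search(fit_predict, train, test, mtry_grid, ntree_grid):
    """Return (best_mtry, best_ntree, best_error) over all grid combinations.

    fit_predict(train_x, train_y, test_x, mtry, ntree) is a hypothetical
    callback (an assumption of this sketch): it trains one model with the
    given settings and returns one prediction per row of test_x.
    """
    train_x, train_y = train
    test_x, test_y = test
    best = None
    for mtry, ntree in product(mtry_grid, ntree_grid):
        predictions = fit_predict(train_x, train_y, test_x, mtry, ntree)
        error = mean_absolute_error(predictions, test_y)
        # Keep the first combination that achieves the lowest error so far.
        if best is None or error < best[2]:
            best = (mtry, ntree, error)
    return best


def tune_per_output(fit_predict, train_x, train_ys, test_x, test_ys,
                    mtry_grid, ntree_grid):
    """Tune one model per output; train_ys and test_ys map output name to values."""
    return {
        name: grid_search(
            fit_predict,
            (train_x, train_ys[name]),
            (test_x, test_ys[name]),
            mtry_grid,
            ntree_grid,
        )
        for name in train_ys
    }
```

In this sketch, tune_per_output would be called with one entry per output (for example IL17 or FOXP3), so each output gets its own tuned model, mirroring the one-model-per-output setup in the text. The grids to search are left to the caller, since the source reports only the best-performing setting.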

