An integrated network of Arabidopsis growth regulators and its use for gene prioritization.

Sabaghian E, Drebert Z, Inzé D, Saeys Y - Sci Rep (2015)

Bottom Line: Elucidating the molecular mechanisms that govern plant growth has been an important topic in plant research, and current advances in large-scale data generation call for computational tools that efficiently combine these different data sources to generate novel hypotheses. In this work, we present a novel, integrated network that combines multiple large-scale data sources to characterize growth regulatory genes in Arabidopsis, one of the main plant model organisms. In addition, the integrated network is made available to the scientific community, providing a rich data source that will be useful for many biological processes, not necessarily restricted to plant growth.


Affiliation: Department of Plant Systems Biology, VIB, 9052 Gent, Belgium.

ABSTRACT
Elucidating the molecular mechanisms that govern plant growth has been an important topic in plant research, and current advances in large-scale data generation call for computational tools that efficiently combine these different data sources to generate novel hypotheses. In this work, we present a novel, integrated network that combines multiple large-scale data sources to characterize growth regulatory genes in Arabidopsis, one of the main plant model organisms. The contributions of this work are twofold: first, we characterized a set of carefully selected growth regulators with respect to their connectivity patterns in the integrated network; second, we explored to what extent these connectivity patterns can be used to suggest new growth regulators. Using a large-scale comparative study, we designed new supervised machine learning methods to prioritize growth regulators. Our results show that these methods significantly improve on current state-of-the-art prioritization techniques and are able to suggest meaningful new growth regulators. In addition, the integrated network is made available to the scientific community, providing a rich data source that will be useful for many biological processes, not necessarily restricted to plant growth.


Figure 2: The Effect of GO Term Features on Classifier Performance. The effect of including GO terms when using model-based approaches. Adding GO similarity scores as an additional feature improved the ability of every model-based approach to rank growth regulator (GR) genes near the top of the list. Each box plot shows the ranks of all 147 GR genes within the list of 27,290 genes. An approach that assigns lower ranks to GR genes has a box plot shifted further towards zero on the y axis.

Mentions: We trained a number of well-known machine learning methods, including Naïve Bayes (NB), Linear Discriminant Analysis (LDA), Support Vector Machines (SVM), Lasso and elastic-net regularized generalized linear models (Glmnet), Random Forest (RF), and Generalized Boosted Regression Models (GBM), to learn the mapping between network-based properties and involvement in growth regulation. Two classes of features were used: network-based features and Gene Ontology (GO)-derived features. Figure 2 compares the results obtained using (a) only the network-based features (without_GO) and (b) the network-based features together with the GO-based features. Including the GO-based features clearly improves the ability of the model-based approaches to predict GR genes in a leave-one-out cross-validation (LOOCV) scheme (see Methods). For all methods, this resulted in a lower median rank and a lower first quartile; the first quartile is the most important part of the ranking if genes are to be evaluated in a top-down fashion (Table 3).
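
The evaluation idea described above can be illustrated with a small sketch: hold out one known GR gene at a time, train on the rest, score every candidate, and record the rank of the held-out gene. This is not the authors' pipeline. The exact LOOCV scheme is given in the Methods, which are not reproduced here, so the choice of a hand-written Gaussian Naïve Bayes scorer, the use of all unlabeled genes as the background class, the toy feature values, and the names `loocv_ranks`, `fit_gaussian`, and `log_likelihood` are all assumptions made for illustration.

```python
import math
import statistics


def fit_gaussian(rows):
    """Per-feature mean and variance; the variance is floored to avoid division by zero."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    variances = [
        max(sum((r[j] - means[j]) ** 2 for r in rows) / n, 1e-6)
        for j in range(dims)
    ]
    return means, variances


def log_likelihood(x, params):
    """Log-density of x under independent per-feature Gaussians (Naive Bayes assumption)."""
    means, variances = params
    return sum(
        -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
        for xi, m, v in zip(x, means, variances)
    )


def loocv_ranks(features, positives):
    """Return {held-out GR gene: rank among candidates}, where rank 1 is the top.

    Assumption: every gene not in `positives` is treated as background, both for
    fitting the negative class and as a ranking candidate. Class priors are
    omitted because they shift all scores equally and do not change the ranking.
    """
    background = [g for g in features if g not in positives]
    neg_params = fit_gaussian([features[g] for g in background])
    ranks = {}
    for held_out in positives:
        pos_params = fit_gaussian([features[g] for g in positives if g != held_out])
        candidates = background + [held_out]
        score = {
            g: log_likelihood(features[g], pos_params)
            - log_likelihood(features[g], neg_params)
            for g in candidates
        }
        # Stable sort: on ties the held-out gene is placed last (pessimistic rank).
        ordered = sorted(candidates, key=lambda g: score[g], reverse=True)
        ranks[held_out] = ordered.index(held_out) + 1
    return ranks


# Hypothetical toy data: [network_feature, go_similarity] per gene.
toy = {f"bg{i}": [(i % 10) / 10.0, (i % 7) / 20.0] for i in range(100)}
toy.update({f"gr{j}": [(5 + j % 5) / 10.0, 0.8 + (j % 3) / 20.0] for j in range(10)})
gr_genes = [g for g in toy if g.startswith("gr")]

# Compare the two feature sets, mirroring the without_GO / with GO comparison.
ranks_without_go = loocv_ranks({g: v[:1] for g, v in toy.items()}, gr_genes)
ranks_with_go = loocv_ranks(toy, gr_genes)

for label, ranks in (("without_GO", ranks_without_go), ("with_GO", ranks_with_go)):
    values = sorted(ranks.values())
    first_quartile = statistics.quantiles(values, n=4)[0]
    print(label, "median rank:", statistics.median(values), "Q1:", first_quartile)
```

The summary printed at the end uses the same two statistics the text highlights, the median rank and the first quartile. In the toy data the GO-like feature is constructed to separate the two groups, so the improvement is built in and says nothing about the size of the effect on the real network.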

