An integrated network of Arabidopsis growth regulators and its use for gene prioritization.

Sabaghian E, Drebert Z, Inzé D, Saeys Y - Sci Rep (2015)

Bottom Line: Elucidating the molecular mechanisms that govern plant growth has been an important topic in plant research, and current advances in large-scale data generation call for computational tools that efficiently combine these different data sources to generate novel hypotheses. In this work, we present a novel, integrated network that combines multiple large-scale data sources to characterize growth regulatory genes in Arabidopsis, one of the main plant model organisms. In addition, the integrated network is made available to the scientific community, providing a rich data source that will be useful for many biological processes, not necessarily restricted to plant growth.


Affiliation: Department of Plant Systems Biology, VIB, 9052 Gent, Belgium.

ABSTRACT
Elucidating the molecular mechanisms that govern plant growth has been an important topic in plant research, and current advances in large-scale data generation call for computational tools that efficiently combine these different data sources to generate novel hypotheses. In this work, we present a novel, integrated network that combines multiple large-scale data sources to characterize growth regulatory genes in Arabidopsis, one of the main plant model organisms. The contributions of this work are twofold: first, we characterized a set of carefully selected growth regulators with respect to their connectivity patterns in the integrated network, and second, we explored to what extent these connectivity patterns can be used to suggest new growth regulators. Using a large-scale comparative study, we designed new supervised machine learning methods to prioritize growth regulators. Our results show that these methods significantly improve on current state-of-the-art prioritization techniques and are able to suggest meaningful new growth regulators. In addition, the integrated network is made available to the scientific community, providing a rich data source that will be useful for many biological processes, not necessarily restricted to plant growth.


Figure 4: Leave-One-Out Cross-Validation (LOOCV) Workflow. Graphical overview of the workflow used to assess the predictive performance of methods in the LOOCV setup: (a) preparing the data and defining the two classes that feed the computational parts, (b) extracting multiple types of features from the network, and (c) combining network-derived features with machine learning models, resulting in model-based prioritization approaches.

Mentions: In the model-based approach to prioritization, we used machine learning models to capture the relationship between the network properties and the genes in the seed set S. We treated prioritization as a two-class classification problem, in which the genes in S were assumed to constitute the “positive” class and the remaining genes represented the “negative and unknown” class. In our case, the positive class was the set of 147 genes involved in leaf growth, and the negative/unknown class consisted of the remaining 27,229 genes of the Arabidopsis genome. To evaluate the prioritization performance of the different methods, we used a leave-one-out cross-validation (LOOCV) setup, which is standard in the prioritization field [10]. In this setup, one growth regulator (GR) gene was removed from the list of known GR genes, and all genes in the network were then ranked, with the top-ranked genes corresponding to the most likely GR genes. The rank of the left-out gene was recorded, and the procedure was repeated for every GR gene. Summary statistics, such as the median or average rank of the left-out GR genes, could then be used to evaluate prioritization performance. A general overview of the procedure is shown in Fig. 4.
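
The LOOCV procedure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy network, the gene names, and the `neighbor_score` function (the fraction of a gene's neighbors that are seeds) are hypothetical stand-ins for the integrated network and the supervised models used in the paper. Only the evaluation loop follows the text: hold out one seed gene, rank all remaining candidates, and record the rank of the held-out gene.

```python
from statistics import mean, median


def neighbor_score(gene, seeds, network):
    """Hypothetical stand-in scorer: fraction of a gene's neighbors that are seeds."""
    neighbors = network.get(gene, set())
    if not neighbors:
        return 0.0
    return len(neighbors & seeds) / len(neighbors)


def loocv_ranks(seeds, network, score=neighbor_score):
    """Return the rank (1 = best) of each seed gene when it is left out."""
    genes = sorted(network)
    ranks = {}
    for left_out in sorted(seeds):
        # Remove one known gene from the seed set.
        training = seeds - {left_out}
        # Every gene outside the reduced seed set is a candidate,
        # including the gene that was just left out.
        candidates = [g for g in genes if g not in training]
        # Rank by descending score; ties are broken by gene name so the
        # result is deterministic (an arbitrary choice for this sketch).
        ordered = sorted(
            candidates, key=lambda g: (-score(g, training, network), g)
        )
        ranks[left_out] = ordered.index(left_out) + 1
    return ranks


# Toy undirected network as adjacency sets (assumed data, not from the paper).
network = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D", "F"},
    "F": {"E"},
}
seeds = {"A", "B", "E"}  # plays the role of the seed set S

ranks = loocv_ranks(seeds, network)
print("rank per left-out gene:", ranks)
print("median rank:", median(ranks.values()))
print("mean rank:", mean(ranks.values()))
```

Lower median or mean ranks indicate better prioritization. Replacing `neighbor_score` with the output of a trained classifier would give a model-based variant of the same loop, provided the model is retrained on each reduced seed set so that the held-out gene never contributes to its own score.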

