Graph reconstruction using covariance-based methods


ABSTRACT

Methods based on correlation and partial correlation are widely used today to reconstruct a statistical interaction graph from high-throughput omics data. These dedicated methods work well even when the number of variables exceeds the number of samples. In this study, we investigate how the graphs extracted from covariance and concentration matrix estimates are related, using Neumann series and transitive closure and discussing small concrete examples. Considering the ideal case where the true graph is available, we also compare correlation and partial correlation methods on large realistic graphs. In particular, we perform the comparisons both with optimally selected parameters based on the true underlying graph and with data-driven approaches in which the parameters are estimated directly from the data.
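The link between the two graphs can be illustrated numerically. The following is a minimal sketch and is not taken from the article: it assumes a concentration matrix scaled so that Omega = I - A, where A holds the edge weights of a chain graph and has spectral radius below 1. Under that assumption the Neumann series gives Sigma = Omega^{-1} = I + A + A^2 + ..., and entry (i, j) of A^k can be nonzero only if a walk of length k joins i and j, so the support of Sigma is the transitive closure of the concentration graph. The function names `chain_concentration` and `neumann_inverse` and the value `rho=0.4` are illustrative choices.

```python
import numpy as np

def chain_concentration(p, rho=0.4):
    """Return (Omega, A) with Omega = I - A for a chain graph on p nodes."""
    A = np.zeros((p, p))
    idx = np.arange(p - 1)
    A[idx, idx + 1] = rho   # edge i -- i+1
    A[idx + 1, idx] = rho   # keep A symmetric
    return np.eye(p) - A, A

def neumann_inverse(A, terms):
    """Partial Neumann sum I + A + A^2 + ... + A^(terms-1)."""
    total = np.zeros_like(A)
    power = np.eye(A.shape[0])
    for _ in range(terms):
        total += power
        power = power @ A
    return total

omega, A = chain_concentration(p=6, rho=0.4)

# The series converges only if the spectral radius of A is below 1.
spectral_radius = np.max(np.abs(np.linalg.eigvalsh(A)))

sigma_exact = np.linalg.inv(omega)
sigma_series = neumann_inverse(A, terms=200)

# Count edges (nonzero off-diagonal entries in the upper triangle).
concentration_edges = int(np.count_nonzero(np.triu(omega, k=1)))
covariance_edges = int(np.count_nonzero(np.abs(np.triu(sigma_exact, k=1)) > 1e-12))

print("spectral radius of A:", round(float(spectral_radius), 4))
print("series matches inverse:", np.allclose(sigma_series, sigma_exact))
print("edges in concentration graph:", concentration_edges)
print("edges in covariance graph:", covariance_edges)
```

In this toy case the concentration graph is the sparse chain, while the covariance graph connects every pair of nodes in the same connected component, which is why thresholding correlations alone tends to add indirect edges.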

Electronic supplementary material: The online version of this article (doi:10.1186/s13637-016-0052-y) contains supplementary material, which is available to authorized users.




Fig. 9: Predictions based on the adaptive Lasso with the penalty parameter chosen via cross-validation, nodewise regression with the optimal penalty, and random guessing. Depicted are predictions for (a) a chain graph, (b) a cluster graph, (c) a scale-free graph, and (d) a hub graph.

To select a suitable penalty value, we perform cross-validation with the adaptive Lasso [41]. Cross-validation with the adaptive Lasso performs very well on chain graphs (Fig. 9a), where the predictions (blue) lie close to the optimal predictions (red). For cluster and hub graphs, the method performs poorly compared with the optimal one but still gives better results than random guessing (Fig. 9b, d). For the scale-free graph, however, the method performs poorly, giving predictions in almost the same range as random guessing (Fig. 9c). Even so, the scatter plot shows that, on average, the method yields slightly more true positives and fewer false-positive edges than random guessing. Note that the scale-free graph used in our study contains far more hub nodes, that is, nodes with many more incident edges than the other nodes. This type of graph is very difficult to infer in the p > n setting. The other graphs used in the study contain fewer hub nodes, and the method performs better on them. For example, the maximum degree is kmax = 2 for the chain graph, kmax = 4 for the cluster graph, kmax = 9 for the hub graph, and kmax = 13 for the scale-free graph. We therefore observe that penalty selection by cross-validation with the adaptive Lasso depends strongly on the number of hub nodes in the graph. The adaptive Lasso also does not use any prior information about the graph topology and applies a uniform penalty to all edges, which is a further major drawback when it is applied to graphs with many hub nodes. This observation has also been reported in other studies [34–36].
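The procedure discussed above can be sketched in a few lines. This is an illustrative toy version under stated assumptions, not the implementation used in the article: each node is regressed on all other nodes with an adaptive Lasso, the penalty is picked from a small grid by K-fold cross-validation, and an edge is kept if either of the two regressions selects it. The ridge pilot estimate for the adaptive weights, the penalty grid, the "or" rule, the coordinate-descent solver, and all function names are assumptions made for this sketch. For speed and stability it also uses n > p, unlike the p > n setting of the study.

```python
import numpy as np

def lasso_cd(X, y, lam, weights, n_iter=100):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * sum_j w_j |b_j|."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.copy()                                  # residual y - Xb
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]                   # drop coordinate j from the fit
            z = X[:, j] @ r / n
            b[j] = np.sign(z) * max(abs(z) - lam * weights[j], 0.0) / col_sq[j]
            r -= X[:, j] * b[j]                   # add the updated coordinate back
    return b

def adaptive_weights(X, y, ridge=1.0, eps=1e-8):
    """Adaptive-Lasso weights 1/|b_init| from a ridge pilot fit (an assumption)."""
    p = X.shape[1]
    b_init = np.linalg.solve(X.T @ X + ridge * np.eye(p), X.T @ y)
    return 1.0 / (np.abs(b_init) + eps)

def cv_adaptive_lasso(X, y, lams, n_folds=3, seed=0):
    """Pick lam by K-fold CV on prediction error, then refit on all data."""
    n = X.shape[0]
    folds = np.array_split(np.random.default_rng(seed).permutation(n), n_folds)
    errors = []
    for lam in lams:
        err = 0.0
        for test in folds:
            train = np.setdiff1d(np.arange(n), test)
            w = adaptive_weights(X[train], y[train])
            b = lasso_cd(X[train], y[train], lam, w)
            err += np.mean((y[test] - X[test] @ b) ** 2)
        errors.append(err)
    best = lams[int(np.argmin(errors))]
    return lasso_cd(X, y, best, adaptive_weights(X, y)), best

def nodewise_graph(X, lams):
    """Regress each node on the rest; keep an edge if either regression selects it."""
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    p = X.shape[1]
    B = np.zeros((p, p))
    for j in range(p):
        others = np.delete(np.arange(p), j)
        coef, _ = cv_adaptive_lasso(X[:, others], X[:, j], lams)
        B[j, others] = coef
    selected = B != 0
    return selected | selected.T                  # "or" rule

# Toy data: Gaussian samples whose concentration matrix is a chain graph.
p, n, rho = 10, 200, 0.4
A = np.zeros((p, p))
i = np.arange(p - 1)
A[i, i + 1] = A[i + 1, i] = rho
true_adj = A != 0
sigma = np.linalg.inv(np.eye(p) - A)
X = np.random.default_rng(1).multivariate_normal(np.zeros(p), sigma, size=n)

adj = nodewise_graph(X, lams=[0.01, 0.03, 0.1, 0.3])
upper = np.triu(np.ones((p, p), dtype=bool), k=1)
tp = int(np.sum(adj & true_adj & upper))          # correctly recovered edges
fp = int(np.sum(adj & ~true_adj & upper))         # spurious edges
print("true positives:", tp, "false positives:", fp)
```

Because the same penalty grid and the same weighting rule are applied to every node, a hub node with many neighbours is penalised exactly like a leaf node, which is the uniform-penalty limitation described in the paragraph above.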

