Limits...
Graph reconstruction using covariance-based methods


ABSTRACT

Methods based on correlation and partial correlation are now used to reconstruct statistical interaction graphs from high-throughput omics data. These dedicated methods work well even when the number of variables exceeds the number of samples. In this study, we investigate how the graphs extracted from covariance and concentration matrix estimates are related, using Neumann series and transitive closure and discussing small concrete examples. Considering the ideal case where the true graph is available, we also compare correlation and partial correlation methods for large realistic graphs. In particular, we perform the comparisons both with optimally selected parameters based on the true underlying graph and with data-driven approaches in which the parameters are estimated directly from the data.

Electronic supplementary material: The online version of this article (doi:10.1186/s13637-016-0052-y) contains supplementary material, which is available to authorized users.




Fig. 8: Selecting penalty parameters in the covariance Lasso by cross-validation for four graph types. The log-likelihood values are computed for a range of penalty parameters. Cross-validation selects the penalty parameter at which the log-likelihood attains its maximum value.

To choose the penalty parameter λcov from the data, we use cross-validation. We perform fivefold cross-validation and select the penalty parameter that maximizes the log-likelihood function in (31). Figure 8 shows the computed log-likelihood values for penalty parameters in the range λcov∈[0,7]. For all graphs, the maxima of the log-likelihood lie in a similar range of the penalty parameter: for chain and cluster graphs, the maxima are attained between λcov=3 and λcov=5, whereas for scale-free and hub graphs they are attained between λcov=4 and λcov=6. We therefore chose the penalty parameters for further simulations from these ranges and performed the covariance graph estimation with them. Unfortunately, in all cases these penalty values lead to overestimation of the graph: many false positive edges are selected in the estimated graph.
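The selection procedure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the covariance Lasso solver is replaced by a simple soft-thresholding of the sample covariance (a stand-in estimator), and since the log-likelihood in (31) is not reproduced on this page, a standard zero-mean Gaussian log-likelihood (up to an additive constant) is assumed. The function names, the toy banded covariance, and the penalty grid are hypothetical, and the penalty scale here is not comparable to the range λcov∈[0,7] reported for Fig. 8.

```python
import numpy as np

def soft_threshold_cov(S, lam):
    """Stand-in sparse covariance estimate: soft-threshold the
    off-diagonal entries of S by lam, keep the diagonal unchanged."""
    T = np.sign(S) * np.maximum(np.abs(S) - lam, 0.0)
    np.fill_diagonal(T, np.diag(S))
    return T

def gaussian_loglik(Sigma, S_test):
    """Assumed Gaussian log-likelihood of Sigma on held-out data summarized
    by S_test, up to an additive constant. Returns -inf if Sigma is not
    positive definite."""
    if np.linalg.eigvalsh(Sigma).min() <= 1e-10:
        return -np.inf
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * (logdet + np.trace(np.linalg.solve(Sigma, S_test)))

def select_penalty_cv(X, lambdas, n_folds=5, seed=0):
    """K-fold cross-validation: return the penalty with the largest
    mean held-out log-likelihood, together with all mean scores."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    folds = np.array_split(rng.permutation(n), n_folds)
    scores = np.zeros(len(lambdas))
    for test_idx in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[test_idx] = False
        S_train = np.cov(X[train_mask], rowvar=False)
        S_test = np.cov(X[test_idx], rowvar=False)
        for j, lam in enumerate(lambdas):
            Sigma_hat = soft_threshold_cov(S_train, lam)
            scores[j] += gaussian_loglik(Sigma_hat, S_test)
    scores /= n_folds
    return lambdas[int(np.argmax(scores))], scores

# Toy data (hypothetical): a banded, chain-like covariance graph on 6 variables.
p = 6
Sigma_true = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))
X = np.random.default_rng(1).multivariate_normal(np.zeros(p), Sigma_true, size=200)

lambdas = np.linspace(0.0, 0.5, 11)
best_lam, scores = select_penalty_cv(X, lambdas)
print("selected penalty:", best_lam)

# Edges of the estimated covariance graph are the nonzero off-diagonal entries.
Sigma_hat = soft_threshold_cov(np.cov(X, rowvar=False), best_lam)
est_edges = (np.abs(Sigma_hat) > 0) & ~np.eye(p, dtype=bool)
true_edges = (np.abs(Sigma_true) > 0) & ~np.eye(p, dtype=bool)
false_positives = int(np.sum(est_edges & ~true_edges)) // 2
print("false positive edges:", false_positives)
```

Counting false positive edges against a known true graph, as in the last lines, is one way to check whether a likelihood-selected penalty is too small for graph recovery, which is the effect reported above.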

