Graph reconstruction using covariance-based methods

ABSTRACT

Methods based on correlation and partial correlation are now routinely used to reconstruct statistical interaction graphs from high-throughput omics data. These dedicated methods work well even when the number of variables exceeds the number of samples. In this study, we investigate how the graphs extracted from covariance and concentration matrix estimates are related, using Neumann series and transitive closure and discussing concrete small examples. Considering the ideal case where the true graph is available, we also compare correlation and partial correlation methods on large realistic graphs. In particular, we perform the comparisons both with parameters selected optimally based on the true underlying graph and with data-driven approaches in which the parameters are estimated directly from the data.
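The abstract relates covariance and concentration graphs through Neumann series and transitive closure. As a minimal sketch of that idea (not the authors' derivation), write Θ = D^{1/2}(I − A)D^{1/2}, where D is the diagonal of the concentration matrix Θ. If the spectral radius of A is below 1, then Σ = Θ^{-1} = D^{-1/2}(I + A + A^2 + ...)D^{-1/2}. Because (A^k)_{ij} can be nonzero only if a walk of length k joins i and j in the concentration graph, the nonzero pattern of Σ is contained in the transitive closure of that graph. The chain example, the coupling value 0.4, and all function names below are hypothetical choices for illustration.

```python
import numpy as np

def chain_precision(p, rho=0.4):
    """Tridiagonal concentration matrix of a chain graph 1-2-...-p.
    The coupling rho is a hypothetical value chosen for illustration."""
    theta = np.eye(p)
    idx = np.arange(p - 1)
    theta[idx, idx + 1] = theta[idx + 1, idx] = -rho
    return theta

def neumann_covariance(theta, n_terms=200):
    """Approximate Sigma = Theta^{-1} by a truncated Neumann series."""
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(theta)))
    # Theta = D^{1/2} (I - A) D^{1/2}, so A holds only the off-diagonal couplings.
    a = np.eye(theta.shape[0]) - d_inv_sqrt @ theta @ d_inv_sqrt
    if np.max(np.abs(np.linalg.eigvalsh(a))) >= 1:
        raise ValueError("Neumann series requires spectral radius of A < 1")
    series, term = np.zeros_like(a), np.eye(a.shape[0])
    for _ in range(n_terms):  # sum_k A^k; A^k is nonzero only along walks of length k
        series += term
        term = term @ a
    return d_inv_sqrt @ series @ d_inv_sqrt

def transitive_closure(adj):
    """Reachability matrix via Warshall's algorithm (each node reaches itself)."""
    reach = adj.astype(bool) | np.eye(adj.shape[0], dtype=bool)
    for k in range(adj.shape[0]):
        reach |= reach[:, [k]] & reach[[k], :]
    return reach

p = 6
theta = chain_precision(p)
sigma = neumann_covariance(theta)
concentration_graph = (np.abs(theta) > 1e-12) & ~np.eye(p, dtype=bool)
covariance_graph = np.abs(sigma) > 1e-12
print(np.allclose(sigma, np.linalg.inv(theta)))                                   # True
print(np.array_equal(covariance_graph, transitive_closure(concentration_graph)))  # True
```

For the chain, the concentration graph contains only neighbour edges, yet every covariance entry is nonzero: the covariance graph coincides with the full transitive closure, which is why thresholding a covariance estimate tends to add indirect edges.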

Electronic supplementary material: The online version of this article (doi:10.1186/s13637-016-0052-y) contains supplementary material, which is available to authorized users.

Fig. 10: Influence of correlation strength on predictions for the chain graph (p=50, n=30). a Thresholded sample covariance matrix. b Covariance Lasso. c Nodewise regression Lasso. d Graphical Lasso. Predictions are shown for different correlation strengths: (I) low correlation, σ≈0.15; (II) moderate correlation, σ≈0.19; (III) moderate-high correlation, σ≈0.22; and (IV) high correlation, σ≈0.36.

Figure 10 depicts the optimal predictions produced by the four methods for different correlation strengths on the chain graph. The sensitivity of the predictions, computed for each method as the average ratio of correctly predicted edges to all predicted edges, is given in Table 2. Here, we choose the optimal threshold and penalty based on the shortest Euclidean distance from the true edges. When the magnitude of the correlations is low (standard deviation σ≈0.15, colored in blue), all methods perform relatively poorly: only about 1/4 of the predicted edges are correct. Increasing the magnitude of the correlations improves the performance of all methods (II, III, and IV). For instance, going from σ≈0.15 to σ≈0.19, the sensitivity of the thresholded sample covariance matrix predictions increases from 0.23 to 0.67. Over the same step, the sensitivity of the covariance Lasso increases from 0.24 to 0.72 (from 12 to 30 edges), while the sensitivity of both the nodewise regression Lasso and the graphical Lasso increases from 0.24 to 0.7 (from 13 to 35 edges). The accuracy of the covariance Lasso predictions changes little from (II) to (IV), indicating a saturation effect for this method. The same saturation is observed for the thresholded sample covariance matrix from (III) to (IV). In contrast, the sensitivity of the nodewise regression Lasso and graphical Lasso predictions keeps increasing with correlation strength: the nodewise regression Lasso reaches about 0.83 in regime (III) and almost 0.93 in regime (IV), while the graphical Lasso increases from 0.75 (III) to 0.82 (IV).
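The evaluation described above (choose the threshold against the true graph, then score the prediction) can be sketched for the thresholded sample covariance matrix. This is a minimal sketch under stated assumptions: the chain coupling 0.4, the threshold grid, the random seed, and all function names are hypothetical. `sensitivity` follows the text's definition (correctly predicted edges divided by all predicted edges, a quantity often called precision), and `select_threshold` reads "shortest Euclidean distance from true edges" as the Euclidean distance between predicted and true edge-indicator vectors, which is an assumption rather than the authors' stated formula.

```python
import numpy as np

def chain_adjacency(p):
    """Adjacency matrix of the chain graph 1-2-...-p."""
    adj = np.zeros((p, p), dtype=bool)
    idx = np.arange(p - 1)
    adj[idx, idx + 1] = adj[idx + 1, idx] = True
    return adj

def upper_edges(adj):
    """Edge indicators for the strict upper triangle (each undirected edge once)."""
    return adj[np.triu_indices(adj.shape[0], k=1)]

def sensitivity(pred_adj, true_adj):
    """Correctly predicted edges / all predicted edges, as defined in the text."""
    pred, true = upper_edges(pred_adj), upper_edges(true_adj)
    n_pred = int(pred.sum())
    return float((pred & true).sum()) / n_pred if n_pred else 0.0

def thresholded_covariance_graph(x, tau):
    """Predict an edge wherever |sample covariance| exceeds tau (diagonal ignored)."""
    adj = np.abs(np.cov(x, rowvar=False)) > tau
    np.fill_diagonal(adj, False)
    return adj

def select_threshold(x, true_adj, taus):
    """ASSUMED criterion: minimise the Euclidean distance between
    predicted and true edge-indicator vectors."""
    true = upper_edges(true_adj).astype(float)
    dists = [np.linalg.norm(upper_edges(thresholded_covariance_graph(x, t)).astype(float) - true)
             for t in taus]
    return taus[int(np.argmin(dists))]

p, n = 50, 30                              # chain-graph setting of Fig. 10
true_adj = chain_adjacency(p)
theta = np.eye(p) - 0.4 * true_adj         # hypothetical chain concentration matrix
rng = np.random.default_rng(0)             # hypothetical seed
x = rng.multivariate_normal(np.zeros(p), np.linalg.inv(theta), size=n)
tau = select_threshold(x, true_adj, np.linspace(0.0, 1.0, 101))
print(tau, sensitivity(thresholded_covariance_graph(x, tau), true_adj))
```

For the Lasso-based methods, `thresholded_covariance_graph` would be replaced by an estimator indexed by a penalty, and the same selection loop would run over a penalty grid instead of a threshold grid.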
