Limits...
Graph reconstruction using covariance-based methods


ABSTRACT

Methods based on correlation and partial correlation are now used to reconstruct statistical interaction graphs from high-throughput omics data. These dedicated methods work well even when the number of variables exceeds the number of samples. In this study, we investigate how the graphs extracted from covariance and concentration matrix estimates are related, using Neumann series and transitive closure and discussing small concrete examples. Considering the ideal case where the true graph is available, we also compare correlation and partial correlation methods on large realistic graphs. In particular, we perform the comparisons both with parameters selected optimally based on the true underlying graph and with data-driven approaches in which the parameters are estimated directly from the data.
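
The link between the two kinds of graph can be illustrated with a short worked equation. This is only a sketch under an assumed parameterization that the abstract does not specify: the concentration matrix is written as the identity minus a scaled, nonnegative adjacency matrix A, with a hypothetical scaling factor \alpha small enough that \alpha times the spectral radius \rho(A) is below one.

```latex
\Omega = I - \alpha A, \qquad \alpha\,\rho(A) < 1
\quad\Longrightarrow\quad
\Sigma = \Omega^{-1} = \sum_{k=0}^{\infty} \alpha^{k} A^{k}
       = I + \alpha A + \alpha^{2} A^{2} + \cdots
```

Because (A^k)_{ij} counts the walks of length k from node i to node j, the entry \Sigma_{ij} is nonzero under these assumptions whenever any walk connects i and j. The support of the covariance matrix is then the transitive closure of the graph encoded by the concentration matrix, which is why a covariance graph can contain indirect edges that the concentration graph does not.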

Electronic supplementary material: The online version of this article (doi:10.1186/s13637-016-0052-y) contains supplementary material, which is available to authorized users.



Fig. 4: Workflow for generating synthetic data from a given graph topology. We first construct the graph of interest and build its adjacency matrix A, whose elements are ones and zeros. Next, we transform A into a positive definite matrix B, invert B, and calculate the correlation matrix C. We then factorize C using a Cholesky decomposition to obtain an upper triangular matrix U. We then generate a random matrix R, the columns of which are independent and identically distributed from . The number of rows of R equals the number of columns of U, and the number of columns of R equals the sample size that we want to generate. Finally, we multiply R with U to obtain a new data set with the sample size of interest.
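
The workflow in the caption can be sketched in Python with NumPy. Several details are assumptions, because the caption does not state them: the transform from A to B is taken to be a diagonal shift that makes the smallest eigenvalue equal to a hypothetical `shift` parameter, the entries of R are drawn from a standard normal distribution, and "multiply R with U" is read as the product of R transposed with U, so that each row of the result is one sample. The function name `synthetic_data_from_graph` is illustrative only.

```python
import numpy as np


def synthetic_data_from_graph(adjacency, n_samples, shift=0.1, seed=0):
    """Sketch of the caption's workflow; returns (samples, correlation matrix)."""
    A = np.asarray(adjacency, dtype=float)  # symmetric 0/1 adjacency matrix
    p = A.shape[0]

    # Assumed transform A -> B: shift the diagonal so that the smallest
    # eigenvalue of B equals `shift` > 0, which makes B positive definite.
    lam_min = np.linalg.eigvalsh(A).min()
    B = A + (shift - lam_min) * np.eye(p)

    # Invert B and rescale the result to a correlation matrix C.
    sigma = np.linalg.inv(B)
    d = np.sqrt(np.diag(sigma))
    C = sigma / np.outer(d, d)

    # NumPy returns the lower Cholesky factor L with C = L @ L.T,
    # so U = L.T is upper triangular with C = U.T @ U.
    U = np.linalg.cholesky(C).T

    # R has as many rows as U has columns and one column per sample.
    # Standard normal entries are an assumption of this sketch.
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((p, n_samples))

    # Each row of X is one sample whose population correlation matrix is C.
    X = R.T @ U
    return X, C


# Usage: six-node chain graph, 1000 samples.
chain = np.diag(np.ones(5), 1)
chain = chain + chain.T
X, C = synthetic_data_from_graph(chain, n_samples=1000)
```

With this reading, the rows of `X` have population covariance U^T U = C, so the sample correlation matrix of `X` approaches `C` as the sample size grows.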


In the following, we use numerical simulations to investigate how this scaling parameter affects indirect edges of different order. For this purpose, we choose a six-node chain graph, generate synthetic data using the workflow illustrated in Fig. 4, and compute the correlation matrix. The covariance graph reconstructed from the correlation matrix is accordingly fully connected and has five direct and ten indirect edges, where edges of the same order are assigned the same weight.
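
The edge counts for the six-node chain can be checked with a small sketch. It works on the population correlation matrix rather than on sampled data, and it assumes a diagonal-shift transform from the adjacency matrix to a positive definite matrix, which the text does not specify. Averaging the absolute correlations per order is one possible reading of "assigned the same weight", not necessarily the authors' procedure.

```python
import numpy as np

p = 6  # number of nodes in the chain graph

# Adjacency matrix of the chain 1 - 2 - 3 - 4 - 5 - 6.
A = np.zeros((p, p))
idx = np.arange(p - 1)
A[idx, idx + 1] = 1.0
A[idx + 1, idx] = 1.0

# Assumed positive definite transform (diagonal shift), then inversion
# and rescaling to a correlation matrix.
B = A + (0.1 - np.linalg.eigvalsh(A).min()) * np.eye(p)
sigma = np.linalg.inv(B)
d = np.sqrt(np.diag(sigma))
C = sigma / np.outer(d, d)

# In a chain, the order of the edge between nodes i and j is the
# path length |i - j|; order 1 means a direct edge.
i, j = np.triu_indices(p, k=1)
order = j - i
n_direct = int(np.sum(order == 1))
n_indirect = int(np.sum(order > 1))

# Every off-diagonal correlation is nonzero, so the covariance graph
# is fully connected.
fully_connected = bool(np.all(np.abs(C[i, j]) > 1e-12))

# One weight per order: mean absolute correlation over edges of that order.
weight_by_order = {
    int(k): float(np.mean(np.abs(C[i, j][order == k]))) for k in range(1, p)
}

print(n_direct, n_indirect, fully_connected)
```

The counts follow from the 15 node pairs of a six-node graph: five pairs are adjacent in the chain and the remaining ten are connected only through intermediate nodes.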

