Microarray enriched gene rank.

Demidenko E - BioData Min (2015)



Affiliation: Department of Biomedical Data Science, Institute for Quantitative Biomedical Sciences, Geisel School of Medicine at Dartmouth, Hanover, 03755 NH USA.

ABSTRACT

Background: We develop a new concept that reflects how genes are connected based on microarray data using the coefficient of determination (the squared Pearson correlation coefficient). Our measure, called the microarray enriched gene rank, or simply gene rank (GR), combines a priori knowledge about gene connectivity, say, from the Gene Ontology (GO) database, with the microarray expression data at hand. GR, similarly to Google PageRank, is defined in a recursive fashion and is computed as the left eigenvector, corresponding to the maximum eigenvalue, of a stochastic matrix derived from microarray expression data. An efficient algorithm is devised that allows computation of GR for 50 thousand genes with 500 samples within minutes on a personal computer using the public domain statistical package R.

Results: Computation of GR is illustrated with several microarray data sets. In particular, we apply GR (1) to answer whether bad genes are more connected than good genes in relation to cancer patient survival, (2) to associate gene connectivity with clusters/subtypes in ovarian cancer tumors, and to determine whether gene connectivity changes (3) from organ to organ within the same organism and (4) between organisms.

Conclusions: We have shown by examples that findings based on GR confirm biological expectations. GR may be used for hypothesis generation on gene pathways. It may be used for a homogeneous sample, for comparison of gene connectivity among cases and controls, or in a longitudinal setting.
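
The Background describes GR as the left eigenvector of a stochastic matrix built from squared Pearson correlations. The sketch below illustrates that general idea only. It is not the paper's algorithm: the abstract does not say how the stochastic matrix is constructed or how the GO prior enters, so the row normalization, the zeroed diagonal, the omission of any prior, and the function names `stochastic_matrix` and `left_eigenvector` are all assumptions made for illustration. The source uses R; Python with NumPy is used here for a compact example.

```python
import numpy as np


def stochastic_matrix(expr):
    """Build a row-stochastic matrix from a genes x samples expression array.

    Assumption (not from the source): entries are squared Pearson
    correlations between genes, self-connections are dropped, and each
    row is divided by its sum.
    """
    r2 = np.corrcoef(expr) ** 2          # coefficient of determination
    np.fill_diagonal(r2, 0.0)            # assumed: ignore self-connection
    return r2 / r2.sum(axis=1, keepdims=True)


def left_eigenvector(P, tol=1e-12, max_iter=10_000):
    """Power iteration for the left eigenvector pi satisfying pi = pi P."""
    n = P.shape[0]
    pi = np.full(n, 1.0 / n)             # start from the uniform vector
    for _ in range(max_iter):
        nxt = pi @ P
        nxt /= nxt.sum()                 # keep the vector summing to 1
        converged = np.abs(nxt - pi).max() < tol
        pi = nxt
        if converged:
            break
    return pi


# Usage on simulated data: 6 hypothetical genes measured in 20 samples.
rng = np.random.default_rng(0)
expr = rng.normal(size=(6, 20))
gr = left_eigenvector(stochastic_matrix(expr))
print(gr)
```

Power iteration is chosen here because it needs only matrix-vector products, which is one plausible way to keep the computation feasible for tens of thousands of genes; whether the paper's efficient algorithm works this way is not stated in the abstract.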



Figure 6: Cumulative distribution functions (cdfs) of GR for four stages of rice development. Later developmental stages show greater gene connectivity, as expressed via GR, over the entire range.

Mentions: This definition can be interpreted as follows: for every z, the proportion of X values smaller than z is larger than the proportion of Y values smaller than z. For example, using this definition, we can state that women are shorter than men in a stochastic sense because, for each z, the proportion of women with height ≤ z is larger than the proportion of men with height ≤ z. Note that an inequality between the means does not imply stochastic inequality; however, it can be proven mathematically that stochastic inequality implies the same inequality between the means and between the medians. Stochastic inequality is the most stringent inequality between random variables. We use this definition to demonstrate that the complexity (connectivity) of genes increases during rice development, using the Gaussian mixture cdf of GR with the parameters shown in Table 1; the results are presented in Figure 6. Indeed, the GR cdf shifts to the right with each consecutive stage, indicating that “Embryogenesis: the top three quarters” has the highest and “Anther development” the lowest gene rank. For example, consider the median GR for each rice development stage (depicted by thin lines of the corresponding color): while the median GR in the earliest stage is about 15%, the median GR in the latest stage is about 70%.
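
The stochastic comparison described above can be checked numerically by comparing two cdfs point by point. The following is a minimal sketch, assuming Gaussian mixture cdfs; the mixture parameters are hypothetical and are not the values from Table 1, and the helper names are invented for this example. A check on a finite grid is numerical evidence only, not a proof that the inequality holds for every z.

```python
import math


def normal_cdf(z, mean, sd):
    """Cdf of a normal distribution with the given mean and standard deviation."""
    return 0.5 * (1.0 + math.erf((z - mean) / (sd * math.sqrt(2.0))))


def mixture_cdf(z, components):
    """Cdf of a Gaussian mixture; components is a list of (weight, mean, sd)."""
    return sum(w * normal_cdf(z, m, s) for w, m, s in components)


def stochastically_smaller(comp_x, comp_y, grid):
    """True if F_X(z) >= F_Y(z) at every grid point, i.e. X tends to be smaller."""
    return all(mixture_cdf(z, comp_x) >= mixture_cdf(z, comp_y) for z in grid)


# Hypothetical GR mixtures (in percent) for an early and a late stage.
early = [(0.6, 10.0, 5.0), (0.4, 30.0, 10.0)]
late = [(0.5, 60.0, 10.0), (0.5, 80.0, 8.0)]
grid = range(0, 101)

print(stochastically_smaller(early, late, grid))
print(stochastically_smaller(late, early, grid))

# Smaller mean without stochastic ordering: the cdfs cross in the left tail.
narrow = [(1.0, 0.0, 1.0)]   # mean 0, small spread
wide = [(1.0, 0.5, 3.0)]     # mean 0.5, large spread
print(stochastically_smaller(narrow, wide, range(-10, 11)))
```

The last comparison illustrates the remark that an inequality between means does not imply stochastic inequality: the narrow distribution has the smaller mean, yet the wide one places more mass at very low values.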

