Positive correlation between gene coexpression and positional clustering in the zebrafish genome.

Ng YK, Wu W, Zhang L - BMC Genomics (2009)

Bottom Line: This paper analyzes correlation between the proximity of eukaryotic genes and their transcriptional expression pattern in the zebrafish (Danio rerio) genome using available microarray data and gene annotation. The analyses show that neighbouring genes are significantly coexpressed in the zebrafish genome, and the coexpression level is influenced by the intergenic distance and transcription orientation. This fact is further supported by examining the coexpression level of genes within positional clusters in the neighbourhood model.

Affiliation: Department of Mathematics, National University of Singapore, 2 Science Drive 2, Singapore 117543, Singapore. matnyk@nus.edu.sg

ABSTRACT

Background: Co-expressing genes tend to cluster in eukaryotic genomes. This paper analyzes correlation between the proximity of eukaryotic genes and their transcriptional expression pattern in the zebrafish (Danio rerio) genome using available microarray data and gene annotation.

Results: The analyses show that neighbouring genes are significantly coexpressed in the zebrafish genome, and the coexpression level is influenced by the intergenic distance and transcription orientation. This fact is further supported by examining the coexpression level of genes within positional clusters in the neighbourhood model. There is a positive correlation between gene coexpression and positional clustering in the zebrafish genome.

Conclusion: The study provides further evidence for the hypothesis that coexpressed genes cluster in eukaryotic genomes.

Figure 1: Distribution of 10,000 mean R values calculated from randomized genome. Each plot shows the distribution of 10,000 mean R values. Each mean R value is calculated by first randomly permuting the gene order of the genome, and then averaging the R values for every pair of neighboring genes in the resulting gene order. The mean R value in the real genome is shown as a single line on each plot. Both plots are based on the same gene expression dataset: (A) the results on the original dataset (average of mean R = 0.03086, σ = 0.00384); (B) the results after tandem duplicates are removed (average of mean R = 0.03071, σ = 0.00389).
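The randomization described in the caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `mean_neighbour_r` and `permutation_null` are hypothetical, the expression matrix is assumed to hold one gene per row in genomic order, and the genome is treated as a single linear ordering (chromosome boundaries are ignored).

```python
import numpy as np

def mean_neighbour_r(expr):
    """Mean Pearson R over adjacent rows (genes) of expr, in the given row order.

    Assumes every gene has non-zero variance across samples.
    """
    expr = np.asarray(expr, dtype=float)
    # Standardize each gene's profile; the mean of the product of two
    # standardized profiles is their Pearson correlation coefficient.
    z = expr - expr.mean(axis=1, keepdims=True)
    z /= z.std(axis=1, keepdims=True)
    r = (z[:-1] * z[1:]).mean(axis=1)  # one R per neighbouring pair
    return r.mean()

def permutation_null(expr, n_perm=10_000, seed=0):
    """Mean neighbour R for n_perm random permutations of the gene order."""
    expr = np.asarray(expr, dtype=float)
    rng = np.random.default_rng(seed)
    return np.array([
        mean_neighbour_r(expr[rng.permutation(len(expr))])
        for _ in range(n_perm)
    ])
```

Comparing `mean_neighbour_r(expr)` on the real gene order with the values returned by `permutation_null(expr)` gives the kind of comparison the figure shows: a single observed value against a distribution from shuffled gene orders.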

Mentions: To study the coexpression of proximate genes, we analyzed 100 expression datasets derived from Affymetrix microarray experiments. We used the Pearson correlation coefficient (R) of two genes to measure their level of coexpression. The mean R over all neighbouring gene pairs in our dataset is 0.07468 (standard error 0.00424). In a randomized genome with the same genes and expression values, the mean R is only 0.03086 (standard deviation 0.00384) (Figure 1A). The observed mean is therefore statistically significant (p-value 0.0001), lying 11.4 standard deviations above the randomized mean. Tandem duplicated genes have identical functions and hence are often highly coexpressed. To eliminate the effect of tandem duplicates on this analysis, we kept only one member of each tandem gene cluster and repeated the analysis. After removal of tandem duplicates, the mean R was 0.06844 (standard error 0.00426), slightly smaller than the value obtained with all genes but still significant (p-value 0.0001, 9.7 standard deviations above the randomized mean) (Figure 1B).
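As a quick arithmetic check, the reported distances from the randomized mean can be reproduced from the figures quoted above. The helper name `z_score` is introduced here for illustration only. The p-value of 0.0001 equals 1/10,000, which is consistent with the resolution of a 10,000-permutation test, although the source does not state how it was derived.

```python
def z_score(observed, null_mean, null_sd):
    """Number of null standard deviations between observed and null mean."""
    return (observed - null_mean) / null_sd

# Values quoted in the text and in the Figure 1 caption.
z_all = z_score(0.07468, 0.03086, 0.00384)    # all genes (Figure 1A)
z_dedup = z_score(0.06844, 0.03071, 0.00389)  # tandem duplicates removed (Figure 1B)

print(f"{z_all:.1f} {z_dedup:.1f}")  # prints "11.4 9.7"
```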

