Integrative bioinformatics analysis of transcriptional regulatory programs in breast cancer cells.

Niida A, Smith AD, Imoto S, Tsutsumi S, Aburatani H, Zhang MQ, Akiyama T - BMC Bioinformatics (2008)

Bottom Line: However, compared with the massive knowledge about the transcriptome, we have surprisingly little knowledge about the regulatory mechanisms underlying transcriptomic diversity. Our analysis found that motifs bound by ELK1, E2F, NRF1 and NFY are potential regulatory motifs that positively correlate with malignant progression of breast cancer. The results suggest that these 4 motifs are principal regulatory motifs driving malignant progression of breast cancer.

Affiliation: Laboratory of Molecular and Genetic Information, Institute of Molecular and Cellular Biosciences, The University of Tokyo, Bunkyo-ku, Tokyo, 110-0032, Japan. niida@iam.u-tokyo.ac.jp

ABSTRACT

Background: Microarray technology has unveiled transcriptomic differences among tumors of various phenotypes and, in particular, has brought great progress in the molecular understanding of the phenotypic diversity of breast tumors. However, compared with the massive knowledge about the transcriptome, we have surprisingly little knowledge about the regulatory mechanisms underlying transcriptomic diversity.

Results: To gain insights into the transcriptional programs that drive tumor progression, we integrated regulatory sequence data and expression profiles of breast cancer into a Bayesian Network, and searched for cis-regulatory motifs statistically associated with given histological grades and prognosis. Our analysis found that motifs bound by ELK1, E2F, NRF1 and NFY are potential regulatory motifs that positively correlate with malignant progression of breast cancer.

Conclusion: The results suggest that these 4 motifs are principal regulatory motifs driving malignant progression of breast cancer. Our method offers a more concise description of transcriptome diversity among breast tumors with different clinical phenotypes.

Figure 2: Sequence features associated with differential expression between G1 and G3 breast tumors.

Mentions: For each gene in the global expression profile data, we calculated the degree of differential expression between two sample groups (67 G1 and 54 G3 samples). We then applied our method to the differential expression values to search for correlating motifs. The results were evaluated in two ways. First, the reproducibility of the result was assessed by bootstrap analysis: structure learning of a Bayesian network was repeated 30 times using bootstrap samples from the training dataset. We found that V$ELK1_02, V$E2F1_Q4_01, V$NRF1_Q6 and JSP$NF_Y were reproducibly selected by the bootstrap analysis (Figure 2). Here, IDs starting with "V$", "JSP$" and "DME$" denote motifs from the TRANSFAC database, the JASPAR database and our DME analysis, respectively. For V$ELK1_02, highly similar motifs sampled by DME also appeared reproducibly. Although we present here results based on one training-test set partition, we also applied our method to different training-test set partitions to check the robustness of the biological findings, and confirmed that almost the same results were obtained. Second, statistical significance was evaluated for each of the sequence features reproducibly selected by the bootstrap analysis. We assessed the difference in expression values between the two gene groups with and without each sequence feature, using the Wilcoxon rank sum test on the training and test data. It should be noted that, because the P-values calculated using the training data are not subject to multiple testing corrections, they can potentially reach low values by overfitting to the training data. Hence, we must use the P-values calculated using the test data to accurately evaluate statistical significance.
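The per-feature test described above can be illustrated with a minimal sketch. It is not the authors' code: the function names (`average_ranks`, `rank_sum_test`, `feature_p_value`), the gene identifiers and the expression values are all made up for illustration, and the P-value uses the normal approximation without tie or continuity correction, which the article does not specify. In keeping with the text, the example is run on held-out (test) genes rather than on the training genes.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def rank_sum_test(with_feature, without_feature):
    """Two-sided Wilcoxon rank sum test (normal approximation, no tie or continuity correction)."""
    n1, n2 = len(with_feature), len(without_feature)
    ranks = average_ranks(list(with_feature) + list(without_feature))
    w = sum(ranks[:n1])                       # rank sum of the first group
    mean = n1 * (n1 + n2 + 1) / 2             # expected rank sum under the null hypothesis
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))      # two-sided tail of the standard normal
    return z, p


def feature_p_value(diff_expr, has_feature):
    """Split genes by presence of one sequence feature and compare the two groups."""
    present = [v for g, v in diff_expr.items() if has_feature[g]]
    absent = [v for g, v in diff_expr.items() if not has_feature[g]]
    return rank_sum_test(present, absent)


# Hypothetical test-set genes: differential expression value and motif presence.
test_expr = {"g1": 2.1, "g2": 1.7, "g3": 1.9, "g4": 1.4,
             "g5": -0.3, "g6": 0.2, "g7": -0.8, "g8": 0.5}
test_motif = {"g1": True, "g2": True, "g3": True, "g4": True,
              "g5": False, "g6": False, "g7": False, "g8": False}
z, p = feature_p_value(test_expr, test_motif)
print(f"z = {z:.3f}, two-sided P = {p:.4f}")
```

A positive `z` here means that genes carrying the feature tend to have higher differential expression values than genes lacking it. For real data, a library routine with exact or tie-corrected P-values would be a better choice than this approximation.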
The results from the Wilcoxon rank sum tests suggest that the sequence features most significantly associated with the histological grades are V$ELK1_02(20), V$E2F1_Q4_01(10), V$NRF1_Q6(10) and JSP$NF_Y(10) (the IDs are followed, in parentheses, by the values of the threshold parameter for motif searches). P-values were also calculated for these four sequence features as a combination. We split genes into 16 groups based on combinations of the presence and absence of the 4 sequence features, and evaluated the difference in expression value distributions among the gene groups using the Kruskal-Wallis test. Our calculation shows that the combination of these four sequence features achieves a highly significant P-value of 1.33 × 10^-15 for the test data. Analyses using independent data sets and prediction based on the MAP-value also confirmed these results (see Additional file 1).
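The combined test can be sketched in the same spirit. The code below is an illustrative assumption, not the authors' implementation: genes are grouped by their presence/absence pattern over four features (at most 2^4 = 16 groups), and a Kruskal-Wallis H statistic is computed without tie correction, with the P-value taken from the chi-square upper tail. All function names and data are hypothetical.

```python
import math
from collections import defaultdict


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def chi2_sf(x, df):
    """Upper tail of the chi-square distribution for a positive integer df."""
    y = x / 2
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)                # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))     # Q(1/2, y)
    while a < df / 2:
        # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
        q += y ** a * math.exp(-y) / math.gamma(a + 1)
        a += 1
    return q


def kruskal_wallis(groups):
    """Kruskal-Wallis H test over the non-empty groups (no tie correction)."""
    groups = [g for g in groups if g]
    if len(groups) < 2:
        raise ValueError("need at least two non-empty groups")
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, pos = 0.0, 0
    for g in groups:
        r = sum(ranks[pos:pos + len(g)])        # rank sum of this group
        total += r * r / len(g)
        pos += len(g)
    h = max(12 / (n * (n + 1)) * total - 3 * (n + 1), 0.0)
    return h, chi2_sf(h, len(groups) - 1)


def group_by_features(diff_expr, presence):
    """Group expression values by the presence/absence pattern of the features."""
    groups = defaultdict(list)
    for gene, value in diff_expr.items():
        groups[tuple(presence[gene])].append(value)
    return groups


# Hypothetical genes with presence flags for four sequence features.
expr = {"g1": 2.0, "g2": 1.8, "g3": 0.1, "g4": -0.2, "g5": 1.1, "g6": 0.9}
presence = {"g1": (True, True, True, True), "g2": (True, True, True, True),
            "g3": (False, False, False, False), "g4": (False, False, False, False),
            "g5": (True, False, False, False), "g6": (True, False, False, False)}
by_pattern = group_by_features(expr, presence)
h, p = kruskal_wallis(list(by_pattern.values()))
print(f"{len(by_pattern)} groups, H = {h:.3f}, P = {p:.4f}")
```

Only the patterns that actually occur form groups, so the degrees of freedom are the number of non-empty groups minus one; with all 16 patterns present this would be 15.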

