The Influence of the Global Gene Expression Shift on Downstream Analyses.

Xu Q, Zhang X - PLoS ONE (2016)



Affiliation: MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST/Department of Automation, Tsinghua University, Beijing, China.

ABSTRACT
Most studies based on gene expression analysis rest on the assumption that the total abundance of RNA per cell is roughly the same across cells. However, experiments have shown that changes in the expression of some master regulators, such as c-MYC, can cause a global shift in the expression of almost all genes in some cell types, such as cancer cells. Such a shift violates this assumption and can lead to wrong or biased conclusions in standard analyses, such as the detection of differentially expressed (DE) genes and the molecular classification of tumors based on gene expression. Most existing gene expression data were generated without considering this possibility and are therefore at risk of having produced unreliable results if such a global shift is present in the data. To evaluate this risk, we conducted a systematic study of the possible influence of the global gene expression shift on differential expression analysis and on molecular classification analysis. We collected data with a known global shift, generated additional data simulating different scenarios of the effect based on a wide collection of real gene expression data, and compared representative existing methods. We observed that some DE analysis methods are more tolerant of the global shift, while others are very sensitive to it. Classification accuracy is insensitive to the shift and can even benefit from it, but the genes selected for classification can be greatly affected.
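The core problem can be illustrated with a small simulation. The sketch below is a hypothetical numpy example, not the authors' simulation protocol: group B carries a global twofold shift on every gene plus an extra fourfold change on 50 genes, and both groups are then scaled to equal totals per sample, which is one common way the equal-total-RNA assumption enters an analysis. The names `simulate_global_shift`, `total_count_normalize`, and `median_log2_fc`, and all parameter values, are illustrative assumptions.

```python
import numpy as np


def simulate_global_shift(n_genes=1000, n_per_group=5, shift=2.0, n_de=50,
                          de_fold=4.0, seed=0):
    """Hypothetical linear-scale expression: every gene in group B is multiplied
    by `shift` (the global shift) and the first `n_de` genes also by `de_fold`."""
    rng = np.random.default_rng(seed)
    base = rng.lognormal(mean=5.0, sigma=1.0, size=n_genes)   # per-gene abundance
    fold = np.full(n_genes, shift)
    fold[:n_de] *= de_fold                                     # true per-gene fold change
    group_a = base[:, None] * rng.lognormal(0.0, 0.1, size=(n_genes, n_per_group))
    group_b = (fold * base)[:, None] * rng.lognormal(0.0, 0.1, size=(n_genes, n_per_group))
    return group_a, group_b, fold


def total_count_normalize(x):
    """Scale every sample (column) to the same total; this encodes the
    equal-total-RNA assumption that a global shift violates."""
    return x / x.sum(axis=0, keepdims=True) * 1e6


def median_log2_fc(a, b, genes):
    """Median over `genes` of log2(mean of group B / mean of group A)."""
    ratio = b[genes].mean(axis=1) / a[genes].mean(axis=1)
    return float(np.median(np.log2(ratio)))


a, b, fold = simulate_global_shift()
unchanged = np.arange(50, 1000)          # genes affected only by the global shift
print("true log2 FC:", np.log2(fold[unchanged][0]))  # 1.0
print("raw data:", median_log2_fc(a, b, unchanged))
print("after total-count normalization:",
      median_log2_fc(total_count_normalize(a), total_count_normalize(b), unchanged))
```

In the raw data the genes touched only by the shift show a median log2 fold change close to the true value of 1, whereas after equal-total scaling it falls to about zero or slightly below, so genes that truly doubled look unchanged or even down-regulated. How much a given DE method suffers depends on whether and how it applies this kind of normalization.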


Fig 5. Overlap proportion of the gene lists selected by R-SVM. (A) Dataset 1; (B) Dataset 2. Settings are the same as in Fig 2.
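One simple way to quantify the overlap plotted in this figure is the fraction of the top-k genes in one ranked list that also appear in the top-k of the other. The exact definition used for the figure is not given here, so the helper below (`overlap_proportion`, a hypothetical name) is a sketch under that assumption, applied to made-up gene IDs.

```python
def overlap_proportion(ranked_a, ranked_b, k):
    """Share of the top-k genes in ranked_a that are also in the top-k of ranked_b.
    Both lists are assumed to be ordered from most to least informative."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    top_a, top_b = set(ranked_a[:k]), set(ranked_b[:k])
    return len(top_a & top_b) / k


# Hypothetical rankings of the same genes obtained on a pair of sister datasets
original = ["g1", "g2", "g3", "g4", "g5"]
shifted = ["g3", "g9", "g1", "g7", "g4"]
for k in (1, 3, 5):
    print(k, overlap_proportion(original, shifted, k))
```

With these toy rankings the overlap is 0 at k = 1, 2/3 at k = 3, and 0.6 at k = 5, showing how a short list can disagree completely even when longer lists partly agree.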
We applied R-SVM and SVM-RFE to the sister data of the 20 datasets to classify the two groups and to select informative genes for the classification. The two methods gave similar results, so we present only the R-SVM results here. We compared the leave-one-out cross-validation (LOOCV) errors on the sister datasets and the overlap between the two gene lists selected on them. Tables 6 and 7 give the error rates at different gene-selection levels on Dataset 1 and Dataset 2. Fig 5 shows the overlap of selected genes in the sister datasets at different gene-selection levels. Results on the other datasets are provided in S2 File. Unsurprisingly, the classification error becomes smaller (0 for most of the data in our experiments) when there is a global shift in one of the two groups, because the shift introduces a systematic difference in gene expression between the two groups and makes them more separable. However, the overlap between the genes selected from the sister datasets is low, especially when only a small number of genes are selected.
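The evaluation described above can be sketched in a few dozen lines. This is a minimal numpy-only sketch under several assumptions, not the authors' code: the sister datasets are synthetic log2-scale data built by a hypothetical `make_sister_datasets`; the linear SVM is fitted by plain subgradient descent instead of a standard solver; and the gene score (each gene's SVM weight times the difference between its two class means, halving the gene set per round) is our reading of an R-SVM-style criterion. Gene selection is repeated inside every LOOCV fold so that the held-out sample cannot influence which genes are chosen.

```python
import numpy as np


def train_linear_svm(x, y, lam=0.01, epochs=200, lr=0.5):
    """Linear SVM (L2-regularised hinge loss) fitted by full-batch subgradient
    descent; a stand-in for a proper SVM solver. Labels y must be -1/+1."""
    n, p = x.shape
    w, b = np.zeros(p), 0.0
    for t in range(1, epochs + 1):
        active = y * (x @ w + b) < 1                  # samples violating the margin
        grad_w = lam * w - (y[active][:, None] * x[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        step = lr / np.sqrt(t)                        # decaying step size
        w = w - step * grad_w
        b = b - step * grad_b
    return w, b


def r_svm_select(x, y, n_keep, drop_frac=0.5):
    """Recursive elimination with an assumed R-SVM-style score: SVM weight of each
    gene times the difference of its class means; top genes survive each round."""
    genes = np.arange(x.shape[1])
    while len(genes) > n_keep:
        xs = x[:, genes] - x[:, genes].mean(axis=0)   # centre each gene
        w, _ = train_linear_svm(xs, y)
        score = w * (xs[y > 0].mean(axis=0) - xs[y < 0].mean(axis=0))
        n_next = max(n_keep, int(len(genes) * (1 - drop_frac)))
        genes = genes[np.argsort(score)[::-1][:n_next]]
    return genes


def loocv_error(x, y, n_keep):
    """Leave-one-out error rate; genes are reselected inside every fold."""
    errors = 0
    for i in range(len(y)):
        train = np.arange(len(y)) != i
        xt, yt = x[train], y[train]
        genes = r_svm_select(xt, yt, n_keep)
        mu = xt[:, genes].mean(axis=0)                # centre with training means only
        w, b = train_linear_svm(xt[:, genes] - mu, yt)
        pred = 1.0 if (x[i, genes] - mu) @ w + b >= 0 else -1.0
        errors += int(pred != y[i])
    return errors / len(y)


def make_sister_datasets(n_genes=200, n_per_group=10, n_de=10, shift=2.0, seed=1):
    """Hypothetical log2-scale data: group +1 has n_de up-regulated genes; the
    'shifted' sister also adds a global log2 shift to every gene of group +1."""
    rng = np.random.default_rng(seed)
    y = np.repeat([-1.0, 1.0], n_per_group)
    x = rng.uniform(4, 12, size=n_genes) + rng.normal(0, 0.5, size=(2 * n_per_group, n_genes))
    x[y > 0, :n_de] += 1.0
    shifted = x.copy()
    shifted[y > 0] += shift
    return x, shifted, y


x, x_shift, y = make_sister_datasets()
for name, data in (("original", x), ("shifted", x_shift)):
    print(name, "LOOCV error:", loocv_error(data, y, n_keep=10))
top_orig, top_shift = r_svm_select(x, y, 10), r_svm_select(x_shift, y, 10)
print("top-10 overlap:", len(set(top_orig.tolist()) & set(top_shift.tolist())) / 10)
```

With these settings the shifted sister is expected to separate perfectly, in line with the observation above, while the top-10 overlap indicates how much the shift changes which genes are chosen. The exact numbers depend on the seed and parameters and are illustrative only.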