The Influence of the Global Gene Expression Shift on Downstream Analyses.

Xu Q, Zhang X - PLoS ONE (2016)

Bottom Line: Most existing gene expression data were generated without considering this possibility, and are therefore at risk of having produced unreliable results if such a global shift effect exists in the data. To evaluate this risk, we conducted a systematic study on the possible influence of the global gene expression shift effect on differential expression analysis and on molecular classification analysis. Classification accuracy is not sensitive to the shift and actually can benefit from it, but genes selected for the classification can be greatly affected.


Affiliation: MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST/Department of Automation, Tsinghua University, Beijing, China.

ABSTRACT
Most studies based on gene expression analysis assume that the total abundance of RNA is roughly the same in different cells. However, experiments have shown that changes in the expression of some master regulators, such as c-MYC, can cause a global shift in the expression of almost all genes in some cell types, such as cancer cells. Such a shift violates this assumption and can lead to wrong or biased conclusions in standard data analysis practices, such as detection of differentially expressed (DE) genes and molecular classification of tumors based on gene expression. Most existing gene expression data were generated without considering this possibility, and are therefore at risk of having produced unreliable results if such a global shift effect exists in the data. To evaluate this risk, we conducted a systematic study on the possible influence of the global gene expression shift effect on differential expression analysis and on molecular classification analysis. We collected data with a known global shift effect, generated data that simulate different situations of the effect based on a wide collection of real gene expression data, and conducted comparative studies on representative existing methods. We observed that some DE analysis methods are more tolerant of the global shift while others are very sensitive to it. Classification accuracy is not sensitive to the shift and actually can benefit from it, but genes selected for the classification can be greatly affected.
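To make the concern concrete, the following toy calculation sketches how normalizing each sample to its total signal can hide a global shift. The gene names and expression values are hypothetical, chosen only for illustration, and total-sum scaling is assumed here as a stand-in for whatever normalization a given pipeline uses; none of this is taken from the paper's data.

```python
# Hypothetical toy profiles (not data from the paper).
control = {"geneA": 100.0, "geneB": 200.0, "geneC": 700.0}
# Assumed scenario: every gene is amplified 2x, and geneA is amplified 4x.
shifted = {"geneA": 400.0, "geneB": 400.0, "geneC": 1400.0}


def normalize_to_total(profile):
    """Scale a profile so that its values sum to 1 (total-sum scaling)."""
    total = sum(profile.values())
    return {gene: value / total for gene, value in profile.items()}


def fold_changes(before, after):
    """Per-gene ratio of 'after' over 'before'."""
    return {gene: after[gene] / before[gene] for gene in before}


# Fold changes on the raw values reflect the true amplification.
raw_fc = fold_changes(control, shifted)

# Fold changes after total-sum scaling are computed relative to a total
# that has itself more than doubled, so the global 2x shift disappears.
norm_fc = fold_changes(normalize_to_total(control), normalize_to_total(shifted))

for gene in control:
    print(gene, "raw:", round(raw_fc[gene], 3), "normalized:", round(norm_fc[gene], 3))
```

In this toy case geneB and geneC are truly doubled, yet their normalized fold changes fall below 1, so they would look down-regulated; only geneA still appears up-regulated, and by less than its true 4x change.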


pone.0153903.g001: The flowchart of the classification and gene selection experiments on data with simulated global shift.

Mentions: The stringent cross-validation scheme (CV2 as defined in [27]) is used to estimate the error rate of R-SVM at each level of feature selection. The samples to be tested in the validation step are left out at the beginning, before any feature selection step. This avoids the over-fitting that can arise in feature selection from an "information leak" when cross-validation is applied only after the features have been selected [27]. The details of the method are described in [27]. We used R-SVM with C = 10 in the leave-one-out cross-validation and set the number of features to decrease by 50% at each level of feature selection along the ladder. We applied R-SVM to the original expression datasets and to their sister datasets with simulated global expression shift, and compared the classification errors and the selected gene lists on each pair of datasets. Fig 1 shows the experiment diagram.
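The cross-validation structure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the R-SVM implementation of [27]: the SVM is replaced by a nearest-centroid classifier and the SVM-based gene ranking by an absolute difference of class means, purely to keep the sketch dependency-free. The function names and the toy data are hypothetical. What the sketch does preserve is the order of operations: the test sample is held out before any feature selection, and the feature count halves at each level of the ladder.

```python
def feature_ladder(n_features, min_features=1):
    """Feature counts that halve at each level, e.g. 8 -> [8, 4, 2, 1]."""
    ladder = []
    n = n_features
    while n >= min_features:
        ladder.append(n)
        if n == min_features:
            break
        n = max(n // 2, min_features)
    return ladder


def class_means(X, y, features):
    """Mean of each selected feature within each class."""
    means = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        means[label] = {f: sum(r[f] for r in rows) / len(rows) for f in features}
    return means


def rank_features(X, y, features):
    """Stand-in ranking (assumption): absolute difference of the two class means."""
    m = class_means(X, y, features)
    a, b = sorted(m)  # expects exactly two classes
    return sorted(features, key=lambda f: abs(m[a][f] - m[b][f]), reverse=True)


def predict(X, y, features, x_new):
    """Stand-in classifier (assumption): nearest class centroid, not an SVM."""
    m = class_means(X, y, features)
    return min(sorted(m), key=lambda c: sum((x_new[f] - m[c][f]) ** 2 for f in features))


def loo_errors_per_level(X, y):
    """Leave-one-out error count at each ladder level, selection inside the loop."""
    n_features = len(X[0])
    ladder = feature_ladder(n_features)
    errors = {k: 0 for k in ladder}
    for i in range(len(X)):
        # Hold the test sample out BEFORE any feature selection is done.
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        features = list(range(n_features))
        for k in ladder:
            # Re-rank the surviving features on training data only, keep the top k.
            features = rank_features(X_train, y_train, features)[:k]
            if predict(X_train, y_train, features, X[i]) != y[i]:
                errors[k] += 1
    return errors


# Hypothetical toy data: 6 samples x 4 genes, gene 0 separates the classes.
X = [[1.0, 5.0, 3.0, 2.0], [1.2, 4.0, 3.5, 2.1], [0.9, 6.0, 2.5, 1.9],
     [3.0, 5.5, 3.1, 2.0], [3.2, 4.5, 2.9, 2.2], [2.9, 5.0, 3.3, 1.8]]
y = [0, 0, 0, 1, 1, 1]
errors = loo_errors_per_level(X, y)
print(errors)
```

Running the same function on an original dataset and on its shifted sister dataset, then comparing the two error dictionaries and the surviving feature lists, mirrors the paired comparison described in the text.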

