Not proper ROC curves as new tool for the analysis of differentially expressed genes in microarray experiments.

Parodi S, Pistoia V, Muselli M - BMC Bioinformatics (2008)

Bottom Line: Among the 1607 differentially expressed genes identified, 16 corresponded to NPRC and all escaped standard selection procedures based on AUC and t statistics. Moreover, simple inspection of the shape of these plots allowed the two subclasses in either class to be identified in 13 cases (81%). NPRC represent a new useful tool for the analysis of microarray data.


Affiliation: Epidemiology and Biostatistics Section, Scientific Directorate, G. Gaslini Children's Hospital, Genoa, Italy. stefanoparodi@ospedale-gaslini.ge.it

ABSTRACT

Background: Most microarray experiments are carried out to identify genes whose expression varies in relation to specific conditions or in response to environmental stimuli. In such studies, genes showing similar mean expression values between two or more groups are considered not differentially expressed, even if hidden subclasses with different expression values may exist. In this paper we propose a new method for identifying differentially expressed genes, based on the area between the ROC curve and the rising diagonal (ABCR). ABCR represents a more general approach than the standard area under the ROC curve (AUC), because it can identify both proper (i.e., concave) and not proper ROC curves (NPRC). In particular, NPRC may correspond to those genes that tend to escape standard selection methods.

Results: We assessed the performance of our method using data from a publicly available database of 4026 genes, including 14 normal B cell samples (NBC) and 20 heterogeneous lymphomas (namely, 9 follicular lymphomas and 11 chronic lymphocytic leukemias). Moreover, NBC also included two subclasses, i.e., 6 heavily stimulated and 8 slightly or not stimulated samples. We identified 1607 differentially expressed genes with an estimated False Discovery Rate of 15%. Among them, 16 corresponded to NPRC and all escaped standard selection procedures based on AUC and t statistics. Moreover, simple inspection of the shape of these plots allowed the two subclasses in either class to be identified in 13 cases (81%).

Conclusion: NPRC represent a new useful tool for the analysis of microarray data.
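
To make the contrast between AUC and ABCR concrete, the following is a minimal sketch, not the authors' implementation. It assumes one plausible reading of the abstract: ABCR is the integral of the absolute vertical distance between the empirical ROC curve and the rising diagonal. The function names (`roc_points`, `auc`, `abcr`) and the toy expression values are hypothetical and chosen only for illustration; the estimator actually used in the paper may differ.

```python
def roc_points(class_a, class_b):
    """Empirical ROC points, treating high expression as evidence for class B.

    Returns (false positive rate, true positive rate) pairs, starting at (0, 0).
    Tied values are processed together so each tie yields a single point.
    """
    labelled = [(v, 0) for v in class_a] + [(v, 1) for v in class_b]
    labelled.sort(key=lambda p: p[0], reverse=True)
    n_a, n_b = len(class_a), len(class_b)
    points = [(0.0, 0.0)]
    fp = tp = 0
    i = 0
    while i < len(labelled):
        value = labelled[i][0]
        while i < len(labelled) and labelled[i][0] == value:
            if labelled[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_a, tp / n_b))
    return points


def auc(points):
    """Area under the piecewise-linear ROC curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def abcr(points):
    """Area between the ROC curve and the rising diagonal (assumed definition).

    Integrates |y - x| along each segment; a segment that crosses the
    diagonal is split at the crossing so both parts count as positive area.
    """
    total = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        w = x1 - x0
        d0, d1 = abs(y0 - x0), abs(y1 - x1)
        if (y0 - x0) * (y1 - x1) >= 0:
            total += w * (d0 + d1) / 2.0          # same side of the diagonal
        else:
            total += w * (d0 * d0 + d1 * d1) / (2.0 * (d0 + d1))  # crossing
    return total


# Hypothetical toy data: class A hides a low and a high subclass,
# class B sits in between, so the class means are similar.
class_a = [1, 2, 3, 10, 11, 12]
class_b = [5, 6, 7, 8]
pts = roc_points(class_a, class_b)
print(f"AUC  = {auc(pts):.3f}")
print(f"ABCR = {abcr(pts):.3f}")
```

With these toy values the ROC curve first runs below the diagonal and then above it, so the AUC stays close to 0.5 while the ABCR is clearly positive (about 0.25). Under the stated assumption, this is the situation in which a gene with hidden subclasses would be missed by AUC-based selection but flagged by ABCR.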



Figure 8: Not proper ROC curve corresponding to the expression of gene no. 6 in Table 2 (GENE75X: VRK2 kinase). Comparison between class A (14 samples of NBC) and class B (20 heterogeneous lymphomas, including 9 FL and 11 CLL samples). Hst = Highly stimulated NBC; SSt = Slightly or not stimulated NBC (Table 1). NBC samples are numbered according to Table 1.

Because the main sources of heterogeneity were known a priori for both class A (NBC), which included differently stimulated cells, and class B, which included samples from two different malignant diseases (namely, FL and CLL), we carried out a detailed analysis of each ROC curve obtained from the expression values of the genes listed in Table 2 (Figures 3 to 18). Figures 3 to 18 are ordered according to the ranks of the corresponding genes in Table 2, i.e., Figure 3 refers to the expression of gene no. 1, Figure 4 to gene no. 2, and so on. Each plot reports both the origin of the samples in class B (i.e., either FL or CLL) and the two major subclasses within the NBC class, according to Table 1 (i.e., heavily stimulated and slightly or not stimulated cells). Moreover, each plot was arbitrarily split into two parts to roughly separate samples with high (left side) and low (right side) expression levels. Finally, the ROC curves in Figures 3 to 18 were classified as either "sigmoid-shaped" (like Curve III in Figure 1) or "inversely sigmoid-shaped" (like Curve IV in Figure 1).

