False Discovery Rates in PET and CT Studies with Texture Features: A Systematic Review.

Chalkidou A, O'Doherty MJ, Marsden PK - PLoS ONE (2015)

Bottom Line: After applying appropriate statistical corrections, an average type-I error probability of 76% (range: 34-99%) was estimated, with the majority of published results not reaching statistical significance. For the 100 random variables examined, 10% proved to be significant predictors of survival when subjected to ROC and multiple hypothesis testing analysis. We found insufficient evidence to support a relationship between PET or CT texture features and patient survival.


Affiliation: Division of Imaging Sciences and Biomedical Engineering, King's College London, 4th Floor, Lambeth Wing, St Thomas' Hospital, London SE1 7EH, United Kingdom.

ABSTRACT

Purpose: A number of recent publications have proposed that a family of image-derived indices, called texture features, can predict clinical outcome in patients with cancer. However, the investigation of multiple indices on a single data set can lead to significant inflation of type-I errors. We report a systematic review of the type-I error inflation in such studies and review the evidence regarding associations between patient outcome and texture features derived from positron emission tomography (PET) or computed tomography (CT) images.
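The inflation described above can be illustrated with the textbook family-wise error formula. This is a minimal sketch that assumes the m tests are independent and each is run at the same nominal level alpha; it is not the correction used in the review, and the function name `familywise_error` is hypothetical.

```python
def familywise_error(alpha: float, m: int) -> float:
    """Probability of at least one false positive among m independent
    tests, each performed at nominal significance level alpha.

    Assumption: the tests are independent. Texture features computed
    from the same image are usually correlated, so this is only a
    rough guide to how quickly the error rate can grow.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if m < 0:
        raise ValueError("m must be non-negative")
    return 1.0 - (1.0 - alpha) ** m


if __name__ == "__main__":
    # One test keeps the nominal 5% rate; 20 tests already give
    # roughly a 64% chance of at least one spurious "significant" result.
    for m in (1, 5, 20, 100):
        print(m, round(familywise_error(0.05, m), 3))
```

Under these assumptions, testing 20 indices at the 5% level gives a chance of about 0.64 of at least one false positive, and 100 indices give about 0.99.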

Methods: To identify studies, PubMed and Scopus were searched (January 2000 to September 2013) using combinations of the keywords texture, prognostic, predictive, and cancer. Studies were divided into three categories according to the sources of type-I error inflation and whether an independent validation dataset was used. For each study, the true type-I error probability and the adjusted level of significance were estimated using the optimum cut-off approach correction and the Benjamini-Hochberg method. To demonstrate the variable selection bias in these studies explicitly, we re-analyzed data from one of the published studies, substituting 100 random variables for the original image-derived indices. The significance of the random variables as potential predictors of outcome was examined using the analysis methods used in the identified studies.
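For readers unfamiliar with the Benjamini-Hochberg method named above, the standard step-up procedure can be sketched as follows. This is a generic illustration, not the authors' code; the function name `benjamini_hochberg` and the target false discovery rate `q=0.05` are assumptions made for the example.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans marking which hypotheses are rejected
    by the Benjamini-Hochberg step-up procedure at false discovery rate q.

    Steps: sort the m p-values, find the largest rank k such that
    p_(k) <= (k / m) * q, and reject the k hypotheses with the
    smallest p-values.
    """
    m = len(p_values)
    # Indices of the p-values in ascending order of p.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Largest rank whose p-value falls under its own threshold.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank

    # Reject everything ranked at or below k_max.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Illustrative p-values only; they are not taken from the review.
    p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(p))
```

In this illustrative input, five p-values fall below 0.05 when judged one at a time, but only the two smallest survive the procedure.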

Results: Fifteen studies were identified. After applying appropriate statistical corrections, an average type-I error probability of 76% (range: 34-99%) was estimated with the majority of published results not reaching statistical significance. Only 3/15 studies used a validation dataset. For the 100 random variables examined, 10% proved to be significant predictors of survival when subjected to ROC and multiple hypothesis testing analysis.

Conclusions: We found insufficient evidence to support a relationship between PET or CT texture features and patient survival. Further fit-for-purpose validation of these image-derived biomarkers should be supported by appropriate biological and statistical evidence before their association with patient outcome is investigated in prospective studies.


Fig 5 (pone.0124165.g005): Statistical significance of Kaplan-Meier analysis for 100 random variables using the optimum cut-off approach. The variables are ordered by increasing p-values. Overall 10% of the random variables are statistically significant predictors of survival.

Mentions: The minimum and maximum AUC achieved with the random variables were 0.213 and 0.796, respectively (Fig 4). Compared with the texture features investigated in the studies retrieved by the systematic review, the random variable analysis achieved higher AUCs than uniformity in [28,30,32,34], energy in [31], or busyness in [29]. Although there was no real relationship between the 100 random variables and survival, applying the methodology typically employed in the published studies made the choice of an optimum cut-off appear to have prognostic power in Kaplan-Meier survival analysis for 10% of the variables (Fig 5). The AUC values for these random variables with apparent prognostic power are reported in Table 3.
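The mechanism behind this result can be reproduced in a small simulation. The sketch below is a simplified stand-in rather than the authors' analysis: it assumes a binary outcome instead of censored survival times, uses a chi-square test on a 2x2 table instead of the log-rank test, and restricts candidate cut-offs to the central 80% of the feature values. All names and parameters (`simulate`, `n_patients`, `n_sims`, `seed`) are hypothetical.

```python
import math
import random


def chi2_p_value(a, b, c, d):
    """P-value of the chi-square test (1 degree of freedom, no continuity
    correction) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0  # an empty margin carries no evidence
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Survival function of chi-square with 1 df: erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def split_p_value(feature, outcome, cutoff):
    """Dichotomise patients at `cutoff` and test association with outcome."""
    a = b = c = d = 0
    for x, y in zip(feature, outcome):
        if x > cutoff:
            if y:
                a += 1
            else:
                b += 1
        elif y:
            c += 1
        else:
            d += 1
    return chi2_p_value(a, b, c, d)


def simulate(n_patients=60, n_sims=200, alpha=0.05, seed=0):
    """Return (fixed_rate, optimal_rate): the fraction of pure-noise
    features declared significant with a pre-specified median cut-off
    versus a cut-off chosen to minimise the p-value."""
    rng = random.Random(seed)
    fixed_hits = optimal_hits = 0
    trim = n_patients // 10
    for _ in range(n_sims):
        # Outcome and feature are generated independently: no true effect.
        outcome = [rng.random() < 0.5 for _ in range(n_patients)]
        feature = [rng.random() for _ in range(n_patients)]
        ranked = sorted(feature)
        candidates = ranked[trim:n_patients - trim]  # central 80%
        median_cut = ranked[n_patients // 2]
        p_fixed = split_p_value(feature, outcome, median_cut)
        p_best = min(split_p_value(feature, outcome, cut) for cut in candidates)
        fixed_hits += p_fixed < alpha
        optimal_hits += p_best < alpha
    return fixed_hits / n_sims, optimal_hits / n_sims


if __name__ == "__main__":
    fixed_rate, optimal_rate = simulate()
    print("median cut-off:", fixed_rate)
    print("optimum cut-off:", optimal_rate)
```

Because the median is itself one of the candidate cut-offs, the optimum cut-off rate can never fall below the fixed cut-off rate. In this simplified setting, scanning for the best cut-off typically pushes the false positive rate well above the nominal 5%, which is the selection effect the random variable analysis demonstrates.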

