Penalized Ordinal Regression Methods for Predicting Stage of Cancer in High-Dimensional Covariate Spaces.

Gentry AE, Jackson-Cook CK, Lyon DE, Archer KJ - Cancer Inform (2015)

Bottom Line: Currently, statistical methods for predicting an ordinal outcome using clinical, demographic, and high-dimensional correlated features are lacking. We demonstrate the application of our method to predict the stage of breast cancer. The method has been made available in the ordinalgmifs package in the R programming environment.

Affiliation: Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, USA.

ABSTRACT
The pathological description of the stage of a tumor is an important clinical designation and is considered, like many other forms of biomedical data, an ordinal outcome. Currently, statistical methods for predicting an ordinal outcome using clinical, demographic, and high-dimensional correlated features are lacking. In this paper, we propose a method that fits an ordinal response model to predict an ordinal outcome for high-dimensional covariate spaces. Our method penalizes some covariates (high-throughput genomic features) without penalizing others (such as demographic and/or clinical covariates). We demonstrate the application of our method to predict the stage of breast cancer. In our model, breast cancer subtype is a nonpenalized predictor, and CpG site methylation values from the Illumina Human Methylation 450K assay are penalized predictors. The method has been made available in the ordinalgmifs package in the R programming environment.
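
The split between penalized and nonpenalized covariates can be illustrated with a small sketch. The abstract does not state the link function or the form of the penalty, so the cumulative logit model and the L1-type penalty below are assumptions made only for illustration, and all names (`thresholds`, `gamma`, `beta`, `lam`) are hypothetical. This is not the ordinalgmifs fitting algorithm; it only shows an objective in which the penalty touches one coefficient block and leaves the other alone.

```python
import math


def cumulative_logit_probs(thresholds, w, gamma, x, beta):
    """Return P(Y = k) for one sample under an assumed cumulative logit model.

    thresholds: increasing cut points alpha_1 < ... < alpha_{K-1}
    w, gamma:   nonpenalized covariates (e.g. subtype indicators) and coefficients
    x, beta:    penalized covariates (e.g. CpG methylation values) and coefficients
    """
    # Both covariate blocks enter the same linear predictor.
    eta = sum(wi * gi for wi, gi in zip(w, gamma))
    eta += sum(xi * bi for xi, bi in zip(x, beta))
    # P(Y <= k) = logistic(alpha_k - eta); the top class has cumulative probability 1.
    cum = [1.0 / (1.0 + math.exp(-(a - eta))) for a in thresholds] + [1.0]
    # Class probabilities are differences of adjacent cumulative probabilities.
    return [cum[0]] + [cum[k] - cum[k - 1] for k in range(1, len(cum))]


def penalized_neg_log_lik(y, W, X, thresholds, gamma, beta, lam):
    """Negative log-likelihood plus an L1 penalty applied to beta only."""
    nll = 0.0
    for yi, wi, xi in zip(y, W, X):
        probs = cumulative_logit_probs(thresholds, wi, gamma, xi, beta)
        nll -= math.log(probs[yi])  # yi is a 0-based class index
    # gamma is deliberately left out of the penalty term.
    return nll + lam * sum(abs(b) for b in beta)


# Toy data: two samples, one nonpenalized and two penalized covariates, three classes.
y = [0, 2]
W = [[1.0], [0.0]]
X = [[0.5, -0.2], [0.1, 0.3]]
objective = penalized_neg_log_lik(y, W, X, [-1.0, 1.0], [0.4], [1.0, -0.5], lam=2.0)
```

Because `gamma` never enters the penalty term, the nonpenalized coefficients are not shrunk toward zero by `lam`, while the high-dimensional block is.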


Figure 2: Boxplot of mean β-values by percent GC content across all samples, for Type II probes.

Some preprocessing of the methylation data was necessary prior to statistical analysis. Our first preprocessing step was to examine the distribution of β-values by GC content. This is important because previous research has established that methylation may not be accurately measured in regions of high GC content [12]. Illumina’s design for the 450K array includes two separate assays, Type I and Type II, for estimating methylation at a given locus. GC content was calculated as the proportion of the probe sequence composed of C’s and G’s and was reported separately for the Type I and Type II designs. We then examined boxplots of the average β-values (across all samples) by GC content for each assay type separately (Figs. 1 and 2). The resulting boxplots were used to determine a GC proportion cutoff beyond which methylation no longer appears to be reliably measured. The choice of such a cutoff is subjective, but it is important to remove the CpG sites beyond the cutoff because including unreliable probes may distort the analysis. We also removed CpG sites containing known single nucleotide polymorphisms (SNPs) according to the Illumina-provided annotation files [11].
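
The GC-content calculation and the two probe filters described above can be sketched as follows. This is a minimal illustration, not the authors' code: the probe IDs, the toy sequences, the cutoff value of 0.6, and the set of SNP-flagged probes are all hypothetical, and the text does not report the cutoff that was actually chosen.

```python
def gc_content(probe_seq):
    """Proportion of bases in the probe sequence that are G or C."""
    seq = probe_seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def filter_probes(probes, gc_cutoff, snp_probe_ids):
    """Keep probes at or below the GC cutoff that are not flagged for a known SNP.

    probes:        dict mapping probe ID -> probe sequence
    gc_cutoff:     GC proportion above which a probe is dropped (chosen by eye
                   from the boxplots in the text; hypothetical here)
    snp_probe_ids: set of probe IDs flagged in the annotation files
    """
    kept = {}
    for probe_id, seq in probes.items():
        if probe_id in snp_probe_ids:
            continue  # known SNP within the CpG site
        if gc_content(seq) > gc_cutoff:
            continue  # GC content too high for reliable measurement
        kept[probe_id] = seq
    return kept


# Toy input; real probe sequences are longer and come from the array annotation.
probes = {
    "cg_demo_1": "ATGCATGCAT",  # GC proportion 0.4
    "cg_demo_2": "GCGCGCGCAT",  # GC proportion 0.8, above the cutoff
    "cg_demo_3": "ATATATGCAT",  # GC proportion 0.2, but flagged for a SNP
}
kept = filter_probes(probes, gc_cutoff=0.6, snp_probe_ids={"cg_demo_3"})
```

In practice the filter would be run separately for Type I and Type II probes, since the text reports GC content and inspects the boxplots for each design type on its own.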

