Gene Selection using a High-Dimensional Regression Model with Microarrays in Cancer Prognostic Studies.

Kaneko S, Hirakawa A, Hamada C - Cancer Inform (2012)

Bottom Line: We propose a method for estimating the false positive rate (FPR) for lasso estimates in a high-dimensional Cox model. We performed a simulation study to examine the precision of the FPR estimate by the proposed method. We applied the proposed method to real data and illustrated the identification of false positive genes.


Affiliation: Department of Management Science, Graduate School of Engineering, Tokyo University of Science, 1-3 Kagurazaka, Shinjuku-ku, Tokyo 162-8601, Japan.

ABSTRACT
Mining gene expression data to identify genes associated with patient survival is an ongoing problem in cancer prognostic studies using microarrays; the aim is to use such genes to achieve more accurate prognoses. The least absolute shrinkage and selection operator (lasso) is often used for gene selection and parameter estimation in high-dimensional microarray data. The lasso shrinks some of the coefficients to zero, and the amount of shrinkage is controlled by a tuning parameter that is often chosen by cross-validation. The model selected by cross-validation contains many false positives, that is, genes whose true coefficients are zero. We propose a method for estimating the false positive rate (FPR) for lasso estimates in a high-dimensional Cox model. We performed a simulation study to examine the precision of the FPR estimate by the proposed method. We applied the proposed method to real data and illustrated the identification of false positive genes.


Figure 1: The estimated mixture distribution fitted to the lasso estimates in the DLBCL data; fL and fN are the probability density functions of the Laplace and normal distributions, respectively. β̂ is the estimate by the lasso and f(β̂) is the probability density of β̂. Note: A magnified image of the distribution between the β̂ values −0.3 and 0.1 is inserted.

Mentions: Given the estimated coefficients β̂j (j = 1, …, 7399), we fitted two mixture distributions, with C = 1 and C = 2, and compared their fit using the Akaike Information Criterion (AIC) [19]. AIC is a widely used criterion for determining the number of components in such models. We selected C = 1 for simplicity of interpretation, although the AICs for C = 1 and C = 2 were almost the same. Thus, we assumed the mixture distribution with C = 1 and obtained the following estimated distribution (Fig. 1):

$$f(\hat{\beta}_j) = \frac{160}{7399}\left\{0.75\, f_L(\hat{\beta}_j;\, 0,\, 0.0053) + 0.25\, f_N(\hat{\beta}_j;\, -0.10,\, 0.0064)\right\} + \frac{7239}{7399}\, f_L(\hat{\beta}_j;\, 0,\, 10^{-18}) \qquad (11)$$
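To make the structure of Equation (11) concrete, the following minimal Python sketch evaluates the estimated mixture density at a given lasso estimate. It is an illustration only, not the authors' code. The excerpt does not say how the second argument of fL and fN is parameterized, so the sketch assumes it is the Laplace scale and the normal standard deviation, respectively. The function names (`laplace_pdf`, `normal_pdf`, `nonzero_part`, `mixture_pdf`, `normal_component_share`) are hypothetical.

```python
import math

# Counts taken from Equation (11): 160 + 7239 = 7399 genes in total.
N_NONZERO = 160
N_ZERO = 7239
N_GENES = N_NONZERO + N_ZERO


def laplace_pdf(x, loc, scale):
    """Laplace density; ASSUMPTION: the second parameter in the text is the scale."""
    return math.exp(-abs(x - loc) / scale) / (2.0 * scale)


def normal_pdf(x, mean, sd):
    """Normal density; ASSUMPTION: the second parameter in the text is the standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def nonzero_part(b):
    """Density inside the braces of Equation (11) (weights 0.75 and 0.25)."""
    return (0.75 * laplace_pdf(b, 0.0, 0.0053)
            + 0.25 * normal_pdf(b, -0.10, 0.0064))


def mixture_pdf(b):
    """Full estimated density f(b) from Equation (11)."""
    w = N_NONZERO / N_GENES
    # The Laplace term with scale 1e-18 is effectively a spike at zero.
    return w * nonzero_part(b) + (1.0 - w) * laplace_pdf(b, 0.0, 1e-18)


def normal_component_share(b):
    """Share of the braces' density at b that comes from the normal component."""
    normal = 0.25 * normal_pdf(b, -0.10, 0.0064)
    return normal / nonzero_part(b)


if __name__ == "__main__":
    for b in (-0.10, 0.005, 0.0):
        print(b, mixture_pdf(b), normal_component_share(b))
```

Under these assumptions, an estimate near −0.10 is attributed almost entirely to the normal component, while a small estimate near zero is attributed almost entirely to the zero-centred Laplace component. How these shares relate to the FPR estimate is defined in the full article, not in this excerpt.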

