Familial and sporadic idiopathic pulmonary fibrosis: making the diagnosis from peripheral blood.

Meltzer EB, Barry WT, Yang IV, Brown KK, Schwarz MI, Patel H, Ashley A, Noble PW, Schwartz DA, Steele MP - BMC Genomics (2014)

Bottom Line: Unsupervised clustering failed to discriminate between samples of different severity. The signature was successfully applied to the validation set, ROC area under the curve = 0.893, p < 0.0001. By using Bayesian probit regression to develop a model, we show that it is entirely possible to make a diagnosis of IPF from the peripheral blood with gene signatures.


Affiliation: Division of Allergy, Pulmonary, and Critical Care, Vanderbilt University Medical Center, 1313 21st Avenue South, 1105 Oxford House, Nashville, TN, USA. mark.p.steele@vanderbilt.edu.

ABSTRACT

Background: Peripheral blood biomarkers might improve diagnostic accuracy for idiopathic pulmonary fibrosis (IPF).

Results: Gene expression profiles were obtained from 89 patients with IPF and 26 normal controls. Samples were stratified according to severity of disease based on pulmonary function. The stratified dataset was split into subsets; two-thirds of the samples were selected to comprise the training set, while one-third was reserved for the validation set. Bayesian probit regression was used on the training set to develop a gene expression model for IPF versus normal. The gene expression model was tested by using it on the validation set to perform class prediction. Unsupervised clustering failed to discriminate between samples of different severity. Therefore, samples of all severities were included in the training and validation sets, in equal proportions. A gene signature model was developed from the training set. The model was built in an iterative fashion with the number of gene features selected to minimize the misclassification error in cross validation. The final model was based on the top 108 discriminating genes in the training set. The signature was successfully applied to the validation set, ROC area under the curve = 0.893, p < 0.0001. Using the optimal threshold (0.74), accurate class predictions were made for 77% of the test cases, with sensitivity = 0.70 and specificity = 1.00.

Conclusions: By using Bayesian probit regression to develop a model, we show that it is entirely possible to make a diagnosis of IPF from the peripheral blood with gene signatures.
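The threshold-based class prediction reported in the Results can be illustrated with a minimal Python sketch. The function names, the toy probabilities, and the choice to call a sample IPF when its predicted probability is at or above the threshold are assumptions made for illustration; the abstract does not give the per-sample predictions or the class counts in the validation set. The last helper only checks that the reported accuracy, sensitivity and specificity are mutually consistent under an assumed split of roughly one-third of each class (about 30 IPF and 9 controls).

```python
from typing import List, Sequence, Tuple


def classify(probabilities: Sequence[float], threshold: float = 0.74) -> List[int]:
    """Label a sample 1 (IPF) when its predicted probability reaches the threshold.

    Assumption: ">=" is used at the boundary; the source does not say.
    """
    return [1 if p >= threshold else 0 for p in probabilities]


def sensitivity_specificity(labels: Sequence[int],
                            predictions: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels (1 = IPF, 0 = normal)."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def accuracy_from_rates(sensitivity: float, specificity: float,
                        n_cases: int, n_controls: int) -> float:
    """Overall accuracy implied by class-wise rates and class sizes."""
    correct = sensitivity * n_cases + specificity * n_controls
    return correct / (n_cases + n_controls)


# Toy data (hypothetical, not from the study).
toy_labels = [1, 1, 1, 0, 0]
toy_probs = [0.9, 0.8, 0.5, 0.3, 0.1]
toy_sens, toy_spec = sensitivity_specificity(toy_labels, classify(toy_probs))

# Consistency check under ASSUMED validation counts (30 IPF, 9 controls).
implied_accuracy = accuracy_from_rates(0.70, 1.00, n_cases=30, n_controls=9)
```

Under the assumed counts, the implied accuracy is 30/39, or about 0.77, which agrees with the 77% stated in the abstract.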



Fig4: Selection of features for the model. Leave-one-out cross validation (LOOCV) is performed on all possible gene signatures, ranging from 50–250 features. Then, the performance characteristics of this bootstrap test are used to select an optimal number of genes (features) with which to build the signature. (A) Maximum area under the curve is achieved with signatures containing 105, 107, 108, 109 and 111 features. (B) The sum of deviance of the predicted probabilities is minimized by selecting the signature that contains 108 features.

Mentions: In developing a Bayesian Probit Regression model for IPF versus normal, one of the first steps is to select the optimal number of features (genes) to include in the functional gene signature. This was accomplished through an iterative data-driven process, whereby consecutive models were constructed across a range of 50–250 gene features (the practical limits of computational power). For each consecutive model, internal validity was measured with leave-one-out cross validation (LOOCV) and two parameters were examined: (a) the rate of phenotype misclassifications, calculated by measuring the area under the receiver operating characteristic curve (ROC statistic); and (b) the sum of deviance (SOD), an aggregate of deviances between the predicted posterior probability and the expected posterior probability of the true phenotype for each sample. The ROC statistic identified five potential models (with maximal performance on the LOOCV test): functional gene signatures containing 105, 107, 108, 109 and 111 features all attained ROC statistic = 0.814. Among these potential signatures, the optimal functional gene signature was chosen by examining the SOD. The functional gene signature with 108 gene features had the least SOD = 21.461; signatures containing 107 and 109 genes were close, with SOD = 21.469 and SOD = 21.467 respectively. The 108-gene signature was considered most valid, by a combination of ROC and SOD criteria (Figure 4).
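The two-stage selection described above (maximize the LOOCV ROC statistic, then break ties with the smallest SOD) can be sketched in Python as follows. This is an illustrative sketch, not the authors' code: the function names are hypothetical, the AUC is computed with the standard rank formulation, and the per-sample deviance is assumed to be the absolute difference between the true label (0 or 1) and the predicted posterior probability, since the text does not give an exact formula. The usage example includes only the three candidates whose SOD is reported; SOD values for the 105- and 111-feature signatures are not stated and are therefore omitted.

```python
import math
from typing import Dict, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison; ties count one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sum_of_deviance(labels: Sequence[int], probs: Sequence[float]) -> float:
    """ASSUMED definition: sum over samples of |true label - predicted probability|."""
    return sum(abs(y - p) for y, p in zip(labels, probs))


def select_signature_size(metrics: Dict[int, Tuple[float, float]]) -> int:
    """Pick a feature count from {n_features: (loocv_auc, loocv_sod)}.

    Stage 1 keeps the candidates with the maximal AUC;
    stage 2 returns the one among them with the smallest SOD.
    """
    best_auc = max(auc for auc, _ in metrics.values())
    tied = {n: sod for n, (auc, sod) in metrics.items()
            if math.isclose(auc, best_auc)}
    return min(tied, key=lambda n: tied[n])


# Values reported in the text (candidates with a stated SOD only).
reported = {
    107: (0.814, 21.469),
    108: (0.814, 21.461),
    109: (0.814, 21.467),
}
chosen = select_signature_size(reported)  # 108
```

In a full pipeline, `roc_auc` and `sum_of_deviance` would be evaluated on the held-out LOOCV predictions for each candidate feature count before calling `select_signature_size`.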

