Familial and sporadic idiopathic pulmonary fibrosis: making the diagnosis from peripheral blood.

Meltzer EB, Barry WT, Yang IV, Brown KK, Schwarz MI, Patel H, Ashley A, Noble PW, Schwartz DA, Steele MP - BMC Genomics (2014)

Bottom Line: Unsupervised clustering failed to discriminate between samples of different severity. The signature was successfully applied to the validation set, ROC area under the curve = 0.893, p < 0.0001. By using Bayesian probit regression to develop a model, we show that it is entirely possible to make a diagnosis of IPF from the peripheral blood with gene signatures.


Affiliation: Division of Allergy, Pulmonary, and Critical Care, Vanderbilt University Medical Center, 1313 21st Avenue South, 1105 Oxford House, Nashville, TN, USA. mark.p.steele@vanderbilt.edu.

ABSTRACT

Background: Peripheral blood biomarkers might improve diagnostic accuracy for idiopathic pulmonary fibrosis (IPF).

Results: Gene expression profiles were obtained from 89 patients with IPF and 26 normal controls. Samples were stratified according to severity of disease based on pulmonary function. The stratified dataset was split into subsets; two-thirds of the samples were selected to comprise the training set, while one-third was reserved for the validation set. Bayesian probit regression was used on the training set to develop a gene expression model for IPF versus normal. The gene expression model was tested by using it on the validation set to perform class prediction. Unsupervised clustering failed to discriminate between samples of different severity. Therefore, samples of all severities were included in the training and validation sets, in equal proportions. A gene signature model was developed from the training set. The model was built in an iterative fashion, with the number of gene features selected to minimize the misclassification error in cross validation. The final model was based on the top 108 discriminating genes in the training set. The signature was successfully applied to the validation set, ROC area under the curve = 0.893, p < 0.0001. Using the optimal threshold (0.74), accurate class predictions were made for 77% of the test cases, with sensitivity = 0.70 and specificity = 1.00.
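The split and the evaluation described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the random seed, the toy scores, and the rule "call a sample IPF when its score is at or above the threshold" are all assumptions made for illustration. Only the class sizes (89 IPF, 26 normal), the two-thirds training fraction, and the 0.74 threshold come from the abstract. The severity strata are left out because the per-stratum counts are not given, and the Bayesian probit model itself is not reproduced here.

```python
import random
from collections import defaultdict


def stratified_split(strata, train_fraction=2 / 3, seed=0):
    """Split sample indices into training and validation sets, stratum by stratum."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for index, stratum in enumerate(strata):
        groups[stratum].append(index)
    train, valid = [], []
    for stratum in sorted(groups):
        members = groups[stratum]
        rng.shuffle(members)
        cut = round(len(members) * train_fraction)
        train += members[:cut]
        valid += members[cut:]
    return sorted(train), sorted(valid)


def roc_auc(scores, is_case):
    """AUC as the fraction of case/control pairs where the case scores higher (ties count 1/2)."""
    cases = [s for s, c in zip(scores, is_case) if c]
    controls = [s for s, c in zip(scores, is_case) if not c]
    wins = sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in cases
        for b in controls
    )
    return wins / (len(cases) * len(controls))


def sensitivity_specificity(scores, is_case, threshold):
    """Assumed decision rule: a score at or above the threshold is called a case."""
    true_pos = sum(1 for s, c in zip(scores, is_case) if c and s >= threshold)
    true_neg = sum(1 for s, c in zip(scores, is_case) if not c and s < threshold)
    n_cases = sum(1 for c in is_case if c)
    n_controls = len(is_case) - n_cases
    return true_pos / n_cases, true_neg / n_controls


# Class sizes from the abstract; severity strata are omitted (counts not reported).
strata = ["IPF"] * 89 + ["normal"] * 26
train, valid = stratified_split(strata)
print(len(train), len(valid))

# Made-up prediction scores for seven hypothetical validation samples.
toy_scores = [0.95, 0.90, 0.80, 0.60, 0.65, 0.20, 0.10]
toy_is_case = [True, True, True, True, False, False, False]
auc = roc_auc(toy_scores, toy_is_case)
sens, spec = sensitivity_specificity(toy_scores, toy_is_case, threshold=0.74)
print(auc, sens, spec)
```

The set sizes printed by this sketch follow from its own rounding rule and are not figures reported by the paper. In the toy data, one case falls below the threshold while every control does, which mirrors the reported pattern of perfect specificity with lower sensitivity.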

Conclusions: By using Bayesian probit regression to develop a model, we show that it is entirely possible to make a diagnosis of IPF from the peripheral blood with gene signatures.



Figure 1: Principal component analysis of the entire dataset (115 subjects). First, the data were filtered to genes with a coefficient of variation at or above the 90th percentile. Then, all samples were plotted on the first two principal components. (A) Samples are identified by batch: batch 1 (black), batch 2 (red), batch 3 (green), batch 4 (blue), batch 5 (cyan), and batch 6 (magenta). (B) Samples are identified by severity of disease (FVC%, see text): normal (black), mild disease (blue), moderate disease (green), severe disease (red), unknown (magenta); and by analytic subset: training set (open circles), validation set (closed squares). (C) Samples are identified, again, by severity of disease (DLCO%); the color code is the same as in panel B. (D) Samples are identified by family history: normal (black), familial idiopathic pulmonary fibrosis (cyan), sporadic idiopathic pulmonary fibrosis (magenta).

Mentions: Prior to developing a gene signature model, the dataset was explored as a whole to see whether there were any global differences in gene expression attributable to batch effects, differences in clinical severity, or family history. This exploratory analysis was performed with an unsupervised method, principal component analysis (PCA; Figures 1 and 2). Before PCA, the dataset was filtered in an unsupervised fashion using the coefficient of variation (CoV). Filtering was done to improve the signal-to-noise ratio and yielded a filtered dataset containing only the genes at or above the 90th percentile by CoV (2208 genes).
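The CoV filtering step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' pipeline: the function names and the toy expression values are invented, CoV is taken as the sample standard deviation divided by the mean, and "at or above the 90th percentile" is read as "the top 10% of genes ranked by CoV", since the source does not say how the percentile was computed.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (mean assumed non-zero)."""
    return statistics.stdev(values) / statistics.mean(values)


def filter_by_cov(expression, percentile=90):
    """Keep the genes in the top (100 - percentile)% by CoV.

    `expression` maps a gene name to its expression values, one per sample.
    No class labels are used, so the filter is unsupervised.
    """
    cov = {gene: coefficient_of_variation(vals) for gene, vals in expression.items()}
    n_keep = max(1, len(cov) - len(cov) * percentile // 100)
    top = sorted(cov, key=cov.get, reverse=True)[:n_keep]
    return {gene: expression[gene] for gene in top}


# Toy data: 20 hypothetical genes measured in 4 samples, all with mean 100
# and a spread that grows with the gene index.
expression = {f"gene{i}": [100 - i, 100 + i, 100 - i, 100 + i] for i in range(20)}
filtered = filter_by_cov(expression)
print(sorted(filtered))  # ['gene18', 'gene19']
```

The filtered matrix would then be mean-centered per gene and passed to any standard PCA routine, with the samples plotted on the first two components as in Figure 1.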


