The Discovery of Novel Biomarkers Improves Breast Cancer Intrinsic Subtype Prediction and Reconciles the Labels in the METABRIC Data Set.

Milioli HH, Vimieiro R, Riveros C, Tishchenko I, Berretta R, Moscato P - PLoS ONE (2015)

Bottom Line: We first computed the CM1 score to identify the probes with highly discriminative patterns of expression across samples of each intrinsic subtype. Intrinsic subtypes assigned using the CM1 list and the ensemble of classifiers are more consistent and homogeneous than the original PAM50 labels. The new subtypes show accurate distributions of current clinical markers ER, PR and HER2, and survival curves in the METABRIC and ROCK data sets.


Affiliation: Priority Research Centre for Bioinformatics, Biomarker Discovery and Information-Based Medicine, Hunter Medical Research Institute, New Lambton Heights, NSW, Australia; School of Environmental and Life Science, The University of Newcastle, Callaghan, NSW, Australia.

ABSTRACT

Background: The prediction of breast cancer intrinsic subtypes has been introduced as a valuable strategy to determine patient diagnosis, prognosis, and therapy response. The PAM50 method, based on the expression levels of 50 genes, uses a single sample predictor model to assign subtype labels to samples. Intrinsic errors reported within this assay demonstrate the challenge of identifying and understanding the breast cancer groups. In this study, we aim to: a) identify novel biomarkers for subtype individuation by exploring the competence of a newly proposed method named the CM1 score, and b) apply ensemble learning, as opposed to a single classifier, for sample subtype assignment. The overarching objective is to improve class prediction.

Methods and findings: The microarray transcriptome data sets used in this study are the METABRIC breast cancer data, recorded for over 2000 patients, and the public integrated source from the ROCK database, with 1570 samples. We first computed the CM1 score to identify the probes with highly discriminative patterns of expression across samples of each intrinsic subtype. We further assessed the ability of 42 selected probes to assign correct subtype labels using 24 different classifiers from the Weka software suite. For comparison, the same method was applied to the list of 50 genes from the PAM50 method.
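The abstract states that 24 Weka classifiers were used as an ensemble, but this excerpt does not say how their outputs were combined. The sketch below assumes a plain majority vote, which is only one possible rule; the function name, the tie-breaking order, and the toy votes are all hypothetical and are not taken from the paper.

```python
from collections import Counter

# The five intrinsic subtypes named in the text. The order is used only to
# break ties, which is an arbitrary assumption of this sketch.
SUBTYPES = ["luminal A", "luminal B", "HER2-enriched", "normal-like", "basal-like"]


def majority_vote(predictions):
    """Combine the labels from several classifiers for one sample.

    Returns (consensus_label, agreement), where agreement is the fraction
    of classifiers that voted for the consensus label.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    unknown = set(predictions) - set(SUBTYPES)
    if unknown:
        raise ValueError(f"unknown subtype labels: {sorted(unknown)}")
    counts = Counter(predictions)
    top = max(counts.values())
    # First subtype (in SUBTYPES order) that reaches the top vote count.
    winner = next(s for s in SUBTYPES if counts.get(s, 0) == top)
    return winner, top / len(predictions)


# Toy votes from 24 hypothetical classifiers for two hypothetical samples.
votes = {
    "sample_1": ["basal-like"] * 22 + ["HER2-enriched"] * 2,
    "sample_2": ["luminal A"] * 13 + ["luminal B"] * 11,
}
consensus = {sample: majority_vote(labels) for sample, labels in votes.items()}
```

The agreement fraction gives a rough indication of how contested a label is: in the toy data, `sample_2` is split almost evenly between luminal A and luminal B, the kind of case where a single classifier could plausibly go either way.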

Conclusions: The CM1 score identified 30 novel biomarkers for predicting breast cancer subtypes and confirmed the role of 12 well-established genes. Intrinsic subtypes assigned using the CM1 list and the ensemble of classifiers are more consistent and homogeneous than the original PAM50 labels. The new subtypes show accurate distributions of current clinical markers ER, PR and HER2, and survival curves in the METABRIC and ROCK data sets. Remarkably, the paradoxical attribution of the original labels reinforces the limitations of employing single sample classifiers to predict breast cancer intrinsic subtypes.



pone.0129711.g002: The gene expression profile of the balanced top ten probes selected for each of the five breast cancer intrinsic subtypes across 997 samples from the discovery set. The annotated genes are defined for each subtype as an intrinsic, highly discriminative, signature. Samples were ordered according to the gene expression similarities in each breast cancer subtype. Colours represent the selected genes and sample subtypes: luminal A (yellow), luminal B (green), HER2-enriched (purple), normal-like (blue), and basal-like (red).

Mentions: The effectiveness of the CM1 list for segregating the five subtypes is depicted in Fig 2. The figure shows the expression values of the top five negative and top five positive ranked probes for each subtype across 997 samples in the METABRIC discovery set. For instance, the ten probes selected for the basal-like subtype, the most representative class, show a consistent separation between samples from this class and the remaining ones. The second heat map in Fig 3 illustrates the expression levels of unique probes from the CM1 list in the Illumina platform, in which rows represent probes and columns represent samples. Rows and columns were ordered according to gene expression similarity using a memetic algorithm [27]. This image also shows the overall discriminative power of our list for distinguishing samples of the five subtypes.
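The balanced selection described above (top five positive and top five negative ranked probes per subtype) can be sketched as follows. This excerpt does not give the CM1 formula, so the score used here is a hypothetical stand-in: the difference between the class mean and the mean of the remaining samples, damped by the range of the remaining samples. It should not be read as the authors' definition; the function names and toy data are likewise assumptions.

```python
from statistics import mean


def range_scaled_mean_difference(in_class, rest):
    """Hypothetical stand-in for a one-vs-rest probe score (NOT the CM1 formula).

    Positive when the probe is higher in the class of interest, negative when
    lower; a wide spread among the remaining samples shrinks the score.
    """
    return (mean(in_class) - mean(rest)) / (1.0 + max(rest) - min(rest))


def select_balanced_probes(expression, labels, subtype, k=5):
    """Return (top_k_positive, top_k_negative) probe names for one subtype.

    expression: dict mapping probe name -> list of values, one per sample
    labels:     list of subtype labels aligned with the sample order
    Assumes at least 2 * k probes; otherwise the two lists may overlap.
    """
    scores = {}
    for probe, values in expression.items():
        in_class = [v for v, lab in zip(values, labels) if lab == subtype]
        rest = [v for v, lab in zip(values, labels) if lab != subtype]
        scores[probe] = range_scaled_mean_difference(in_class, rest)
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Most positive first, then most negative first.
    return ranked[:k], ranked[-k:][::-1]


# Toy data: four samples, three hypothetical probes.
labels = ["basal-like", "basal-like", "luminal A", "luminal A"]
expression = {
    "probe_up": [9.0, 8.0, 2.0, 3.0],
    "probe_down": [1.0, 2.0, 7.0, 8.0],
    "probe_flat": [5.0, 5.0, 5.0, 5.0],
}
positive, negative = select_balanced_probes(expression, labels, "basal-like", k=1)
```

Running the selection once per subtype with `k=5` would give ten probes per subtype, matching the "balanced top ten" layout of Fig 2, although the probes chosen would depend on the actual scoring function.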

