Enrichment analysis applied to disease prognosis.

Machado CM, Freitas AT, Couto FM - J Biomed Semantics (2013)

Bottom Line: The objective of this analysis is to identify clinical and biological features that characterize groups of patients with a common disease and that can be used to distinguish between groups of patients associated with disease-related events. These analyses are an adaptation of the standard enrichment analysis, since multiple sets of genes are considered, one for each patient. The preliminary results are promising, as the sets of terms obtained reflect the current knowledge about the gene functions commonly altered in HCM patients, thus allowing their characterization. One factor still to be addressed is the need to test the enrichment analysis with clinical data, in addition to genetic data, since both types of data are expected to be necessary for prognosis purposes.


Affiliation: LaSIGE, Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, Lisboa, Portugal. cmachado@xldb.di.fc.ul.pt.

ABSTRACT
Enrichment analysis is well established in the field of transcriptomics, where it is used to identify relevant biological features that characterize a set of genes obtained in an experiment. This article proposes the application of enrichment analysis as a first step in a disease prognosis methodology, in particular for diseases with a strong genetic component. The objective of this analysis is to identify clinical and biological features that characterize groups of patients with a common disease and that can be used to distinguish between groups of patients associated with disease-related events. Data mining methodologies can then be used to exploit those features and assist medical doctors in evaluating patients with respect to their predisposition for a specific event. In this work, hypertrophic cardiomyopathy (HCM) is used as a case study, as a first test to assess the feasibility of applying enrichment analysis to disease prognosis. To perform this assessment, two groups of patients have been considered: patients who have suffered a sudden cardiac death episode and patients who have not. The results presented were obtained with genetic data and the Gene Ontology, in two enrichment analyses: an enrichment profiling, which aims to characterize a group of patients (e.g. those that suffered a disease-related event) based on their mutations; and a differential enrichment, which aims to identify features that differentiate a sub-group of patients from all the patients with the disease. These analyses are an adaptation of the standard enrichment analysis, since multiple sets of genes are considered, one for each patient. The preliminary results are promising, as the sets of terms obtained reflect the current knowledge about the gene functions commonly altered in HCM patients, thus allowing their characterization.
Nevertheless, some factors need to be taken into consideration before the full potential of the enrichment analysis in the prognosis methodology can be evaluated. One such factor is the need to test the enrichment analysis with clinical data, in addition to genetic data, since both types of data are expected to be necessary for prognosis purposes.



Figure 2: Representation of the population and study sets in the enrichment profiling analysis. The two sets of dots represent the genomes of two patients from the same group (e.g. with SCD). The smaller, yellow set of dots corresponds to the genes mutated in the patient; the larger, white set of dots corresponds to the entire genome of the patient, i.e. both the genes not mutated (outside the yellow set) and the genes mutated. In these sets of genes, blue dots represent genes annotated with a term of interest (t); gray dots represent genes not annotated with t. In the profiling analysis, the study set is the union of the genes mutated in all the patients; the population set is the union of the genomes of all the patients. The annotation frequency is then calculated by counting the total number of genes annotated with the term in the study set (study frequency) and in the population set (population frequency).

Mentions: In the enrichment approach followed in this work, i.e. Single Enrichment Analysis, the set of genes that the user selects to be evaluated for enriched ontology terms is called the study set; these genes can be, for example, the ones overexpressed in a microarray. The reference set of genes is called the population set, and can be the whole set of genes analyzed in the microarray. In the context of the patients' profiling analysis, we can conceive of a study set and a population set for each individual patient. The study set contains the genes mutated in the patient, whereas the population set contains all the genes of the patient, mutated or not. In the HCM dataset, mutation information is available only for the genes associated with the disease, and consequently the study set is composed exclusively of these genes. The genes associated with HCM but not tested (see the Methods section for an explanation of how the genotyping is performed) have to be treated as genes without mutations, just like the genes not associated with HCM, and are included in the population set. The enrichment analysis is then performed with a study set that contains all the genes mutated in all the patients of a given group (e.g. with SCD). In turn, the population set includes all the genes in the genomes of all the patients in the same group (see Figure 2 for a representation of how the two sets of genes are obtained).
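
The construction of the two sets and the annotation frequencies described above can be sketched in Python. This is a minimal illustration, not the authors' implementation: the gene identifiers (G1 to G6), the patient records, and the annotation set for the term t are invented toy data, and the function names are hypothetical. Two assumptions are made. First, the "union" over patients is treated as a multiset, so a gene counts once per patient who carries it; this matches counting the total number of genes across patients, but the text could also be read as a plain set union. Second, a hypergeometric tail probability is used as the enrichment score, which is a common choice in enrichment analysis but is not specified in this passage.

```python
from collections import Counter
from math import comb


def build_sets(mutations_by_patient, genome):
    """Pool per-patient study and population sets for one group of patients.

    Assumption: each patient contributes their own copy of every gene,
    so the pooled sets are multisets (Counter), not plain set unions.
    """
    study, population = Counter(), Counter()
    for mutated in mutations_by_patient.values():
        study.update(mutated & genome)   # genes mutated in this patient
        population.update(genome)        # all genes, mutated or not
    return study, population


def annotation_frequency(gene_counts, annotated_genes):
    """Count the genes in a pooled set that are annotated with the term."""
    return sum(n for gene, n in gene_counts.items() if gene in annotated_genes)


def hypergeom_tail(k, n, K, N):
    """P(X >= k) when drawing n genes from N, of which K are annotated.

    Assumption: a hypergeometric test is used as the enrichment score.
    """
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Toy data (invented): six genes, two patients of the same group.
genome = {"G1", "G2", "G3", "G4", "G5", "G6"}
mutations_by_patient = {"patient_1": {"G1"}, "patient_2": {"G1", "G3"}}
annotated_with_t = {"G1", "G4"}  # hypothetical annotations for term t

study, population = build_sets(mutations_by_patient, genome)
study_size = sum(study.values())
population_size = sum(population.values())
study_freq = annotation_frequency(study, annotated_with_t)
population_freq = annotation_frequency(population, annotated_with_t)
p_value = hypergeom_tail(study_freq, study_size, population_freq, population_size)

print(f"study: {study_freq}/{study_size}, population: {population_freq}/{population_size}")
print(f"p-value for term t: {p_value:.4f}")
```

In this sketch, a gene that was not tested in a patient is simply absent from that patient's mutated set, which mirrors the treatment described above of untested HCM-associated genes as genes without mutations.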

