A meta-analysis strategy for gene prioritization using gene expression, SNP genotype, and eQTL data.

Che J, Shin M - Biomed Res Int (2015)

Bottom Line: For integration, we utilized an improved technique for the order of preference by similarity to ideal solution (TOPSIS) to combine scores from distinct resources. This method was evaluated on two publicly available datasets regarding prostate cancer and lung cancer to identify disease-related genes. Consequently, our proposed strategy for gene prioritization showed its superiority to conventional methods in discovering significant disease-related genes with several types of genetic resources, while making good use of potential complementarities among available resources.

Affiliation: Bio-Intelligence & Data Mining Lab, School of Electronics Engineering, Kyungpook National University, 1370 Sankyuk-dong, Buk-gu, Daegu 702-701, Republic of Korea.

ABSTRACT
In order to understand disease pathogenesis, improve medical diagnosis, or discover effective drug targets, it is important to identify significant genes deeply involved in human disease. For this purpose, many earlier approaches attempted to prioritize candidate genes using gene expression profiles or SNP genotype data, but they often suffer from producing many false-positive results. To address this issue, in this paper, we propose a meta-analysis strategy for gene prioritization that employs three different genetic resources--gene expression data, single nucleotide polymorphism (SNP) genotype data, and expression quantitative trait loci (eQTL) data--in an integrative manner. For integration, we utilized an improved technique for the order of preference by similarity to ideal solution (TOPSIS) to combine scores from distinct resources. This method was evaluated on two publicly available datasets regarding prostate cancer and lung cancer to identify disease-related genes. Consequently, our proposed strategy for gene prioritization showed its superiority to conventional methods in discovering significant disease-related genes with several types of genetic resources, while making good use of potential complementarities among available resources.

Figure 1: Overall procedure of the proposed strategy for gene prioritization, which consists of four steps: (1) convert probe IDs to gene symbols, (2) apply test methods to obtain scores, (3) filter out duplicate genes in each score list, and (4) integrate scores with improved TOPSIS.
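
Step (4) can be illustrated with a minimal sketch of the textbook TOPSIS procedure. This is not the authors' improved variant, whose details are not given here. The function name `topsis_closeness`, the equal default weights, the gene names, and the example scores are all hypothetical, and every criterion is assumed to be "larger is better".

```python
import math


def topsis_closeness(score_rows, weights=None):
    """Return the TOPSIS closeness coefficient for each row (gene).

    score_rows: list of per-gene score lists, one value per data resource.
    weights: optional per-resource weights; equal weights are assumed here.
    """
    n_criteria = len(score_rows[0])
    if weights is None:
        weights = [1.0 / n_criteria] * n_criteria

    # Vector-normalize each criterion column (guarding against all-zero columns).
    norms = []
    for j in range(n_criteria):
        norm = math.sqrt(sum(row[j] ** 2 for row in score_rows))
        norms.append(norm if norm > 0 else 1.0)
    weighted = [
        [weights[j] * row[j] / norms[j] for j in range(n_criteria)]
        for row in score_rows
    ]

    # Ideal and anti-ideal solutions: best and worst value per criterion.
    ideal = [max(row[j] for row in weighted) for j in range(n_criteria)]
    anti_ideal = [min(row[j] for row in weighted) for j in range(n_criteria)]

    closeness = []
    for row in weighted:
        d_pos = math.dist(row, ideal)       # distance to the ideal solution
        d_neg = math.dist(row, anti_ideal)  # distance to the anti-ideal solution
        total = d_pos + d_neg
        closeness.append(d_neg / total if total > 0 else 0.0)
    return closeness


# Hypothetical scores from three resources (e.g. GeneExp, GeneSNP, GeneQTL).
genes = ["GENE_A", "GENE_B", "GENE_C"]
scores = [[0.9, 0.8, 0.7], [0.1, 0.2, 0.1], [0.5, 0.5, 0.5]]
closeness = topsis_closeness(scores)
ranked = [g for _, g in sorted(zip(closeness, genes), reverse=True)]
print(ranked)
```

A gene that is best on every resource gets a closeness of 1, and one that is worst on every resource gets 0; genes in between are ranked by their relative distance to the two extremes.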

Mentions: The overall procedure of the proposed gene prioritization strategy is illustrated in Figure 1. First, we preprocessed the disease-specific gene expression data with the comprehensive robust multiarray average method [28]. This produced prostate cancer gene expression data consisting of 12,625 probes and 128 samples (65 cases and 63 controls) and lung cancer gene expression data consisting of 54,675 probes and 120 samples (60 cases and 60 controls). For the SNP genotype data, we removed SNPs with minor allele frequency <0.01 and Hardy-Weinberg equilibrium test statistic value lower than ~7. This left prostate cancer SNP genotype data consisting of 709,216 SNPs and 72 samples (39 cases and 33 controls) and lung cancer SNP genotype data consisting of 760,716 SNPs and 122 samples (61 cases and 61 controls). Next, we converted the probe IDs (or SNP IDs) in the preprocessed gene expression and SNP genotype data to gene symbols using gene (or SNP) annotations, producing two datasets named GeneExp data and GeneSNP data, respectively. In addition, using eQTL data, which convey the biological relationships between SNPs and the genes they regulate, we converted the SNP IDs in the SNP genotype data to gene symbols, producing a third dataset named GeneQTL data. This yielded three gene-level datasets (GeneExp data, GeneSNP data, and GeneQTL data), each of which may contain duplicate genes because multiple probes can map to the same gene symbol. Any such duplicates were filtered out after gene scores had been obtained for each dataset.
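
The allele-frequency filter and the ID-to-gene conversion described above can be sketched as follows. This is only an illustration under stated assumptions: genotypes are coded as 0/1/2 allele counts, the Hardy-Weinberg filter is omitted, and duplicates are resolved by keeping the highest score per gene, which the text does not specify. All function names, SNP and probe IDs, and values are hypothetical.

```python
def minor_allele_frequency(genotypes):
    """Minor allele frequency from genotypes coded as 0/1/2 allele counts."""
    allele_freq = sum(genotypes) / (2 * len(genotypes))
    return min(allele_freq, 1 - allele_freq)


def filter_snps_by_maf(snp_genotypes, maf_threshold=0.01):
    """Drop SNPs whose minor allele frequency is below the threshold."""
    return {
        snp: genotypes
        for snp, genotypes in snp_genotypes.items()
        if minor_allele_frequency(genotypes) >= maf_threshold
    }


def collapse_to_genes(feature_scores, feature_to_gene):
    """Map probe or SNP scores to gene symbols, one score per gene.

    Assumption: when several features map to the same gene, the highest
    score is kept. Features without an annotation are skipped.
    """
    gene_scores = {}
    for feature, score in feature_scores.items():
        gene = feature_to_gene.get(feature)
        if gene is None:
            continue
        if gene not in gene_scores or score > gene_scores[gene]:
            gene_scores[gene] = score
    return gene_scores


# Hypothetical genotype data: the second SNP is monomorphic and is removed.
snp_genotypes = {
    "snp_1": [0, 1, 2, 1, 0, 1],
    "snp_2": [0, 0, 0, 0, 0, 0],
}
kept_snps = filter_snps_by_maf(snp_genotypes)

# Hypothetical probe scores: two probes map to the same gene symbol.
probe_scores = {"probe_1": 2.5, "probe_2": 3.1, "probe_3": 0.4}
probe_to_gene = {"probe_1": "GENE_A", "probe_2": "GENE_A", "probe_3": "GENE_B"}
gene_scores = collapse_to_genes(probe_scores, probe_to_gene)
print(sorted(kept_snps), gene_scores)
```

The same `collapse_to_genes` helper would apply to SNP-level scores with either a positional SNP annotation (GeneSNP data) or an eQTL-derived SNP-to-gene mapping (GeneQTL data), since only the lookup table changes.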

