Quick, "imputation-free" meta-analysis with proxy-SNPs.

Meesters C, Leber M, Herold C, Angisch M, Mattheisen M, Drichel D, Lacour A, Becker T - BMC Bioinformatics (2012)

Bottom Line: YAMAS (Yet Another Meta Analysis Software) enables cross-GWAS conclusions before time-consuming imputation runs are finished and polished. Association results from GWAS analysis applications can be used as input files for MA, tremendously speeding up MA compared to the conventional imputation approach. As a proof of principle, we analyzed six dbGaP Type II Diabetes GWAS and found that the proxy algorithm clearly outperforms naïve MA on the p-value level: for 17 out of 23 we observe an improvement on the p-value level by a factor of more than two, and a maximum improvement by a factor of 2127.

Affiliation: German Center for Neurodegenerative Diseases (DZNE), Bonn, Germany.

ABSTRACT

Background: Meta-analysis (MA) is widely used to pool genome-wide association studies (GWASes), either a) to increase the power to detect strong or weak genotype effects or b) to verify results. Because SNP panels differ among genotyping chips, imputation is the method of choice within GWAS consortia to avoid losing too many SNPs in an MA. YAMAS (Yet Another Meta Analysis Software), however, enables cross-GWAS conclusions before imputation runs, which are time-consuming, have been finished and polished.

Results: Here we present a fast method to avoid forfeiting SNPs present in only a subset of studies, without relying on imputation. This is accomplished by using reference linkage disequilibrium data from 1,000 Genomes/HapMap projects to find proxy-SNPs together with in-phase alleles for SNPs missing in at least one study. MA is conducted by combining association effect estimates of a SNP and those of its proxy-SNPs. Our algorithm is implemented in the MA software YAMAS. Association results from GWAS analysis applications can be used as input files for MA, tremendously speeding up MA compared to the conventional imputation approach. We show that our proxy algorithm is well-powered and yields valuable ad hoc results, possibly providing an incentive for follow-up studies. We propose our method as a quick screening step prior to imputation-based MA, as well as an additional main approach for studies without available reference data matching the ethnicities of study participants. As a proof of principle, we analyzed six dbGaP Type II Diabetes GWAS and found that the proxy algorithm clearly outperforms naïve MA on the p-value level: for 17 out of 23 we observe an improvement on the p-value level by a factor of more than two, and a maximum improvement by a factor of 2127.

Conclusions: YAMAS is an efficient and fast meta-analysis program which offers various methods, including conventional MA as well as inserting proxy-SNPs for missing markers to avoid unnecessary power loss. MA with YAMAS can be readily conducted as YAMAS provides a generic parser for heterogeneous tabulated file formats within the GWAS field and avoids cumbersome setups. In this way, it supplements the meta-analysis process.

Figure 4: Proxy meta-analysis schematic example. Schematic example of a meta-analysis with proxy markers. For simplicity we consider only two studies with four markers each (1-4). Common MA is applied on markers 1 and 4 (as they are present in both marker sets), yet when YAMAS hits marker 3, which is missing in the second study (3 – gray box), it selects marker 2 in study 2 as its proxy marker, based on the r² indicator. Dashed arrows indicate non-chosen potential proxy markers. The case of the missing marker 2 in study 1 is omitted for better readability.

Mentions: We assume that association results (effect estimates and standard errors) are available for the real genotype data of each participating study. To enable MA on the complete marker panel, missing markers can be filled with “proxy markers”. For this purpose, a sample reference file based on 1,000 Genomes [19] SNP content is provided for download on the YAMAS web site. Alternatively, users can produce their own reference files from genotype data in PLINK format [10] with the current version of INTERSNP [31]. The reference file tabulates pairs of SNPs with marker IDs, each marker’s alleles, the chromosome the markers are on, their absolute physical distance in base pairs, r² as a linkage disequilibrium indicator [32] and a boolean flag to define the proxy alleles (see below). We provide proxy files for CEU, YRI and JPT+HCN samples. In general, pairs of SNPs no more than 200 kb apart and with r² ≥ 0.5 are listed. For the X chromosome and the MHC region, we choose a distance limit of 5 Mb. If the algorithm encounters a marker that is present in one of the studies but missing in one or more of the others, it tries to find a proxy marker in those studies (compare Figure 4). Candidate proxy markers are ranked by their r² with the missing marker, highest first, and this sorted list is checked for presence in the data set. The first SNP found, i.e., the present SNP in highest LD with the missing SNP, is chosen as the proxy-SNP and is subsequently used for MA. To account for the effect direction, the proxy marker also carries the information for a proxy allele: the reference file designates as the proxy allele the allele for which the observed haplotype frequency is greater than the haplotype frequency expected under linkage equilibrium. In other words, the “in-phase” alleles of a SNP and its proxy define mutual proxy alleles (cf. also the section “Analysis of dbGaP data” for an example). A boolean indicator for the proxy alleles is part of the reference file. In summary, meta-analysis is always based on the established formula
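
The proxy lookup described in this paragraph can be illustrated with a minimal Python sketch. It is not the YAMAS implementation and does not reproduce the meta-analysis formula itself. The class `ProxyPair`, its field names, and the functions `select_proxy` and `alleles_in_phase` are hypothetical names chosen for illustration, and the r² values and distances in the usage lines are invented to mirror the situation in Figure 4. Only the ranking rule (highest r² first, first marker present in the study is chosen), the default thresholds (r² ≥ 0.5, 200 kb) and the in-phase criterion (observed haplotype frequency above the product of the allele frequencies) are taken from the text.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class ProxyPair:
    """One row of a hypothetical reference table (field names are illustrative)."""
    snp: str          # marker that may be missing in a study
    proxy: str        # candidate proxy marker for that SNP
    r2: float         # linkage disequilibrium between snp and proxy
    distance_bp: int  # absolute physical distance in base pairs


def select_proxy(
    missing_snp: str,
    study_markers: Set[str],
    reference: Iterable[ProxyPair],
    min_r2: float = 0.5,
    max_distance_bp: int = 200_000,
) -> Optional[ProxyPair]:
    """Return the candidate with the highest r2 that is present in the study."""
    candidates = [
        row for row in reference
        if row.snp == missing_snp
        and row.r2 >= min_r2
        and row.distance_bp <= max_distance_bp
    ]
    # Rank candidates by r2, highest first, then take the first one that
    # actually occurs in the study's marker set.
    candidates.sort(key=lambda row: row.r2, reverse=True)
    for row in candidates:
        if row.proxy in study_markers:
            return row
    return None  # no usable proxy for this marker in this study


def alleles_in_phase(hap_freq: float, freq_a: float, freq_b: float) -> bool:
    """True if the haplotype carrying both alleles is observed more often
    than expected under linkage equilibrium (product of allele frequencies)."""
    return hap_freq > freq_a * freq_b


# Usage mirroring Figure 4: marker 3 is missing in study 2 (invented values).
reference = [
    ProxyPair("marker3", "marker1", r2=0.6, distance_bp=50_000),
    ProxyPair("marker3", "marker2", r2=0.9, distance_bp=10_000),
    ProxyPair("marker3", "marker4", r2=0.7, distance_bp=30_000),
]
study2_markers = {"marker1", "marker2", "marker4"}
chosen = select_proxy("marker3", study2_markers, reference)
print(chosen.proxy if chosen else "no proxy found")
```

With these invented values, marker 2 is selected because it has the highest r² among the candidates present in study 2; markers 1 and 4 correspond to the dashed arrows in Figure 4. For the X chromosome or the MHC region, `max_distance_bp` would be raised to 5 Mb in line with the text.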

