Combining multiple family-based association studies.

Tang H, Peng J, Wang P, Coram M, Hsu L - BMC Proc (2007)

Bottom Line: Here we propose a novel statistical method that boosts the effective sample size by combining data obtained from several studies. Specifically, we consider a situation in which various studies have genotyped non-overlapping subjects at largely non-overlapping sets of markers. Our approach, which exploits the local linkage disequilibrium structure without assuming an explicit population model, opens up the possibility of improving statistical power by incorporating existing data into future association studies.

Affiliation: Department of Genetics, Stanford University School of Medicine, 300 Pasteur Drive, Stanford, California 94305, USA. huatang@fhcrc.org

ABSTRACT
While high-throughput genotyping technologies are becoming readily available, the merit of using these technologies to perform genome-wide association studies has not been established. One major concern is that for studies of complex diseases and traits, the whole-genome approach requires such large sample sizes that both recruitment and genotyping pose considerable challenges. Here we propose a novel statistical method that boosts the effective sample size by combining data obtained from several studies. Specifically, we consider a situation in which various studies have genotyped non-overlapping subjects at largely non-overlapping sets of markers. Our approach, which exploits the local linkage disequilibrium structure without assuming an explicit population model, opens up the possibility of improving statistical power by incorporating existing data into future association studies.


Figure 1: TDT tests for mother-daughter transmissions, restricted to mothers with DR genotype 1/1. a, TDT for the three subsets separately. The dotted vertical line indicates the location of the DR locus. b, TDT scores using sub-sample A (squares) versus imputed scores (line). c, TDT scores assuming all markers are genotyped in each individual (TDTall, open squares) versus TDTcomb. The dotted line indicates the 0.01 critical value for TDTall, and the solid line represents the corresponding critical value for TDTcomb. d, Comparison of TDTall and TDTcomb after a quantile transformation.

Mentions: The results of the various TDT tests for mother-daughter transmission in the 3-cM region are shown in Figure 1. Figure 1a displays the TDT scores computed separately on each of three subsets, each comprising one-third of the families. Because the transmissions from a mother to her two daughters are independent, a meaningful measure of sample size is the number of mother-daughter pairs. In our data, the numbers of mother-daughter transmissions in the three subsets are 46, 39, and 23, respectively. For all TDT tests, we use 10,000 permutations to establish the null distribution and significance level. The p-values of the TDT on the three subsets are 0.0011, 0.02, and 0.61, respectively. In Figure 1b, the points represent the TDT scores from subset A (families 1–500) and the solid line represents the loess prediction. Figure 1c compares the TDT scores when all markers are genotyped in every individual (TDTall, open squares) with TDTcomb (filled points). Although the maximum value achieved by TDTcomb appears substantially lower than the corresponding value for TDTall, the same is true under the null hypothesis, because the imputed TDT statistics tend to be smoother than the observed ones. As a result, at a given significance level (say, 0.01, corresponding to the 0.99 quantile of the null distribution), the critical value for TDTall is 17.38, while the corresponding critical value for TDTcomb is 6.87. We perform a quantile transformation based on the null distributions, and Figure 1d compares TDTall with the transformed TDTcomb. It indicates that, upon suitable transformation, TDTcomb can achieve a significance level similar to that of TDTall. However, the location of the peak shifts slightly: while the marker with the highest TDTall lies to the right of the DR locus, the marker with the highest TDTcomb lies to the left of it. Another consequence of smoothing and imputing TDT scores is that the "peak" of TDTcomb appears somewhat narrower than that of TDTall. In a similar fashion, we analyzed the other three types of transmission.
The results, summarized in Table 1, suggest the existence of another variant that influences disease risk. Interestingly, transmission is distorted in mother-daughter and father-daughter transmissions, but not in transmissions to sons. This suggests a possible gene × sex interaction. Finally, although we set out to use severity as a relative weight for each individual, a retrospective comparison indicates that the weighting makes little difference.
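The following Python sketch illustrates the permutation-based calibration and the quantile transformation described above. It is a minimal sketch under stated assumptions, not the authors' implementation.

Assumptions:
- Transmissions are phase-known and coded per marker as +1, -1 or 0.
- The per-marker statistic is the standard biallelic TDT, (b - c)^2 / (b + c).
- Each permutation swaps a parent's transmitted and untransmitted haplotypes. The swap is applied identically at every marker, so local LD is preserved.
- The region-wide critical value is the 0.99 quantile of the permuted maximum. This is our reading of the "0.01 critical value" lines in Figure 1c.

All function names are hypothetical.

```python
import bisect
import math
import random


def tdt_scores(rows):
    """Per-marker TDT statistic (b - c)^2 / (b + c).

    rows[i][j] codes transmission i at marker j (hypothetical coding):
    +1 if a heterozygous parent transmitted allele 1, -1 if it transmitted
    allele 2, and 0 if the parent was homozygous there (uninformative).
    """
    scores = []
    for j in range(len(rows[0])):
        b = sum(1 for r in rows if r[j] == 1)
        c = sum(1 for r in rows if r[j] == -1)
        scores.append((b - c) ** 2 / (b + c) if b + c else 0.0)
    return scores


def permutation_max_null(rows, n_perm=10_000, seed=0):
    """Permutation null distribution of the region-wide maximum TDT score.

    Each permutation swaps the transmitted and untransmitted parental
    haplotypes with probability 1/2 by flipping the sign of the whole row,
    so the same swap applies at every marker and local LD is preserved.
    """
    rng = random.Random(seed)
    maxima = []
    for _ in range(n_perm):
        permuted = []
        for r in rows:
            s = rng.choice((1, -1))
            permuted.append([v * s for v in r])
        maxima.append(max(tdt_scores(permuted)))
    return maxima


def critical_value(null, alpha=0.01):
    """Empirical (1 - alpha) quantile of a permutation null sample."""
    ordered = sorted(null)
    idx = max(0, math.ceil((1 - alpha) * len(ordered)) - 1)
    return ordered[idx]


def permutation_p_value(observed, null):
    """Add-one permutation p-value: (1 + #{null >= observed}) / (1 + n)."""
    exceed = sum(1 for m in null if m >= observed)
    return (1 + exceed) / (1 + len(null))


def quantile_transform(x, null_from, null_to):
    """Map x to F_to^{-1}(F_from(x)) using two empirical null samples."""
    src, dst = sorted(null_from), sorted(null_to)
    k = bisect.bisect_right(src, x)  # number of null_from values <= x
    # Integer ceiling of k * len(dst) / len(src) avoids float rounding.
    idx = max(0, -(-k * len(dst) // len(src)) - 1)
    return dst[idx]


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical toy data: 46 transmissions at 5 markers, coded at random.
    rows = [[rng.choice((1, -1, 0)) for _ in range(5)] for _ in range(46)]
    observed = max(tdt_scores(rows))
    null = permutation_max_null(rows, n_perm=1_000)
    print(observed, critical_value(null), permutation_p_value(observed, null))
```

On real data, `quantile_transform` would be called with `null_from` set to the permuted TDTcomb maxima and `null_to` set to the permuted TDTall maxima. That pairing is our interpretation of "a quantile transformation based on the null distributions", not a detail confirmed by the text. Mapping through the two null CDFs puts a smoothed, imputed statistic and a fully observed one on a common significance scale. This is why the critical values of 17.38 and 6.87 can be compared after transformation.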

