cgaTOH: extended approach for identifying tracts of homozygosity.

Zhang L, Orloff MS, Reber S, Li S, Zhao Y, Eng C - PLoS ONE (2013)

Bottom Line: Identification of disease variants via homozygosity mapping and investigation of the effects of genome-wide homozygosity regions on traits of biomedical importance have been widely applied recently. NCBI genome map viewer is incorporated into the system. Moreover, we discuss the choice of implementing appropriate empirical ranges of critical parameters by applying to disease models.

Affiliation: Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic, Cleveland, Ohio, United States of America. zhangl3@ccf.org

ABSTRACT
Identification of disease variants via homozygosity mapping and investigation of the effects of genome-wide homozygosity regions on traits of biomedical importance have been widely applied recently. Nonetheless, the existing methods and algorithms for identifying long tracts of homozygosity (TOH) do not provide efficient and rigorously defined regions for downstream association investigation. We expanded current methods of identifying TOHs by defining the "surrogate-TOH", a region covering a cluster of TOHs with specific characteristics. Our surrogate-TOHs include cTOH, viz. a common TOH region where at least ten TOHs are present; gTOH, whereby a group of highly overlapping TOHs share proximal boundaries; and aTOH, which are allelically matched TOHs. The search for gTOHs and aTOHs is based on a repeated binary spectral clustering algorithm, in which a hierarchy of clusters is created and represented by a TOH cluster tree. Our cgaTOH software was developed from the proposed method of identifying the different species of surrogate-TOH. The software provides an intuitive and interactive visualization tool, with special interactive navigation rings, for better investigation of the high-throughput output; it should be applicable to both conventional association studies and more sophisticated downstream analyses. The NCBI genome map viewer is incorporated into the system. Moreover, we discuss the choice of appropriate empirical ranges for critical parameters by applying the method to disease models. This method identifies various patterned clusters of SNPs demonstrating extended homozygosity, so that one can observe different aspects of the multi-faceted characteristics of TOHs.
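The cTOH definition above (a region where at least ten TOHs are present) can be illustrated with a simple coverage sweep. The following is a minimal sketch, not the cgaTOH implementation: the function name `common_regions`, the representation of each TOH as an inclusive `(start, end)` pair of integer positions on one chromosome, and the `min_count` parameter are all assumptions made for illustration.

```python
def common_regions(tohs, min_count=10):
    """Return maximal inclusive (start, end) regions covered by >= min_count TOHs.

    tohs: iterable of (start, end) integer pairs, one per TOH (hypothetical input format).
    """
    events = []
    for start, end in tohs:
        events.append((start, 1))      # coverage rises at the first position of a TOH
        events.append((end + 1, -1))   # and falls just past its inclusive end
    events.sort()

    regions = []
    depth = 0
    region_start = None
    i = 0
    while i < len(events):
        pos = events[i][0]
        # Apply every event at this position before testing the depth.
        while i < len(events) and events[i][0] == pos:
            depth += events[i][1]
            i += 1
        if depth >= min_count and region_start is None:
            region_start = pos
        elif depth < min_count and region_start is not None:
            regions.append((region_start, pos - 1))
            region_start = None
    return regions


if __name__ == "__main__":
    toy = [(1, 10), (5, 15), (8, 20)]
    print(common_regions(toy, min_count=2))
```

With the paper's threshold one would call `common_regions(tohs, min_count=10)`; the toy call uses a smaller threshold only so that three intervals are enough to produce a region.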

Figure 2. gTOH (rs198845-rs12190473) and aTOH (rs3130778-rs376681) regions associated with lung cancer. (A) –log10 transformed p-values obtained from the association tests. The green, red and blue lines are the p-values corresponding to the gTOH (rs198845-rs12190473) region, the aTOH (rs3130778-rs376681) region and their parent cTOH region, respectively. The purple dots and black dots are p-values <0.05 and ≥0.05, respectively, from single-SNP association tests within the region of the parent cTOH. (B) The corresponding lung-cancer risks as odds ratios (OR) with 95% confidence intervals (CI). The green solid and dashed lines correspond to the OR and 95% CI for the gTOH, while the red and blue lines are for the aTOH and its parent cTOH. The purple dots represent the OR for single-SNP risk, with grey solid vertical lines showing the 95% CIs.
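The quantities plotted in the two panels can be illustrated with a short calculation. The sketch below assumes an unadjusted 2x2 table and a Wald interval on the log odds ratio; the analysis in the paper controlled for demographics and smoking, so this is not the authors' model. The function names and the example counts are hypothetical.

```python
import math


def neg_log10(p_value):
    """Transform a p-value as in panel (A): larger values mean smaller p."""
    return -math.log10(p_value)


def odds_ratio_ci(case_with, case_without, control_with, control_without, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval from a 2x2 table.

    All four counts must be positive; z=1.96 gives an approximate 95% interval.
    """
    odds_ratio = (case_with * control_without) / (case_without * control_with)
    # Standard error of log(OR) under the usual large-sample approximation.
    se = math.sqrt(
        1 / case_with + 1 / case_without + 1 / control_with + 1 / control_without
    )
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts: 20 of 100 cases and 10 of 100 controls carry the region.
    print(odds_ratio_ci(20, 80, 10, 90))
    print(neg_log10(0.01))
```

Note that a region observed only in cases or only in controls has a zero cell, so this simple formula is undefined for it and a different estimator would be needed.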

Mentions: After controlling for demographics and smoking, we identified 7 cTOHs associated with lung cancer (p-value<0.01) [11]. Three cTOHs were over-represented in cases relative to controls, whereas 4 were under-represented (see Table 2 in [11]). Using the TOH cluster tree produced by the repeated binary spectral clustering algorithm, we also detected 7 gTOHs associated with the disease (p-value<0.01), including 4 case-only and one control-only gTOH (Table 1), and 5 aTOHs associated with the disease (p-value<0.01), including 3 case-only aTOHs (Table 2). We did not observe any TOHs or cTOHs appearing only in case subjects or only in control subjects. In addition, 6106 gTOHs were present only in case subjects, of which 23 were found in more than 5 cases (≥0.6% of cases), and 6800 gTOHs were present only in control subjects, of which 32 were found in more than 5 controls (≥0.6% of controls) (Table S1 in File S2). Twenty-three case-only aTOHs (out of 6442) were found in more than 5 case subjects (≥0.6% of cases) and 23 control-only aTOHs (out of 7279) were found in more than 5 controls (≥0.6% of controls) (Table S2 in File S2). Furthermore, none of the cTOHs that harbor the disease-associated gTOHs and aTOHs was significantly associated with the disease. For example, 6p22.1 and 8q23.3, where both a gTOH and an aTOH were detected, were identified as associated with lung cancer (Figure 2 and [3]); however, the cTOHs in which the corresponding gTOHs and aTOHs reside were not detected as regions significantly associated with lung cancer, and both regions were prevalent in lung cancer patients. According to the GWAS Catalog [17], the 6p22.1 region covers TRNAA-UGC, which has been reported to be associated with lung adenocarcinoma [16], and the 8q23.3 region contains EIF3H, which has been reported to be associated with colorectal cancer [18],[19].
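The case-only and control-only counts reported above amount to a simple filter over carrier lists. The following is a minimal sketch under assumed inputs: a hypothetical mapping from each surrogate-TOH label to the set of subject IDs carrying it, plus the set of case IDs. The function name `exclusive_regions` and the region labels in the example are made up for illustration and do not come from the paper or its supplementary tables.

```python
def exclusive_regions(carriers, cases, more_than=5):
    """Split regions into case-only and control-only lists.

    carriers: dict mapping a region label to the set of subject IDs carrying it.
    cases: set of subject IDs that are cases; all other subjects are controls.
    A region is kept only if it has more than `more_than` carriers.
    """
    case_only, control_only = [], []
    for region, subjects in carriers.items():
        n_case = len(subjects & cases)
        n_control = len(subjects) - n_case
        if n_control == 0 and n_case > more_than:
            case_only.append(region)
        elif n_case == 0 and n_control > more_than:
            control_only.append(region)
    return case_only, control_only


if __name__ == "__main__":
    cases = {f"case{i}" for i in range(10)}
    carriers = {
        "regionA": {f"case{i}" for i in range(6)},     # 6 cases, no controls
        "regionB": {f"ctrl{i}" for i in range(7)},     # 7 controls, no cases
        "regionC": {"case0", "ctrl0"},                 # mixed, dropped
        "regionD": {"case1", "case2"},                 # case-only but too rare
    }
    print(exclusive_regions(carriers, cases))
```

The default `more_than=5` mirrors the "more than 5" cut-off quoted in the text; the percentage in parentheses there depends on the cohort sizes, which are not restated here.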

