Identification of genetic markers with synergistic survival effect in cancer.

Louhimo R, Laakso M, Heikkinen T, Laitinen S, Manninen P, Rogojin V, Miettinen M, Blomqvist C, Liu J, Nevanlinna H, Hautaniemi S - BMC Syst Biol (2013)

Bottom Line: The identification of synergetic functioning SNPs on genome-scale is a computationally daunting task and requires advanced algorithms. We introduce a novel algorithm, Geninter, to identify SNPs that have synergetic effect on survival of cancer patients. Our results show that Geninter outperforms the logrank test and is able to identify SNP-pairs with synergetic impact on survival.


ABSTRACT

Background: Cancers are complex diseases arising from accumulated genetic mutations that disrupt intracellular signaling networks. While several predisposing genetic mutations have been found, these individual mutations account only for a small fraction of cancer incidence and mortality. With large-scale measurement technologies, such as single nucleotide polymorphism (SNP) microarrays, it is now possible to identify combinatorial effects that have significant impact on cancer patient survival.

Results: The identification of synergetic functioning SNPs on genome-scale is a computationally daunting task and requires advanced algorithms. We introduce a novel algorithm, Geninter, to identify SNPs that have a synergetic effect on the survival of cancer patients. Using a large breast cancer cohort, we generate a simulator for assessing the reliability and accuracy of Geninter and of the logrank test, a standard statistical method for integrating genetic and survival data.

Conclusions: Our results show that Geninter outperforms the logrank test and is able to identify SNP-pairs with synergetic impact on survival.



Figure 5: Average ROC curves for different statistics when the number of affected pairs increases. Solid lines are computed from the Geninter rank p-values, dashed lines from the logrank test p-values. Results from simulations with different numbers of affected marker pairs are drawn in different colors. Each curve is the average over 20 repetitions of an analysis of a cohort of 10,000 samples, with the number of affected marker combinations varying between 1 and 20.

Mentions: We applied the Geninter and logrank methods to analyze all the combinatorial SNP-SNP survival effects in simulated data. Additionally, we calculated the single SNP survival effects with the logrank test. As Figure 3 shows, a combination acquires a rank of > 0.5 (FDR corrected p < 5.99 × 10^−8) even when neither marker alone exhibits a noticeable survival effect (FDR corrected p < 0.01). To assess the relative performance of the Geninter and logrank statistics, we calculated the false positive and true positive rates for both methods while varying the number of affected marker pairs. The false positive rate is the number of false positives divided by the sum of false positives and true negatives. The true positive rate is the number of true positives divided by the sum of true positives and false negatives. Based on the true and false positives, we calculated the receiver operating characteristic (ROC) curves for both algorithms [22]. ROC curves enable a direct comparison of true and false positive rates as the threshold varies. We analyzed the behavior of the true positive and false positive rates with independent, simulated test data. For each of the rank vectors in Figure 5, we executed the analysis with both algorithms. We increased the number of affected marker pairs and recorded the changes in true and false positives. Furthermore, we repeated each simulation 20 times for each number of affected marker pairs and averaged the rates over these repetitions to account for simulation variance. Both statistics were able to identify affected marker pairs correctly. However, the false positive rate of both methods increases along with the number of affected markers (Figure 5). Furthermore, the logrank statistic has a substantially worse false positive rate, indicating that most of its findings are false positives even at very low p-value thresholds.
The sharp, smooth form of the logrank ROC curves in Figure 5 reflects the rise of the false positive rate of the logrank test even at p-value thresholds near zero. The p-value threshold of significance for Geninter decreases when the proportion of affected to non-affected markers increases. For a low ratio (less than 10%) of affected marker pairs to non-affected marker pairs, a false positive rate below 10% and a true positive rate above 99% are achieved with the nominal p-value < 0.01.
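
The rate definitions and the averaging over repetitions described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the strict `p < threshold` calling rule, and the toy p-values are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (false positive rate, true positive rate)


def rates_at_threshold(p_values: Sequence[float],
                       is_affected: Sequence[bool],
                       threshold: float) -> Point:
    """Return (FPR, TPR) when pairs with p < threshold are called significant."""
    tp = fp = tn = fn = 0
    for p, affected in zip(p_values, is_affected):
        called = p < threshold  # assumed calling rule
        if called and affected:
            tp += 1
        elif called and not affected:
            fp += 1
        elif affected:
            fn += 1
        else:
            tn += 1
    # FPR = FP / (FP + TN); TPR = TP / (TP + FN), as defined in the text
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    return fpr, tpr


def roc_curve(p_values: Sequence[float],
              is_affected: Sequence[bool],
              thresholds: Sequence[float]) -> List[Point]:
    """Trace the ROC curve by sweeping the p-value threshold."""
    return [rates_at_threshold(p_values, is_affected, t) for t in thresholds]


def average_roc(curves: Sequence[Sequence[Point]]) -> List[Point]:
    """Average several ROC curves point by point (same thresholds in each)."""
    n = len(curves)
    return [
        (sum(c[i][0] for c in curves) / n, sum(c[i][1] for c in curves) / n)
        for i in range(len(curves[0]))
    ]


# Toy data (invented): six marker pairs, two of them truly affected.
p_values = [0.001, 0.2, 0.004, 0.03, 0.6, 0.008]
is_affected = [True, False, True, False, False, False]

print(rates_at_threshold(p_values, is_affected, 0.01))  # (0.25, 1.0)
```

In a simulation study of the kind described here, `roc_curve` would be evaluated once per repetition on the same threshold grid, and `average_roc` would then combine the 20 resulting curves into one.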

