Identification of genetic markers with synergistic survival effect in cancer.

Louhimo R, Laakso M, Heikkinen T, Laitinen S, Manninen P, Rogojin V, Miettinen M, Blomqvist C, Liu J, Nevanlinna H, Hautaniemi S - BMC Syst Biol (2013)

Bottom Line: The identification of synergetic functioning SNPs on a genome scale is a computationally daunting task and requires advanced algorithms. We introduce a novel algorithm, Geninter, to identify SNPs that have a synergetic effect on the survival of cancer patients. Our results show that Geninter outperforms the logrank test and is able to identify SNP-pairs with a synergetic impact on survival.

ABSTRACT

Background: Cancers are complex diseases arising from accumulated genetic mutations that disrupt intracellular signaling networks. While several predisposing genetic mutations have been found, these individual mutations account for only a small fraction of cancer incidence and mortality. With large-scale measurement technologies, such as single nucleotide polymorphism (SNP) microarrays, it is now possible to identify combinatorial effects that have a significant impact on cancer patient survival.

Results: The identification of synergetic functioning SNPs on a genome scale is a computationally daunting task and requires advanced algorithms. We introduce a novel algorithm, Geninter, to identify SNPs that have a synergetic effect on the survival of cancer patients. Using a large breast cancer cohort, we generate a simulator that allows assessing the reliability and accuracy of Geninter and the logrank test, which is a standard statistical method to integrate genetic and survival data.

Conclusions: Our results show that Geninter outperforms the logrank test and is able to identify SNP-pairs with a synergetic impact on survival.

Figure 1: The outline of the Geninter analysis workflow. First, an attribute matrix containing genotypes and a matrix of survival times are given as input to Geninter. The analysis is divided into three steps. The results (sorted list of SNP-pairs) can then be annotated, for instance, using the Ensembl database and filtered to exclude markers that are in linkage disequilibrium (LD). The output contains the marker pairs, their ranks and p-values.

Mentions: Genome-wide analysis of pair-wise SNPs poses two major challenges. First, combining multiple marker genotypes increases the number of groups in the survival analysis. This has two main consequences: (i) the number of samples must be relatively high to ensure stable estimates in the subgroups, and (ii) the larger number of survival curves leads to more intersections between the curves, which renders the logrank statistic less reliable [13]. This issue is exacerbated by the tendency of the logrank test to report significant survival differences in large cohorts even when the difference is only slight. Second, SNP microarrays produce genotype states for hundreds of thousands or millions of markers, which makes evaluating all the pairs computationally intensive [11]. Geninter addresses the computational challenges with optimized code and distributed programming. The overall outline of Geninter is given in Figure 1. Here we provide details on how each step in Geninter is executed. First, an attribute matrix containing genotypes and a matrix of survival times are given as input to Geninter. The analysis is divided into three steps: (1) determining the distance matrix based on the genotype-combination-specific Kaplan-Meier curves; (2) using hierarchical clustering to determine the underlying relative structure of the curves; and (3) computing the rank. If the rank of a SNP-pair exceeds a chosen threshold, the pair is considered a putative survival-affecting combination and is stored. The user can define the threshold parameter based on the number of SNP-pairs or a p-value cutoff. We have implemented Geninter so that it can be run as a standalone program or on the Anduril bioinformatics workflow engine, which allows advanced processing of the Geninter results, such as automated annotation (e.g., linkage disequilibrium (LD) mapping) from bio-databases [14].
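
The three steps described above can be illustrated for a single SNP-pair with a short Python sketch. This is not the Geninter implementation. The text does not specify the distance measure, the linkage method or the rank formula, so the mean absolute difference between Kaplan-Meier curves, average linkage and the use of the top merge height as a score are all assumptions made for illustration. The function names (kaplan_meier, pair_curve_distances, separation_score) and the 0/1/2 genotype coding are likewise hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform


def kaplan_meier(times, events, grid):
    """Kaplan-Meier survival estimate S(t) evaluated at each time in `grid`."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)  # True = event observed, False = censored
    event_times = np.unique(times[events])
    if event_times.size == 0:
        return np.ones(len(grid))
    at_risk = np.array([(times >= t).sum() for t in event_times])
    deaths = np.array([((times == t) & events).sum() for t in event_times])
    steps = np.cumprod(1.0 - deaths / at_risk)
    # Number of event times <= g selects the step of the curve at grid point g.
    idx = np.searchsorted(event_times, grid, side="right")
    return np.concatenate(([1.0], steps))[idx]


def pair_curve_distances(geno_a, geno_b, times, events, n_grid=50, min_group=2):
    """Step 1 (assumed form): distances between genotype-combination KM curves.

    Genotypes are assumed to be coded 0/1/2, giving up to 9 combined groups.
    The distance is the mean absolute difference between two curves on a
    common time grid, an illustrative choice rather than Geninter's metric.
    """
    geno_a, geno_b = np.asarray(geno_a), np.asarray(geno_b)
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    grid = np.linspace(0.0, times.max(), n_grid)
    combo = geno_a * 3 + geno_b
    # Skip very small groups, whose curves would be unstable.
    labels = [c for c in np.unique(combo) if (combo == c).sum() >= min_group]
    curves = np.array(
        [kaplan_meier(times[combo == c], events[combo == c], grid) for c in labels]
    )
    diff = np.abs(curves[:, None, :] - curves[None, :, :]).mean(axis=2)
    return labels, diff


def separation_score(diff):
    """Steps 2 and 3 (assumed form): cluster the curves and score the pair.

    Average-linkage clustering is run on the distance matrix, and the height
    of the final merge is returned as a placeholder score. The actual
    Geninter rank is not defined in this text.
    """
    if diff.shape[0] < 2:
        return 0.0
    tree = linkage(squareform(diff, checks=False), method="average")
    return float(tree[-1, 2])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 300
    snp_a = rng.integers(0, 3, size=n)
    snp_b = rng.integers(0, 3, size=n)
    # Synthetic synergy: shorter survival only when both SNPs have genotype 2.
    scale = np.where((snp_a == 2) & (snp_b == 2), 2.0, 10.0)
    times = rng.exponential(scale)
    events = rng.random(n) < 0.8  # roughly 20% of samples censored
    labels, diff = pair_curve_distances(snp_a, snp_b, times, events)
    print(len(labels), "genotype groups; separation score:", separation_score(diff))
```

In a genome-wide run, a score of this kind would be computed for every SNP-pair and compared against the user-defined threshold. With m markers there are m(m-1)/2 pairs, which is why the text stresses optimized code and distributed execution.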

