Evolutionary algorithms for the selection of single nucleotide polymorphisms.

Hubley RM, Zitzler E, Roach JC - BMC Bioinformatics (2003)

Bottom Line: The choice of subset is influenced by many factors, including estimated or known reliability of the SNP, biochemical factors, intellectual property, cost, and effectiveness of the subset for mapping genes or identifying disease loci. Evolutionary algorithms provide flexibility with respect to the problem formulation if a problem description evolves or changes. Results are produced as a trade-off front, allowing the user to make informed decisions when prioritizing factors.


Affiliation: Institute for Systems Biology, Seattle, WA, USA. rhubley@systemsbiology.org

ABSTRACT

Background: Large databases of single nucleotide polymorphisms (SNPs) are available for use in genomics studies. Typically, investigators must choose a subset of SNPs from these databases to employ in their studies. The choice of subset is influenced by many factors, including estimated or known reliability of the SNP, biochemical factors, intellectual property, cost, and effectiveness of the subset for mapping genes or identifying disease loci. We present an evolutionary algorithm for multiobjective SNP selection.

Results: We implemented a modified version of the Strength-Pareto Evolutionary Algorithm (SPEA2) in Java. Our implementation, Multiobjective Analyzer for Genetic Marker Acquisition (MAGMA), approximates the set of optimal trade-off solutions for large problems in minutes. This set is very useful for the design of large studies, including those oriented towards disease identification, genetic mapping, population studies, and haplotype-block elucidation.
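For orientation, the raw-fitness step of standard SPEA2 can be sketched as follows. This is a minimal illustration of the published SPEA2 scheme, not MAGMA's modified version, whose changes are not described in this abstract. It assumes every objective is minimized, omits the density term that SPEA2 adds to break ties, and uses hypothetical function names.

```python
from typing import List, Tuple

Point = Tuple[float, ...]  # one objective vector; all objectives assumed minimized


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def raw_fitness(points: List[Point]) -> List[int]:
    """Standard SPEA2 raw fitness (lower is better, 0 means non-dominated).

    strength[i] = number of individuals that i dominates.
    raw[i]      = sum of the strengths of all individuals that dominate i.
    """
    strength = [sum(dominates(p, q) for q in points) for p in points]
    return [
        sum(strength[j] for j, q in enumerate(points) if dominates(q, p))
        for p in points
    ]


# Hypothetical objective vectors, for illustration only.
population = [(1, 4), (2, 2), (4, 1), (3, 3), (5, 5)]
print(raw_fitness(population))
```

Individuals with raw fitness 0 are the current non-dominated set; in full SPEA2 these are carried into an archive between generations.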

Conclusion: Evolutionary algorithms are particularly suited for optimization problems that involve multiple objectives and a complex search space on which exact methods such as exhaustive enumeration cannot be applied. They provide flexibility with respect to the problem formulation if a problem description evolves or changes. Results are produced as a trade-off front, allowing the user to make informed decisions when prioritizing factors. MAGMA is open source and available at http://snp-magma.sourceforge.net. Evolutionary algorithms are well suited for many other applications in genomics.
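The idea of a trade-off front can be made concrete with a small sketch: given candidate solutions scored on several objectives, the front is the set of candidates that no other candidate beats on every objective at once. The two objectives used here (a cost and a penalty, both minimized) and all names and values are assumptions for illustration, not MAGMA's actual objective functions.

```python
from typing import List, Tuple

Point = Tuple[float, ...]  # (cost, penalty); hypothetical objectives, both minimized


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep the points that no other point dominates (quadratic-time filter)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidates, for illustration only.
candidates = [(10.0, 0.9), (20.0, 0.5), (25.0, 0.6), (40.0, 0.2), (40.0, 0.3)]
front = pareto_front(candidates)
print(front)
```

Here (25.0, 0.6) is dropped because (20.0, 0.5) is both cheaper and better, and (40.0, 0.3) is dropped because (40.0, 0.2) is better at the same cost. The remaining points are the trade-offs a user would choose among.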


Figure 6: MHC locus. Illustration of MAGMA on a non-contrived problem. The scale at the top is number in base pairs relative to an arbitrarily designated beginning of part of the human MHC region. This locus is flanked by large gaps without known SNPs, so is a natural span for input to MAGMA. The library consists of 382 SNPs of varying quality. SNPs are indicated by black vertical lines; quality is proportional to the length of the lines. The colors of the bars indicate departure from a user-defined SNP influence radius, in this case, 6000 bp; red areas have sparser coverage while blue areas are more densely covered. The number of solutions displayed (in this case, the best f1 and f2, and two others) can be set by the user. "Heur" indicates the solution seeded by the heuristic. Numbers to the right indicate the value of each objective function for the solution, as well as the number of SNPs in the solution.
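One way to read the influence radius in the caption is sketched below. It assumes a position counts as covered when it lies within the radius of at least one selected SNP, which is an interpretation for illustration rather than MAGMA's documented definition. The function name, SNP positions, and span are hypothetical; only the 6000 bp radius comes from the caption.

```python
from typing import List


def covered_length(positions: List[int], radius: int, start: int, end: int) -> int:
    """Number of base pairs in [start, end] within `radius` of a selected SNP.

    Builds the interval [p - radius, p + radius] around each SNP, clips it to
    the span, and sums the lengths of the merged (non-overlapping) intervals.
    """
    total = 0
    cur_lo = cur_hi = None
    for p in sorted(positions):
        lo, hi = max(start, p - radius), min(end, p + radius)
        if lo >= hi:  # interval falls entirely outside the span
            continue
        if cur_hi is None:
            cur_lo, cur_hi = lo, hi
        elif lo <= cur_hi:  # overlaps the current merged interval
            cur_hi = max(cur_hi, hi)
        else:  # gap: close the current interval and start a new one
            total += cur_hi - cur_lo
            cur_lo, cur_hi = lo, hi
    if cur_hi is not None:
        total += cur_hi - cur_lo
    return total


# Hypothetical SNP positions on a 50 kb span, with the 6000 bp radius from the caption.
span_start, span_end = 0, 50_000
covered = covered_length([5_000, 10_000, 40_000], 6_000, span_start, span_end)
print(covered, 1 - covered / (span_end - span_start))
```

Under this reading, stretches left uncovered correspond to the sparse regions in the figure, and heavily overlapping intervals correspond to the dense ones.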

Mentions: The region of the human MHC locus illustrated in Figure 6 provides an example of a real SNP-selection problem. The trade-off front produced by MAGMA is graphed in Figure 7. The set of SNPs chosen for further study was selected by visual evaluation of this graph. For the development of biochemical assays, a solution from a region of diminishing returns was selected that had a cost compatible with available resources. The solution produced by the heuristic lies just behind the front produced by MAGMA. The heuristic is prevented from identifying a solution on the Pareto-optimal front largely because of its reliance on binning quality values (see Methods). Nevertheless, the solution produced by the heuristic is quite good.
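The authors chose their solution by visual evaluation of the front. As a rough numerical counterpart, the sketch below lists how much each extra unit of cost buys along a front and picks the best solution that fits a budget. The two objectives (cost and a penalty, both minimized), the budget, and all values are assumptions for illustration, not data from the paper.

```python
from typing import List, Optional, Tuple

Solution = Tuple[float, float]  # (cost, penalty); hypothetical objectives, both minimized


def marginal_gains(front: List[Solution]) -> List[float]:
    """Penalty reduction per extra unit of cost between consecutive front points.

    Assumes the points form a Pareto front, so costs are distinct once sorted.
    Shrinking values indicate a region of diminishing returns.
    """
    ordered = sorted(front)
    return [(a[1] - b[1]) / (b[0] - a[0]) for a, b in zip(ordered, ordered[1:])]


def best_within_budget(front: List[Solution], budget: float) -> Optional[Solution]:
    """Lowest-penalty solution whose cost does not exceed the budget, if any."""
    affordable = [s for s in front if s[0] <= budget]
    return min(affordable, key=lambda s: s[1]) if affordable else None


# Hypothetical front, for illustration only.
front = [(10.0, 0.90), (20.0, 0.50), (40.0, 0.30), (80.0, 0.25)]
print(marginal_gains(front))
print(best_within_budget(front, budget=50.0))
```

In this made-up front the gain per unit cost falls sharply after the second point, so a budget of 50 lands on a solution in the flattening part of the curve, similar in spirit to the choice described above.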

