Neural networks for genetic epidemiology: past, present, and future.

Motsinger-Reif AA, Ritchie MD - BioData Min (2008)

Affiliation: Bioinformatics Research Center, Department of Statistics, North Carolina State University, Raleigh, NC, USA. motsinger@stat.ncsu.edu

ABSTRACT
During the past two decades, the field of human genetics has experienced an information explosion. The completion of the Human Genome Project and the development of high-throughput SNP technologies have created a wealth of data; however, the analysis and interpretation of these data have created a research bottleneck. While technology facilitates the measurement of hundreds or thousands of genes, statistical and computational methodologies are lacking for the analysis of these data. New statistical methods and variable selection strategies must be explored for identifying disease susceptibility genes for common, complex diseases. Neural networks (NN) are a class of pattern recognition methods that have been successfully implemented for data mining and prediction in a variety of fields. The application of NN for statistical genetics studies is an active area of research. Neural networks have been applied in both linkage and association analysis for the identification of disease susceptibility genes. In the current review, we consider how NN have been used for both linkage and association analyses in genetic epidemiology. We discuss both the successes of these initial NN applications and the questions that arose during the previous studies. Finally, we introduce evolutionary computing strategies, Genetic Programming Neural Networks (GPNN) and Grammatical Evolution Neural Networks (GENN), for using NN in association studies of complex human diseases that address some of the caveats illuminated by previous work.

Figure 2: Overview of the GPNN method (adapted from Ritchie et al. 2003). First, GPNN has a set of parameters to be initialized before beginning the evolution of NN models. Second, the data are divided into 10 equal parts for 10-fold cross-validation. Third, training begins by generating an initial population of random solutions. Fourth, each NN is evaluated on the training set and its fitness (classification error) is recorded. Fifth, the best solutions are selected for crossover and reproduction using a fitness-proportionate selection technique. The new generation begins the cycle again. This continues until a stopping criterion (a classification error of zero or a limit on the number of generations) is met. At the end of the GPNN evolution, the overall best solution is selected as the optimal NN. Sixth, this best GPNN model is tested on the 1/10 of the data left out to estimate the prediction error of the model. Steps two through six are performed ten times with the same parameter settings, each time using a different 9/10 of the data for training and 1/10 for testing. The loci that are consistently present in the GPNN models are selected as the functional loci and are used as input to a final GPNN evolutionary process to estimate the classification and prediction error of the GPNN model.
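The cross-validation procedure in steps two through six translates directly into a short harness. The following is a minimal Python sketch, not the published GPNN implementation: `evolve_model`, `predict_error`, and the `input_loci` attribute are hypothetical placeholders for the GPNN evolution step, the held-out evaluation, and the list of SNPs a model uses.

```python
from collections import Counter

def cross_validate(data, evolve_model, predict_error, n_folds=10):
    # Step 2: divide the data into n_folds equal parts.
    folds = [data[i::n_folds] for i in range(n_folds)]
    errors, locus_counts = [], Counter()
    for i in range(n_folds):
        test = folds[i]
        train = [row for j in range(n_folds) if j != i for row in folds[j]]
        model = evolve_model(train)                # steps 3-5: evolve the best NN
        errors.append(predict_error(model, test))  # step 6: error on the held-out tenth
        locus_counts.update(model.input_loci)      # hypothetical attribute: loci in this model
    # Loci that appear consistently across the ten models are selected as the
    # functional loci; here "consistently" is read as a simple majority of folds.
    consistent = [locus for locus, n in locus_counts.items() if n > n_folds // 2]
    return errors, consistent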

Mentions: An overview of the GPNN method is shown in Figure 2. Training the GPNN begins by generating an initial random population of solutions. Each solution is a binary expression tree representation of an NN, similar to that shown in Figure 3. The GP then evaluates each NN. The best solutions are selected for crossover and reproduction using a fitness-proportionate selection technique. A predefined proportion of the best solutions will be directly copied (reproduced) into the new generation. Another proportion of the solutions will be used for crossover with other best solutions. Crossover must take place such that the rules of network construction still apply. Next, the new generation, which is equal in size to the original population, begins the cycle again. This continues until some criterion is met, at which point GPNN stops. This criterion is either a classification error of zero or reaching the maximum number of generations. In addition, a "best-so-far" solution is chosen after each generation. At the end of the GP run, the single "best-so-far" solution is used as the solution to the problem.
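The generational cycle just described (fitness-proportionate selection, a predefined proportion of direct reproduction, structure-preserving crossover, and a "best-so-far" record) follows the standard genetic programming template. A minimal sketch, assuming the caller supplies `fitness` (classification error on the training set) and `crossover` (an operator that respects the rules of network construction), neither of which is taken from the GPNN software:

```python
import random

def evolve(population, fitness, crossover, max_generations, copy_fraction=0.1):
    best_err, best_so_far = float("inf"), None
    for _ in range(max_generations):
        # Evaluate every NN; lower classification error means a fitter solution.
        scored = sorted(((fitness(ind), ind) for ind in population),
                        key=lambda pair: pair[0])
        if scored[0][0] < best_err:                 # keep the "best-so-far" solution
            best_err, best_so_far = scored[0]
        if best_err == 0:                           # stopping criterion: zero error
            break
        # Reproduction: copy a predefined proportion of the best solutions.
        n_copy = max(1, int(copy_fraction * len(population)))
        new_population = [ind for _, ind in scored[:n_copy]]
        # Fitness-proportionate selection: weight each parent by inverse error.
        weights = [1.0 / (err + 1e-9) for err, _ in scored]
        parents = [ind for _, ind in scored]
        while len(new_population) < len(population):
            a, b = random.choices(parents, weights=weights, k=2)
            new_population.append(crossover(a, b))  # must yield a valid network
        population = new_population                 # new generation, equal in size
    return best_so_far
```

The loop exits either when a solution reaches zero classification error or when the generation limit is hit, returning the best-so-far network as the solution to the problem.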

