Machine Learning Data Imputation and Classification in a Multicohort Hypertension Clinical Study.

Seffens W, Evans C, Minority Health-GRID Network, Taylor H - Bioinform Biol Insights (2016)

Bottom Line: We trained neural networks with phenotype data for missing-data imputation to increase the usable size of a clinical data set. Validity was established by showing performance effects using the expanded data set for the association of phenotype variables with case/control status of patients. Data mining classification tools were used to generate association rules.


Affiliation: Physiology Department, Morehouse School of Medicine, Atlanta, GA, USA.

ABSTRACT
Health-care initiatives are pushing the development and utilization of clinical data for medical discovery and translational research studies. Machine learning tools implemented for Big Data have been applied to detect patterns in complex diseases. This study focuses on hypertension and examines phenotype data across a major clinical study, the Minority Health Genomics and Translational Research Repository Database, composed of self-reported African American (AA) participants combined with related cohorts. Prior genome-wide association studies for hypertension in AAs presumed that an increase of disease burden in susceptible populations is due to rare variants. But genomic analyses of hypertension, even those designed to focus on rare variants, have yielded marginal genome-wide results over many studies. Machine learning and other nonparametric statistical methods have recently been shown to uncover relationships in complex phenotypes, genotypes, and clinical data. We trained neural networks with phenotype data for missing-data imputation to increase the usable size of a clinical data set. Validity was established by showing performance effects using the expanded data set for the association of phenotype variables with case/control status of patients. Data mining classification tools were used to generate association rules.


Figure 1: ANN learning curve on height. Notes: Upper curves are the number of training set instances that are predicted incorrectly above a program set point, while the lower curves show learning by a decrease in SSE of the output neuron layer.
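
The two quantities plotted in the learning curve can be sketched as follows. This is only an illustration, not the authors' software: the function name `learning_curve_point`, the threshold value, and the sample heights are all assumptions, and the source does not say how the program set point was defined.

```python
def learning_curve_point(targets, predictions, set_point):
    """Return (n_incorrect, sse) for one training epoch.

    n_incorrect: number of instances whose absolute error exceeds
                 set_point (an assumed reading of "program set point").
    sse:         sum of squared errors over all instances.
    """
    if len(targets) != len(predictions):
        raise ValueError("targets and predictions must have equal length")
    errors = [p - t for t, p in zip(targets, predictions)]
    n_incorrect = sum(1 for e in errors if abs(e) > set_point)
    sse = sum(e * e for e in errors)
    return n_incorrect, sse


# Made-up heights in cm, for illustration only.
targets = [160.0, 170.0, 180.0]
predictions = [161.0, 175.0, 179.5]
n_incorrect, sse = learning_curve_point(targets, predictions, set_point=2.0)
print(n_incorrect, sse)
```

Evaluating this pair once per epoch and plotting both series against the epoch number would give curves of the kind the caption describes.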

Mean height in the training set was 168.8 cm, with a range of 142–200 cm (Table 1). Training the ANN for height also converged very quickly, requiring only 40 epochs to reach a typical plateau-shaped learning curve (Fig. 1A). This rapidity is due to the inclusion of weight and BMI among the training set variables. Because BMI is itself a derived variable, the three variables are linked by a defined relationship, $\mathrm{Height} = (\mathrm{Weight}/\mathrm{BMI})^{1/2}$, and hence the ANN training converges on a simple mathematical relationship discovered within those variables. Upon further learning, a secondary plateau forms at 2,000 epochs (Fig. 1B), again for 100% accuracy in the test sets at >1% significance. There was only one missing value for height in the data set. The BMI formula estimates that height as 190.2 cm, while the ANN trained for 2,000 epochs predicts 192 cm, just 0.9% higher. An analogous study used ANNs to predict body weights of rabbits from body measurements [33] and concluded that the ANN model is better than multivariate linear regression.
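
The relationship among the three variables follows from the standard definition BMI = weight / height², with weight in kilograms and height in metres. A minimal sketch is given below. The function names and the example weight and BMI are hypothetical, since the source does not report those values for the participant with the missing height. Only the 190.2 cm and 192 cm figures come from the text.

```python
import math


def height_from_bmi(weight_kg, bmi):
    """Invert BMI = weight / height**2 and return height in centimetres."""
    if weight_kg <= 0 or bmi <= 0:
        raise ValueError("weight and BMI must be positive")
    return 100.0 * math.sqrt(weight_kg / bmi)


def percent_difference(estimate, reference):
    """Signed difference of estimate from reference, as a percentage."""
    return 100.0 * (estimate - reference) / reference


# Hypothetical participant (values chosen only to illustrate the formula).
print(height_from_bmi(100.0, 25.0))

# Figures quoted in the text: ANN prediction versus BMI-formula estimate.
print(round(percent_difference(192.0, 190.2), 1))
```

The second call reproduces the comparison in the paragraph above: 192 cm against 190.2 cm is a difference of about 0.9%.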

