Using Network Methodology to Infer Population Substructure.

Prokopenko D, Hecker J, Silverman E, Nöthen MM, Schmid M, Lange C, Loehlein Fier H - PLoS ONE (2015)

Bottom Line: We group the sequenced individuals into triads, which depict the relational structure, on the basis of a predefined pairwise similarity measure. The application to 1000 Genomes data suggests that the network approach provides a very fine resolution of the underlying ancestral population structure. In addition, we show in simulations that, in the presence of discrete population structures, our approach maintains the type 1 error more precisely than existing approaches.


Affiliation: Institute of Genomic Mathematics, University of Bonn, Bonn, Germany; Institute of Human Genetics, University of Bonn, Bonn, Germany.

ABSTRACT
One of the main caveats of association studies is possible bias due to population stratification. Existing methods rely on model-based approaches such as STRUCTURE and ADMIXTURE, or on principal component analysis as in EIGENSTRAT. Here we provide a novel visualization technique and describe the problem of population substructure from a graph-theoretical point of view. We group the sequenced individuals into triads, which depict the relational structure, on the basis of a predefined pairwise similarity measure. We then merge the triads into a network and apply community detection algorithms to identify homogeneous subgroups, or communities, which can then be incorporated as covariates into logistic regression. We apply our method to populations from different continents in the 1000 Genomes Project and evaluate the type 1 error based on the empirical p-values. The application to 1000 Genomes data suggests that the network approach provides a very fine resolution of the underlying ancestral population structure. In addition, we show in simulations that, in the presence of discrete population structures, our approach maintains the type 1 error more precisely than existing approaches.
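The triad-and-network idea in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' method: the similarity measure (fraction of matching genotypes), the rule for forming a triad (each individual plus its two most similar individuals), the toy genotype data, and all function names are hypothetical. In place of a community detection algorithm, the sketch uses plain connected components as a simple stand-in.

```python
from collections import defaultdict, deque


def similarity(a, b):
    """Assumed similarity measure: fraction of sites where two genotype vectors agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def build_triads(genotypes):
    """Assumed triad rule: each individual plus its two most similar individuals."""
    n = len(genotypes)
    triads = []
    for i in range(n):
        ranked = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: similarity(genotypes[i], genotypes[j]),
            reverse=True,
        )
        triads.append((i, ranked[0], ranked[1]))
    return triads


def merge_triads(triads):
    """Merge triads into one undirected network, stored as an adjacency map."""
    adjacency = defaultdict(set)
    for focal, first, second in triads:
        for partner in (first, second):
            adjacency[focal].add(partner)
            adjacency[partner].add(focal)
    return adjacency


def connected_components(adjacency):
    """Stand-in for community detection: breadth-first search for components."""
    seen = set()
    components = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        members = []
        while queue:
            node = queue.popleft()
            members.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(sorted(members))
    return components


# Toy data (made up): six individuals, six sites, two clearly separated groups.
toy_genotypes = [
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [2, 2, 2, 2, 1, 2],
    [2, 2, 2, 2, 2, 2],
    [2, 2, 2, 1, 2, 2],
]
components = connected_components(merge_triads(build_triads(toy_genotypes)))
print(components)
```

On this toy input the two groups end up in separate components. A modularity-based community detection step would be needed to split larger connected components further.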



pone.0130708.g004: 5 European subpopulations. The polygons around the nodes represent the detected communities. The node colors represent the actual labels.

Mentions: We created scale-free plots of the constructed graphs in Figs 1–4. Every node represents an individual, and the color of the node corresponds to the actual population label of that individual. For visualization purposes, we do not show the edges in the plots. To visualize the communities, we drew a polygon around the nodes assigned to each community, so every polygon represents one community. The full community assignments and the precision are given in Tables 3–6. One can clearly see an almost perfect separation of subpopulations in Africans and Americans. The separation in Asian subpopulations is slightly worse: there are some admixed communities consisting of Han Chinese from Beijing and from the South. In the European subpopulations, the Finnish and Tuscan communities are very homogeneous. The 5 small heterogeneous communities mostly contain individuals from Utah and Great Britain, but this is expected, because the Utah residents in this dataset are known to have a high degree of shared ancestry with the British. For completeness, we also include S1–S4 Tables, which report the precision for unconnected components of the graphs. A good separation of the subpopulations is already apparent from the unconnected components alone.
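As a rough illustration of how a per-community precision could be computed from detected communities and known population labels, here is a minimal sketch. The definition used (share of the most common label within a community) is an assumption, not necessarily the definition behind Tables 3–6, and the function name, labels, and community assignments are made up.

```python
from collections import Counter


def community_precision(communities, labels):
    """Assumed definition: share of the most common population label per community."""
    report = []
    for members in communities:
        counts = Counter(labels[m] for m in members)
        majority_label, majority_count = counts.most_common(1)[0]
        report.append((majority_label, majority_count / len(members)))
    return report


# Toy data (made up): one homogeneous community and one mixed community.
toy_labels = ["FIN", "FIN", "FIN", "CEU", "GBR", "CEU"]
toy_communities = [[0, 1, 2], [3, 4, 5]]
report = community_precision(toy_communities, toy_labels)
for label, precision in report:
    print(label, round(precision, 2))
```

Under this definition a homogeneous community scores 1.0, while a community mixing Utah and British individuals scores lower.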

