Inferring population structure and relationship using minimal independent evolutionary markers in Y-chromosome: a hybrid approach of recursive feature selection for hierarchical clustering.
Bottom Line: An analysis of 105 world-wide populations showed that 15 independent variations/markers were optimal for defining population structure parameters such as FST, molecular variance and correlation-based relationship. Subsequent addition of randomly selected markers had a negligible effect (close to zero, i.e. 1 × 10⁻³) on these parameters. The study proves efficient in tracing complex population structures and deriving relationships among world-wide populations in a cost-effective and expedient manner.
Affiliation: National Centre of Applied Human Genetics, School of Life Sciences, Jawaharlal Nehru University, New Delhi 110067, India.
Mentions: PCC-based variable ranking showed that a few markers regarded as independent signatures of the diversification of male populations world-wide were highly correlated. However, we could not merge two such markers, each providing an independent signature for Y-chromosomal haplogroups, because they lie on the non-recombining Y chromosome, which is haploid and therefore behaves as a single haplotype block; this linkage is what produces the close correlation. The situation differs from autosomal SNPs, where both haplotype block-dependent and haplotype block-independent conditions must be considered. We therefore embedded feature selection in agglomerative (bottom-up) hierarchical clustering of haplogroups, using prior knowledge of the Y-chromosomal haplogroup phylogeny, to minimize the redundancy generated by markers that represent lower nodes in the Y-chromosomal hierarchy and depend on the higher nodes of their respective clades (Figures 1 and 3). With this approach, sub-clades were clustered into their respective major clades and pruned again on the basis of PCC. This step was repeated until we reached the most ancestral nodes (12 markers) of the Y-chromosome phylogeny (Supplementary Table S1a–i); we named the procedure RFSHC.
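The recursive procedure described above can be illustrated with a minimal sketch. It is not the authors' implementation. It takes PCC to be Pearson's correlation coefficient and assumes that a sub-clade is clustered into its major clade by summing haplogroup frequencies, that a sub-clade marker is pruned as redundant when its absolute correlation with the merged clade reaches a cutoff, and that the most ancestral nodes are always retained. The function names, the cutoff of 0.9, the marker labels and the frequencies are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def depth(node, parent):
    """Number of edges between a node and the root of its clade."""
    d = 0
    while parent.get(node) is not None:
        node = parent[node]
        d += 1
    return d


def rfshc_sketch(freqs, parent, threshold=0.9):
    """Bottom-up pruning of markers along a known phylogeny.

    freqs:     marker -> per-population frequencies (hypothetical input layout)
    parent:    marker -> parent marker, or None for an ancestral node
    threshold: assumed |PCC| cutoff above which a sub-clade marker is redundant
    """
    freqs = {m: list(v) for m, v in freqs.items()}  # work on a copy
    active = set(freqs)
    kept = []
    while True:
        # Leaves are active markers with an active parent and no active children.
        has_child = {parent[m] for m in active if parent.get(m) in active}
        leaves = [m for m in active
                  if m not in has_child and parent.get(m) in active]
        if not leaves:
            break  # only the most ancestral nodes remain
        max_d = max(depth(m, parent) for m in leaves)
        deepest = [m for m in leaves if depth(m, parent) == max_d]
        for p in sorted({parent[m] for m in deepest}):
            children = sorted(m for m in deepest if parent[m] == p)
            # Cluster the sub-clades into their major clade (assumed: sum frequencies).
            clade = list(freqs[p])
            for m in children:
                clade = [a + b for a, b in zip(clade, freqs[m])]
            # Prune sub-clade markers that are highly correlated with the merged clade.
            for m in children:
                if abs(pearson(freqs[m], clade)) < threshold:
                    kept.append(m)
                active.remove(m)
            freqs[p] = clade
    return sorted(active) + kept


# Toy data: four populations, two ancestral nodes (A, B) with sub-clades.
toy_parent = {"A": None, "A1": "A", "A1a": "A1", "B": None, "B1": "B"}
toy_freqs = {
    "A":   [0.10, 0.20, 0.05, 0.00],
    "A1":  [0.20, 0.40, 0.10, 0.00],
    "A1a": [0.10, 0.20, 0.05, 0.00],
    "B":   [0.30, 0.10, 0.40, 0.50],
    "B1":  [0.05, 0.40, 0.00, 0.10],
}
selected = rfshc_sketch(toy_freqs, toy_parent)
print(selected)
```

In this toy example the A sub-clades track their parent clade across populations and are pruned, whereas B1 has a distinct distribution and is retained alongside the two ancestral nodes. A real analysis would need the published phylogeny and the cutoff used by the authors, neither of which is given in this excerpt.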