Inferring population structure and relationship using minimal independent evolutionary markers in Y-chromosome: a hybrid approach of recursive feature selection for hierarchical clustering.
Bottom Line: An analysis of 105 world-wide populations showed that 15 independent variations/markers were optimal for defining population structure parameters, such as FST, molecular variance and correlation-based relationship. Subsequent addition of randomly selected markers had a negligible effect (close to zero, i.e. 1 × 10⁻³) on these parameters. The study proves efficient in tracing complex population structures and deriving relationships among world-wide populations in a cost-effective and expedient manner.
Affiliation: National Centre of Applied Human Genetics, School of Life Sciences, Jawaharlal Nehru University, New Delhi 110067, India.
Mentions: In the current study, we used a correlation coefficient-based supervised feature selection method embedded with agglomerative hierarchical clustering, guided by prior knowledge of Y-chromosomal phylogeny. To validate this novel approach, we chose a model study based on real datasets of male-specific Y-chromosomal (MSY) variations generated in the present and earlier studies. According to the neutral theory of molecular evolution (7) and Kimura's step-wise mutation model (19), a major source of allelic diffusion in finite populations is the fixation of neutral mutations by genetic drift, i.e. mutations occur in steps, each defined by the state of variation in the preceding generation. The same applies to Y-chromosome phylogeny: each haplogroup (a combination of the same or different haplotypes) is the outcome of one or more mutation events and later stabilizes under different evolutionary forces, such as migration, genetic drift, selection and admixture, in a population or geographical region. Therefore, lower nodes in the hierarchy appear in the background of already existing higher ones (Figure 1). Consequently, only a few evolutionary markers, those most ancestral in their respective clades, can be considered independent; the rest are sequentially derived after the fixation and selection of the ancestral ones (Figure 1).
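The general idea of correlation-based marker selection followed by agglomerative clustering can be illustrated with a minimal sketch. This is not the authors' implementation: the marker names, population labels, frequencies, the 0.9 correlation threshold, the greedy selection rule and the use of average linkage on Euclidean distances are all assumptions made for illustration. Markers are visited from most ancestral to most derived, and a marker is kept only if its frequency profile is not strongly correlated with one already kept.

```python
from itertools import combinations
from math import dist, sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length frequency profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return 0.0 if vx == 0 or vy == 0 else cov / sqrt(vx * vy)


def select_independent_markers(freqs, order, threshold=0.9):
    """Greedy filter: keep a marker only if |r| with every kept marker
    is below `threshold` (an assumed cut-off, not from the paper).
    `order` lists markers from most ancestral to most derived."""
    selected = []
    for marker in order:
        if all(abs(pearson(freqs[marker], freqs[s])) < threshold
               for s in selected):
            selected.append(marker)
    return selected


def average_linkage(profiles):
    """Agglomerative clustering with average linkage on Euclidean
    distance; returns the merged clusters in the order they form."""
    clusters = [(name,) for name in profiles]

    def cluster_dist(a, b):
        total = sum(dist(profiles[p], profiles[q]) for p in a for q in b)
        return total / (len(a) * len(b))

    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: cluster_dist(*pair))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
        merges.append(a + b)
    return merges


# Hypothetical marker frequencies across four hypothetical populations.
populations = ["P1", "P2", "P3", "P4"]
freqs = {
    "M_a":  [0.90, 0.80, 0.10, 0.20],  # ancestral marker
    "M_a1": [0.85, 0.75, 0.05, 0.15],  # derived on the M_a background
    "M_b":  [0.50, 0.20, 0.60, 0.20],  # marker from a separate clade
}
selected = select_independent_markers(freqs, ["M_a", "M_a1", "M_b"])
profiles = {pop: [freqs[m][i] for m in selected]
            for i, pop in enumerate(populations)}
merges = average_linkage(profiles)
print(selected)
print(merges)
```

In this toy data the derived marker `M_a1` tracks its ancestral marker `M_a` almost perfectly, so it contributes no additional information and is dropped before clustering.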