Sequencing strategies and characterization of 721 vervet monkey genomes for future genetic analyses of medically relevant traits.

Huang YS, Ramensky V, Service SK, Jasinska AJ, Jung Y, Choi OW, Cantor RM, Juretic N, Wasserscheid J, Kaplan JR, Jorgensen MJ, Dyer TD, Dewar K, Blangero J, Wilson RK, Warren W, Weinstock GM, Freimer NB - BMC Biol. (2015)

Bottom Line: From high-depth WGS data we identified more than 4 million polymorphic unequivocal segregating sites; by pruning these SNPs based on heterozygosity, quality control filters, and the degree of linkage disequilibrium (LD) between SNPs, we constructed genome-wide panels suitable for genetic association (about 500,000 SNPs) and linkage analysis (about 150,000 SNPs). To further enhance the utility of these resources for linkage analysis, we used a further pruned subset of the linkage panel to generate multipoint identity by descent matrices. The genetic and phenotypic resources now available for the VRC and other Caribbean-origin vervets enable their use for genetic investigation of traits relevant to human diseases.


Affiliation: Center for Neurobehavioral Genetics, University of California Los Angeles, Los Angeles, CA, 90095, USA.

ABSTRACT

Background: We report here the first genome-wide high-resolution polymorphism resource for non-human primate (NHP) association and linkage studies, constructed for the Caribbean-origin vervet monkey, or African green monkey (Chlorocebus aethiops sabaeus), one of the most widely used NHPs in biomedical research. We generated this resource by whole genome sequencing (WGS) of monkeys from the Vervet Research Colony (VRC), an NIH-supported research resource for which extensive phenotypic data are available.

Results: We identified genome-wide single nucleotide polymorphisms (SNPs) by WGS of 721 members of an extended pedigree from the VRC. From high-depth WGS data we identified more than 4 million polymorphic unequivocal segregating sites; by pruning these SNPs based on heterozygosity, quality control filters, and the degree of linkage disequilibrium (LD) between SNPs, we constructed genome-wide panels suitable for genetic association (about 500,000 SNPs) and linkage analysis (about 150,000 SNPs). To further enhance the utility of these resources for linkage analysis, we used a further pruned subset of the linkage panel to generate multipoint identity by descent matrices.

Conclusions: The genetic and phenotypic resources now available for the VRC and other Caribbean-origin vervets enable their use for genetic investigation of traits relevant to human diseases.
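The LD-based pruning mentioned in the Results can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract does not state the tool, the r² threshold, or the window size, so `max_r2`, `window`, and the function names below are hypothetical. The sketch assumes genotypes are coded as alternate-allele counts (0, 1, 2) and that SNPs are listed in map order.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors (0/1/2 codes)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A monomorphic SNP carries no LD information; treat r^2 as 0.
        return 0.0
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


def greedy_ld_prune(genotypes, max_r2, window):
    """Return indices of SNPs kept after greedy LD pruning.

    genotypes: list of per-SNP genotype vectors, in map order.
    max_r2:    hypothetical threshold; a SNP is dropped if its r^2 with any
               already-kept SNP inside the window exceeds this value.
    window:    hypothetical window size, measured in SNP index positions.
    """
    kept = []
    for i, g in enumerate(genotypes):
        nearby = [k for k in kept if i - k <= window]
        if all(r_squared(g, genotypes[k]) <= max_r2 for k in nearby):
            kept.append(i)
    return kept


if __name__ == "__main__":
    snps = [
        [0, 1, 2, 1],  # SNP 0
        [0, 1, 2, 1],  # SNP 1: identical to SNP 0
        [2, 1, 0, 1],  # SNP 2: perfectly anti-correlated with SNP 0
        [0, 2, 0, 2],  # SNP 3: uncorrelated with SNP 0
    ]
    print(greedy_ld_prune(snps, max_r2=0.5, window=10))
```

A stricter (lower) `max_r2` yields a sparser panel, which is consistent with the linkage panel being smaller than the association panel, although the actual criteria used for each panel are not given here.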




Fig. 1: Using the variant data from the WGS of the trio shown on the left, we evaluated three different down-sampling schemes, drawn on the right, to determine a pedigree-wide strategy for selecting monkeys for medium (4×) or low (1×) sequencing coverage. (a) The frequency of Mendelian errors in a trio increases in all three down-sampling experiments compared to the original data; however, the increase in error rate is greatest when both parents are low coverage and the child is medium coverage. (b) The percentage of concordant genotype calls between original data and down-sampled data is lowest when both parents are low coverage and the child is medium coverage. The percentages shown for both the rate of Mendelian inconsistency and for genotype concordance represent averages over three down-sampling experiments.

Mentions: Down-sampling analysis showed that sequencing both parents at 4× (and their offspring at 1×) resulted in a much lower rate of Mendelian inconsistencies and a higher degree of genotype concordance with non-down-sampled genotypes (in parents) than a strategy in which both parents are sequenced at 1× and their offspring at 4× (Fig. 1). Importantly, the intermediate strategy (sequencing one parent at 4×, one parent at 1×, and the child at 1×) has a similar Mendelian inconsistency rate and genotype concordance (for all three trio members) to the strategy in which both parents in the trio are sequenced at 4×, suggesting the possibility of achieving further gains in cost-effective recovery of genotypes. We also observed, in genotype analysis of the initial 105 monkeys, that lowering the sequencing coverage of any single monkey has very little impact on the accuracy of genotyping of pedigree members beyond the trios of which that monkey is a member (data not shown).
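The two metrics compared across the down-sampling schemes can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it assumes biallelic autosomal SNPs with genotypes coded as alternate-allele counts (0, 1, 2) and `None` for a missing call, and all function names are hypothetical.

```python
from itertools import product

# Alleles (0 = reference, 1 = alternate) that each genotype can transmit.
GAMETES = {0: {0}, 1: {0, 1}, 2: {1}}


def is_mendelian_consistent(father, mother, child):
    """True if the child's genotype can arise from the parents' genotypes."""
    possible = {a + b for a, b in product(GAMETES[father], GAMETES[mother])}
    return child in possible


def mendelian_error_rate(trio_calls):
    """Fraction of fully called sites that violate Mendelian inheritance.

    trio_calls: list of (father, mother, child) genotype tuples, one per site.
    """
    complete = [t for t in trio_calls if None not in t]
    if not complete:
        return float("nan")
    errors = sum(not is_mendelian_consistent(*t) for t in complete)
    return errors / len(complete)


def genotype_concordance(original, downsampled):
    """Fraction of sites called in both data sets where the genotypes agree."""
    pairs = [(o, d) for o, d in zip(original, downsampled)
             if o is not None and d is not None]
    if not pairs:
        return float("nan")
    return sum(o == d for o, d in pairs) / len(pairs)


if __name__ == "__main__":
    trio = [(0, 0, 0), (0, 0, 1), (1, 1, 2), (2, 2, 1)]
    print(mendelian_error_rate(trio))
    print(genotype_concordance([0, 1, 2, 1], [0, 1, 1, None]))
```

In a comparison like the one in Fig. 1, each metric would be computed once per down-sampling replicate and then averaged; the averaging step is omitted here for brevity.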

