Characterization of single-nucleotide variation in Indian-origin rhesus macaques (Macaca mulatta).

Fawcett GL, Raveendran M, Deiros DR, Chen D, Yu F, Harris RA, Ren Y, Muzny DM, Reid JG, Wheeler DA, Worley KC, Shelton SE, Kalin NH, Milosavljevic A, Gibbs R, Rogers J - BMC Genomics (2011)

Bottom Line: We then used three strategies to validate SNPs: comparison of potential SNPs found in the same individual using two different sequencing chemistries, and comparison of potential SNPs in different individuals identified with either the same or different sequencing chemistries. Resequencing of a small number of animals identified greater than 3 million SNPs. This provides a significant new information resource for rhesus macaques, an important research animal.


Affiliation: Department of Molecular and Human Genetics, Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA.

ABSTRACT

Background: Rhesus macaques are the most widely utilized nonhuman primate model in biomedical research. Previous efforts have validated fewer than 900 single nucleotide polymorphisms (SNPs) in this species, which limits opportunities for genetic studies related to health and disease. Extensive information about SNPs and other genetic variation in rhesus macaques would facilitate valuable genetic analyses, as well as provide markers for genome-wide linkage analysis and the genetic management of captive breeding colonies.

Results: We used the available rhesus macaque draft genome sequence, new sequence data from unrelated individuals and existing published sequence data to create a genome-wide SNP resource for Indian-origin rhesus monkeys. The original reference animal and two additional Indian-origin individuals were resequenced to low coverage using SOLiD™ sequencing. We then used three strategies to validate SNPs: comparison of potential SNPs found in the same individual using two different sequencing chemistries, and comparison of potential SNPs in different individuals identified with either the same or different sequencing chemistries. Our approach validated approximately 3 million SNPs distributed across the genome. Preliminary analysis of SNP annotations suggests that a substantial number of these macaque SNPs may have functional effects. More than 700 non-synonymous SNPs were scored by Polyphen-2 as either possibly or probably damaging to protein function, and these variants now constitute potential models for studying functional genetic variation relevant to human physiology and disease.

Conclusions: Resequencing of a small number of animals identified greater than 3 million SNPs. This provides a significant new information resource for rhesus macaques, an important research animal. The data also suggest that overall genetic variation is high in this species. We identified many potentially damaging non-synonymous coding SNPs, providing new opportunities to identify rhesus models for human disease.


Figure 4: Low-stringency vs. high-stringency SNP call validation effectiveness. Validation efficiencies cannot be determined using existing validated SNP data sets, as only 777 SNPs are currently available for rhesus macaque in dbSNP, most of which are polymorphic between subspecies (Chinese to Indian) rather than within subspecies (i.e., Indian to Indian). We observed improved validation efficiency using low-stringency SNP calls rather than high-stringency calls. Both high- and low-stringency SNP calls were obtained for the reference animal (17573) mate-pair sequence data. The percentage of total SNPs validated in the low-stringency SNP set was lower (33.8%) than that observed in the high-stringency SNP set (45.7%). In absolute numbers, however, 1.8X more SNPs were validated from the low-stringency SNP calls than from the high-stringency SNP calls.
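
The two figures in this caption (a lower validated percentage but 1.8X more validated SNPs) are consistent only if the low-stringency call set is considerably larger than the high-stringency one. The short sketch below works through that arithmetic using only the numbers quoted in the caption; the function name is illustrative, and the result is an implied ratio, not a value reported by the authors.

```python
def implied_call_set_ratio(frac_low: float, frac_high: float, validated_ratio: float) -> float:
    """Return N_low / N_high implied by the caption's numbers.

    validated_low  = frac_low  * N_low
    validated_high = frac_high * N_high
    validated_low / validated_high = validated_ratio
    => N_low / N_high = validated_ratio * frac_high / frac_low
    """
    return validated_ratio * frac_high / frac_low


# Values taken from the Figure 4 caption: 33.8%, 45.7%, and 1.8X.
ratio = implied_call_set_ratio(frac_low=0.338, frac_high=0.457, validated_ratio=1.8)
print(f"Implied size of low-stringency call set: about {ratio:.2f}x the high-stringency set")
```

Under these numbers, the low-stringency set would need to contain roughly two and a half times as many candidate SNPs as the high-stringency set.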

Mentions: In this study, we validated 3,038,166 SNPs and identified an additional 11,382,666 potential SNPs for the Indian-origin rhesus macaque. Both Chinese and Indian-origin rhesus macaques are used in biomedical research, but Indian-origin animals are more prevalent. Furthermore, the rhesus macaque reference genome was produced using an individual of Indian origin [23]. At the time we began this study, very few SNPs or other variants had been confirmed or validated in the rhesus genome. By taking advantage of the falling cost of next-generation resequencing and utilizing all of the existing sequence data available, we were able to validate a large genome-wide series of SNPs. Different sequencing technologies exhibit distinct probabilities of different sequencing errors, which increase false positive rates in calling novel SNPs [38]. SNPs validated in a single individual using multiple sequencing technologies avoid this problem, thus increasing confidence in the SNP calls common to the two data sets. In addition to inherent sequencing chemistry concerns, low-coverage sequencing can suffer from read coverage limitations, creating false negatives because the SNP calling software will discard a significant proportion of legitimate SNPs due to low coverage (Figure 4). For this reason, we chose to utilize reduced-stringency SNP calls from corona_lite and AtlasSNP2. Our methodology minimizes both false positive and false negative SNP calls by using reduced-stringency SNP calling standards but requiring that each SNP location and both alleles be confirmed in data obtained from different animals and/or different chemistries. In addition, most of our comparisons involved SNP data sets that were called by different algorithms. Studies of human population variation from the 1000 Genomes Project [39] have shown that utilizing the intersection of different SNP calling methods to home in on real SNP data reduces the false positive SNP error rate by 30-50%.
This validated SNP list represents a substantial increase in the number of known rhesus SNPs, and therefore is an important resource for research involving this species. These data will facilitate the discovery of functional SNPs, the development of new disease models, and genetic linkage studies, as well as providing a valuable resource for colony management.
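
The validation rule described above (a SNP counts as validated only when its location and both alleles are confirmed in a second, independent data set) can be illustrated with a minimal sketch. This is not the authors' pipeline and does not reproduce corona_lite or AtlasSNP2 output; the record layout, the function names, and the example calls are all hypothetical and chosen only to show the intersection logic.

```python
from typing import FrozenSet, Iterable, NamedTuple, Set


class SnpCall(NamedTuple):
    """Hypothetical minimal SNP record: chromosome, position, and allele pair."""
    chrom: str
    pos: int
    alleles: FrozenSet[str]  # unordered, so A/G matches G/A


def make_call(chrom: str, pos: int, allele_a: str, allele_b: str) -> SnpCall:
    """Build a call whose allele pair is case- and order-independent."""
    return SnpCall(chrom, pos, frozenset((allele_a.upper(), allele_b.upper())))


def validate_by_intersection(calls_a: Iterable[SnpCall],
                             calls_b: Iterable[SnpCall]) -> Set[SnpCall]:
    """Keep only calls whose location AND both alleles appear in both data sets."""
    return set(calls_a) & set(calls_b)


# Made-up reduced-stringency calls from two independent data sets
# (for example, different animals and/or different chemistries).
dataset_a = [
    make_call("chr1", 1050, "A", "G"),
    make_call("chr1", 2300, "C", "T"),
    make_call("chr2", 775, "G", "T"),
]
dataset_b = [
    make_call("chr1", 1050, "G", "A"),  # same site, same alleles: validated
    make_call("chr1", 2300, "C", "A"),  # same site, different alleles: rejected
    make_call("chr3", 40, "A", "C"),    # site absent from dataset_a: rejected
]

validated = validate_by_intersection(dataset_a, dataset_b)
for call in sorted(validated, key=lambda c: (c.chrom, c.pos)):
    print(call.chrom, call.pos, "/".join(sorted(call.alleles)))
```

Requiring the allele pair to match, rather than the position alone, is what lets permissive calling thresholds be combined with a strict confirmation step: a sequencing error at a true variant site is unlikely to produce the same allele pair in an independent data set.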

