SNP discovery in swine by reduced representation and high throughput pyrosequencing.

Wiedmann RT, Smith TP, Nonneman DJ - BMC Genet. (2008)

Bottom Line: We previously used a combination of short read (25 base pair) high-throughput sequencing and reduced genomic representation to discover > 60,000 single nucleotide polymorphisms (SNP) in cattle, but the current lack of complete genome sequence limits this approach in swine. Swine SNP were discovered in the present study using a reduced representation of 450 base pair (bp) porcine genomic fragments (approximately 4% of the swine genome) prepared from a pool of 26 animals relevant to current pork production, and a GS-FLX instrument producing 240 bp reads. By using a conservative approach, a robust group of SNPs was detected with greater confidence and relatively high MAF that should be suitable for genotyping in a wide variety of commercial populations.


Affiliation: USDA, ARS, US Meat Animal Research Center, State Spur 18D, NE 68933-0166, USA. ralph.wiedmann@ars.usda.gov

ABSTRACT

Background: Relatively little information is available for sequence variation in the pig. We previously used a combination of short read (25 base pair) high-throughput sequencing and reduced genomic representation to discover > 60,000 single nucleotide polymorphisms (SNP) in cattle, but the current lack of complete genome sequence limits this approach in swine. Longer-read pyrosequencing-based technologies have the potential to overcome this limitation by providing sufficient flanking sequence information for assay design. Swine SNP were discovered in the present study using a reduced representation of 450 base pair (bp) porcine genomic fragments (approximately 4% of the swine genome) prepared from a pool of 26 animals relevant to current pork production, and a GS-FLX instrument producing 240 bp reads.

Results: Approximately 5 million sequence reads were collected and assembled into contigs having an overall observed depth of 7.65-fold coverage. The approximate minor allele frequency was estimated from the number of observations of the alternate alleles. The average coverage at the SNPs was 12.6-fold. This approach identified 115,572 SNPs in 47,830 contigs. Comparison to partial swine genome draft sequence indicated 49,879 SNP (43%) and 22,045 contigs (46%) mapped to a position on a sequenced pig chromosome and the distribution was essentially random. A sample of 176 putative SNPs was examined and 168 (95.5%) were confirmed to have segregating alleles; the correlation of the observed minor allele frequency (MAF) to that predicted from the sequence data was 0.58.
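The abstract states only that the approximate minor allele frequency was estimated from the number of observations of the alternate alleles; the exact estimator is not given here. The following is a minimal sketch of one plausible reading, assuming the estimate is simply the share of reads showing the less frequently observed allele at a site. The function name `estimate_maf` and the example read counts are hypothetical. The same block re-derives the percentages quoted above from the reported counts.

```python
def estimate_maf(allele_counts):
    """Estimate minor allele frequency at one site from read-level observations.

    allele_counts maps a base (e.g. "A") to the number of reads showing it.
    Assumption: the minor allele is the second most observed base, and its
    frequency is its share of all observations at the site.
    """
    total = sum(allele_counts.values())
    if total == 0:
        raise ValueError("no observations at this site")
    ordered = sorted(allele_counts.values(), reverse=True)
    minor = ordered[1] if len(ordered) > 1 else 0
    return minor / total


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Hypothetical site covered by 13 reads (close to the reported mean of 12.6-fold).
example_maf = estimate_maf({"A": 9, "G": 4})

# Counts taken from the Results paragraph.
mapped_snp_pct = percent(49_879, 115_572)     # SNPs mapped to a sequenced chromosome
mapped_contig_pct = percent(22_045, 47_830)   # contigs mapped to a sequenced chromosome
validated_pct = percent(168, 176)             # putative SNPs confirmed as segregating

print(example_maf, mapped_snp_pct, mapped_contig_pct, validated_pct)
```

With only about a dozen observations per site, an estimate of this kind is coarse, which is in keeping with the moderate correlation (0.58) reported between predicted and observed MAF.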

Conclusion: The process was an efficient means to identify a large number of porcine SNP with a high validation rate, to be used in an ongoing international collaboration to produce a highly parallel genotyping assay for swine. By using a conservative approach, a robust group of SNPs was detected with greater confidence and relatively high MAF that should be suitable for genotyping in a wide variety of commercial populations.


Figure 1: Distribution of the contig lengths showing that most of the contigs consist of reads from one end of the restriction fragments. About 25% of the contigs span the entire restriction fragment.

Mentions: The Newbler assembler, version 1.1.03, assembled 73% of the unmasked reads into 421,060 contigs, which were used to define the reference sequence for SNP discovery. Attempts to increase the number of reads in the assembly resulted in fatal software errors. Although 27% of the original reads were not used in the assembly, they were included in the mapping and SNP detection steps. Figure 1 shows the profile of contig lengths, indicating that most of the contigs were the length of a single read, but about 100,000 contigs were longer as reads from opposite directions overlapped in the middle to fully cover the library fragments. The total length of the 421,060 contigs was 110,823,689 bp indicating an average unmasked read coverage of 7.65×. Although N's were quite rare in the reads, 4.8% of the bases called by the assembler were "N", mostly concentrated in a small fraction of the contigs. Over 70% of the contigs were free of N's and 78% had less than 1% N content. The contig sequences are available in dbSTS [GenBank: BV729586 to BV999999, GF000001 to GF089508 and GF089703 to GF091743]. The SNPs are available in dbSNP [GenBank: ss107796326 to ss107911925].
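The assembly figures in this paragraph can be cross-checked with simple arithmetic. The sketch below assumes that coverage is defined as total unmasked read bases divided by total contig length; under that assumption the read-base total is back-calculated from the reported 7.65× and is not a number given in the text. Variable names are illustrative only.

```python
# Values reported in the paragraph above.
n_contigs = 421_060
total_contig_bp = 110_823_689
reported_coverage = 7.65
approx_full_span_contigs = 100_000  # "about 100,000 contigs were longer"

# Mean contig length; a value near the read length is consistent with most
# contigs being the length of a single read.
mean_contig_length = total_contig_bp / n_contigs

# Assumption: coverage = unmasked read bases / contig bases, so the read-base
# total implied by the reported coverage is their product.
implied_read_bases = reported_coverage * total_contig_bp

# Share of contigs in which reads from both ends overlap to cover the whole
# fragment; compare with the "about 25%" in the Figure 1 caption.
full_span_fraction = approx_full_span_contigs / n_contigs

print(f"mean contig length: {mean_contig_length:.1f} bp")
print(f"implied unmasked read bases: {implied_read_bases / 1e6:.0f} Mb")
print(f"contigs spanning the full fragment: {full_span_fraction:.1%}")
```

The mean contig length comes out slightly above the 240 bp read length, and the full-span share lands just under one quarter, so the reported counts agree with both the text and the Figure 1 caption.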

