Building a model: developing genomic resources for common milkweed (Asclepias syriaca) with low coverage genome sequencing.

Straub SC, Fishbein M, Livshultz T, Foster Z, Parks M, Weitemier K, Cronn RC, Liston A - BMC Genomics (2011)


Affiliation: Department of Botany and Plant Pathology, Oregon State University, Corvallis, Oregon 97331, USA. straubs@science.oregonstate.edu

ABSTRACT

Background: Milkweeds (Asclepias L.) have been extensively investigated in diverse areas of evolutionary biology and ecology; however, there are few genetic resources available to facilitate and complement these studies. This study explored how low coverage genome sequencing of the common milkweed (Asclepias syriaca L.) could be useful in characterizing the genome of a plant without prior genomic information and for development of genomic resources as a step toward further developing A. syriaca as a model in ecology and evolution.

Results: A 0.5× genome of A. syriaca was produced using Illumina sequencing. A virtually complete chloroplast genome of 158,598 bp was assembled, revealing few repeats and loss of three genes: accD, clpP, and ycf1. A nearly complete rDNA cistron (18S-5.8S-26S; 7,541 bp) and 5S rDNA (120 bp) sequence were obtained. Assessment of polymorphism revealed that the rDNA cistron and 5S rDNA had 0.3% and 26.7% polymorphic sites, respectively. A partial mitochondrial genome sequence (130,764 bp), with identical gene content to tobacco, was also assembled. An initial characterization of repeat content indicated that Ty1/copia-like retroelements are the most common repeat type in the milkweed genome. At least one A. syriaca microread hit 88% of Catharanthus roseus (Apocynaceae) unigenes (median coverage of 0.29×) and 66% of single copy orthologs (COSII) in asterids (median coverage of 0.14×). From this partial characterization of the A. syriaca genome, markers for population genetics (microsatellites) and phylogenetics (low-copy nuclear genes) studies were developed.
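To give a feel for what coverage values such as 0.5×, 0.29×, and 0.14× imply, the following is a minimal sketch using the idealized Lander-Waterman expectation, in which reads land uniformly at random and the expected fraction of bases covered by at least one read is 1 - exp(-c) for depth c. This model and the function name `expected_fraction_covered` are assumptions introduced here for illustration; they are not part of the study's analysis, and real data (repeats, organellar reads, uneven sampling) will deviate from it.

```python
import math


def expected_fraction_covered(depth: float) -> float:
    """Expected fraction of bases hit by at least one read.

    Idealized Lander-Waterman model (assumption): reads are placed
    uniformly at random, so per-base read count is roughly Poisson
    with mean `depth`, and P(count >= 1) = 1 - exp(-depth).
    """
    if depth < 0:
        raise ValueError("depth must be non-negative")
    return 1.0 - math.exp(-depth)


# Depth values quoted in the Results paragraph; labels are descriptive only.
reported_depths = [
    ("whole genome", 0.5),
    ("C. roseus unigenes (median)", 0.29),
    ("asterid COSII orthologs (median)", 0.14),
]

for label, depth in reported_depths:
    fraction = expected_fraction_covered(depth)
    print(f"{label}: {depth}x depth -> {fraction:.1%} of bases expected covered")
```

Under this simplified model, most positions of a typical low-copy gene are expected to remain unsampled at these depths, which is consistent with the abstract describing the low copy fraction as explored rather than characterized.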

Conclusions: The results highlight the promise of next generation sequencing for development of genomic resources for any organism. Low coverage genome sequencing allows characterization of the high copy fraction of the genome and exploration of the low copy fraction of the genome, which facilitate the development of molecular tools for further study of a target species and its relatives. This study represents a first step in the development of a community resource for further study of plant-insect co-evolution, anti-herbivore defense, floral developmental genetics, reproductive biology, chemical evolution, population genetics, and comparative genomics using milkweeds, and A. syriaca in particular, as ecological and evolutionary models.


Figure 1: Map of the chloroplast genome of Asclepias syriaca. The thick black lines indicate the locations of the inverted repeats (IR). The thin black lines indicate the locations of the large single copy (LSC) and small single copy (SSC) regions. Transcription is clockwise for genes on the outside of the circle and counterclockwise for genes on the inside of the circle. Asterisks denote the locations of unresolved sequence due to polynucleotide stretches.

Mentions: The chloroplast genome of A. syriaca is 158,598 bp [GenBank:JF433943], excluding two small, unresolved regions that could not be assembled or Sanger sequenced because of polynucleotide stretches (rps8-rpl14 intergenic spacer and ψycf1), and has an inverted repeat (IR) of 25,401 bp (Figure 1). The initial assembly, produced using the alignreads pipeline and the oleander reference, contained 51 contigs with a median read depth of 246× and an N50 of 4,683 bp. The longest contig was 14,884 bp. The final assembly, using the finished A. syriaca sequence as a reference, contained 22 contigs, had an N50 of 9,030 bp, and a longest contig of 28,186 bp. The use of chloroplast contigs from 80 bp reads from other A. syriaca individuals (S. Straub and A. Liston, unpublished data), in combination with Sanger sequencing of select regions in the 0.5× genome individual and these other individuals, resulted in the addition of 6,622 bp and deletion of 1,402 bp of sequence. Large insertions and deletions relative to the original assembly were expected, given the limited power of reference guided assembly algorithms to reconstruct such differences. Only 38 bp of sequence from the original reference guided assembly were determined to be incorrect, producing substitution errors in the genome sequence. Together, these changes altered approximately 5% of the total sequence length, indicating that 95% of the chloroplast genome of A. syriaca was assembled at 0.5× average genome coverage with reads of only 40 bp and an oleander reference with 85% sequence identity (considering only one copy of the inverted repeat). Although the original assembly was very good overall, comparison of the final assembly with the finished reference sequence highlighted the limitations of using 40 bp vs. 80 bp reads for sequence assembly. Even with a correct reference sequence, some regions were still not assembled properly (e.g., accD, ycf1) and some assembly mistakes from the original assembly were recreated (e.g., an erroneous 5 bp deletion in the middle of rbcL causing a frameshift).
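
The two kinds of numbers in this passage can be reproduced with a short sketch. The first part is a generic N50 calculation under the common definition (the contig length at which the cumulative length of contigs, sorted longest first, reaches half of the total assembled bases); the contig lengths in `toy_contigs` are hypothetical, since the per-contig lengths of the study's assemblies are not given here. The second part checks the "approximately 5%" figure from the reported counts, assuming the finished genome length is the denominator, which may not be exactly how the authors computed it.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    Common definition (assumed here): sort contigs longest first and
    return the length of the contig at which the running total first
    reaches at least half of the total assembled bases.
    """
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("contig_lengths must contain at least one contig")


# Hypothetical toy assembly, NOT data from the study.
toy_contigs = [2, 3, 4, 5, 6, 10]
print("toy N50:", n50(toy_contigs))

# Counts reported in the text for corrections to the original assembly.
added_bp = 6_622        # sequence added
deleted_bp = 1_402      # sequence deleted
substituted_bp = 38     # incorrect bases (substitution errors)
genome_bp = 158_598     # finished chloroplast genome length

# Assumption: the finished genome length is used as the denominator.
altered_bp = added_bp + deleted_bp + substituted_bp
altered_fraction = altered_bp / genome_bp
print(f"altered: {altered_bp} bp = {altered_fraction:.1%} of {genome_bp} bp")
```

With the reported counts, the altered total is 8,062 bp, which is close to 5% of the finished length and agrees with the roughly 95% of the genome recovered by the original reference guided assembly.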

