Computational analyses of ancient pathogen DNA from herbarium samples: challenges and prospects.

Yoshida K, Sasaki E, Kamoun S - Front Plant Sci (2015)

Bottom Line: DNA preservation in herbarium samples was unexpectedly good, raising the possibility of a whole new research area in plant and microbial genomics. However, the recovered DNA can be extremely fragmented, resulting in specific challenges in reconstructing genome sequences. Here we review some of the challenges in computational analyses of ancient DNA from herbarium samples.


Affiliation: Laboratory of Plant Genetics, Graduate School of Agricultural Science, Kobe University, Kobe, Japan; The Sainsbury Laboratory, Norwich Research Park, Norwich, UK.

ABSTRACT
The application of DNA sequencing technology to the study of ancient DNA has enabled the reconstruction of past epidemics from genomes of historically important plant-associated microbes. Recently, the genome sequences of the potato late blight pathogen Phytophthora infestans were analyzed from 19th century herbarium specimens. These herbarium samples originated from infected potatoes collected during and after the Irish potato famine. Herbaria therefore have great potential to help elucidate past epidemics of crops, date the emergence of pathogens, and inform about past pathogen population dynamics. DNA preservation in herbarium samples was unexpectedly good, raising the possibility of a whole new research area in plant and microbial genomics. However, the recovered DNA can be extremely fragmented, resulting in specific challenges in reconstructing genome sequences. Here we review some of the challenges in computational analyses of ancient DNA from herbarium samples. We also applied the recently developed linkage method to haplotype reconstruction of diploid or polyploid genomes from fragmented ancient DNA.


Figure 2: Haplotype reconstruction of genes in 19th century Phytophthora infestans that was preserved in herbarium specimens. (A) Distributions of single nucleotide polymorphisms (SNPs) per site and physical distance between adjacent SNPs on the genome are shown. Red boxes indicate values for the genes with haplotypes completely reconstructed by linkSNPs software. Blue boxes indicate values for the genes with haplotypes that were not fully recovered. Genes harboring only a single SNP were not included. (B) Pie charts showing the number of genes encoding effectors out of the tested genes and the genes with haplotypes that were completely reconstructed. Genes encoding effectors were significantly enriched in the reconstructed haplotypes. (C) An example of RXLR effector genes with haplotypes that were successfully rebuilt. PITG_04178 encodes a secreted protein of unknown function containing an RXLR motif. The top image shows the pileup of short reads over the region where PITG_04178 is located using IGV viewer. Colored boxes in the pileup indicate heterozygous SNPs, which were used for haplotype reconstruction. The middle image shows two haplotypes reconstructed by linkSNPs. Red letters indicate nucleotides in the coding region. The bottom image is the alignment of amino acid residues of PITG_04178 that correspond to the modern isolate T30-4 and two haplotypes of 19th century P. infestans. One of the haplotypes was consistent with that of T30-4. The other had four amino acid differences.

To test whether this approach is applicable to haplotype reconstruction for individual genes of historic samples preserved in herbarium specimens, we used the short reads of P. infestans strain HERB-1 (Yoshida et al., 2013). The short reads were merged into single reads and mapped to the P. infestans T30-4 reference genome. We used only reads with a mapping quality above 30 (Yoshida et al., 2013). For haplotype reconstruction, we employed linkSNPs software (see Supplemental Material for details), which was developed based on the linkage method (Sasaki et al., 2013). We selected 7,159 genes that showed 100% sequence read coverage over their coding regions and are located in gene-sparse regions (GSRs) or gene-dense regions (GDRs). The software called 56,469 SNPs (44,311 in GDRs and 12,158 in GSRs) and reconstructed 16,702 linkage groups of SNPs, of which 654 allowed deduction of complete haplotypes of the gene (Supplemental Tables S2 and S3). To characterize the differences between genes with completely reconstructed haplotypes and those with incomplete haplotypes, we compared the number of SNPs per site and the physical distance between adjacent SNP positions on the genome (Figure 2A). We used only genes that had more than one SNP to estimate linkages between SNPs. The successful and unsuccessful cases were similar in the distribution of SNPs per site, with an average of 0.01 ± 0.01 SNPs per site. However, the distance between adjacent SNP positions differed significantly: the average distance was 42.8 ± 35.8 bp in the genes with complete haplotypes and 259.5 ± 299.2 bp in the genes with incomplete haplotypes. Because the median length of the ancient DNA in the herbarium specimen was estimated to be 50–86 bp, haplotype reconstruction was, understandably, applicable only to genes with closely linked SNP positions.
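The dependence on SNP spacing can be illustrated with a minimal Python sketch. This is not the linkSNPs implementation, whose details are given only in the Supplemental Material. It assumes a simplified rule in which two adjacent heterozygous SNPs are linked if at least one mapped read spans both positions. The function names `linkage_groups` and `fully_linked`, the `min_reads` parameter, and the toy coordinates are all hypothetical. The sketch tracks only which SNPs can be connected, not which alleles each read carries.

```python
from typing import List, Tuple

# Hypothetical read representation: (start, end), 1-based inclusive
# coordinates of a merged read on the reference genome.
Read = Tuple[int, int]


def linkage_groups(snps: List[int], reads: List[Read],
                   min_reads: int = 1) -> List[List[int]]:
    """Split SNP positions into runs of adjacent SNPs joined by spanning reads.

    Two neighbouring SNPs stay in the same group only if at least
    `min_reads` reads cover both positions (assumed linking rule).
    """
    ordered = sorted(snps)
    if not ordered:
        return []
    groups = [[ordered[0]]]
    for left, right in zip(ordered, ordered[1:]):
        # Count reads that contain both SNP positions.
        support = sum(1 for start, end in reads
                      if start <= left and right <= end)
        if support >= min_reads:
            groups[-1].append(right)   # extend the current linkage group
        else:
            groups.append([right])     # no spanning read: start a new group
    return groups


def fully_linked(snps: List[int], reads: List[Read],
                 min_reads: int = 1) -> bool:
    """True if all SNPs of a gene fall into a single linkage group."""
    return len(linkage_groups(snps, reads, min_reads)) == 1


# Toy data: three 60 bp reads, comparable to the fragment lengths reported above.
toy_reads = [(90, 149), (130, 189), (340, 399)]
print(linkage_groups([100, 140, 175], toy_reads))  # closely spaced SNPs
print(linkage_groups([100, 360], toy_reads))       # SNPs 260 bp apart
```

Under this rule, SNPs spaced a few tens of base pairs apart can be chained by overlapping short fragments. SNPs separated by more than the fragment length cannot be joined by any single read, so the gene splits into several linkage groups and its haplotype remains incomplete.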

