AGAPE (Automated Genome Analysis PipelinE) for pan-genome analysis of Saccharomyces cerevisiae.

Song G, Dickins BJ, Demeter J, Engel S, Gallagher J, Choe K, Dunn B, Snyder M, Cherry JM - PLoS ONE (2015)

Bottom Line: To assign strain-specific functional annotations, we identified genes that were not present in the reference genome. The functional roles of novel genes not found in the reference genome and associated with strains or groups of strains appear to be consistent with anticipated adaptations in specific lineages. Our new tool set will enhance our understanding of genomic and functional evolution in S. cerevisiae, and will be available to the yeast genetics and molecular biology community.


Affiliation: Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America.

ABSTRACT
The characterization and public release of genome sequences from thousands of organisms is expanding the scope for genetic variation studies. However, understanding the phenotypic consequences of genetic variation remains a challenge in eukaryotes due to the complexity of the genotype-phenotype map. One approach to this is the intensive study of model systems for which diverse sources of information can be accumulated and integrated. Saccharomyces cerevisiae is an extensively studied model organism, with well-known protein functions and thoroughly curated phenotype data. To develop and expand the available resources linking genomic variation with function in yeast, we aim to model the pan-genome of S. cerevisiae. To initiate the yeast pan-genome, we newly sequenced or re-sequenced the genomes of 25 strains that are commonly used in the yeast research community using advanced sequencing technology at high quality. We also developed a pipeline for automated pan-genome analysis, which integrates the steps of assembly, annotation, and variation calling. To assign strain-specific functional annotations, we identified genes that were not present in the reference genome. We classified these according to their presence or absence across strains and characterized each group of genes with known functional and phenotypic features. The functional roles of novel genes not found in the reference genome and associated with strains or groups of strains appear to be consistent with anticipated adaptations in specific lineages. As more S. cerevisiae strain genomes are released, our analysis can be used to collate genome data and relate it to lineage-specific patterns of genome evolution. Our new tool set will enhance our understanding of genomic and functional evolution in S. cerevisiae, and will be available to the yeast genetics and molecular biology community.


Figure 3 (pone.0120671.g003): Variations in S. cerevisiae strains. (A) Number of non-reference ORFs in 25 S. cerevisiae strains. (B) Number of SNPs relative to the reference. According to the number of SNPs, BY4742, X2180, BY4741, and FY1679 are essentially identical to the reference strain (S288C), and there are no non-reference ORFs in these strains. This supports the notion that these four strains are the same as S288C within experimental error. Comparing the two panels shows that strains with more SNPs tend to have more non-reference ORFs, although some strains deviate from this pattern (e.g. K11 and YS9).

Mentions: As expected, we did not observe any non-reference ORFs among the seven strains (BY4741, BY4742, FY1679, SEY6210, JK9, W303, and X2180) known to be closely related to the S288C reference genome (Fig. 3A). Among the remaining 18 non-S288C strains, however, we found a total of 314 non-reference ORFs (Fig. 3A, S1 Table). We grouped the non-reference ORFs by aligning their protein sequences to each other using BLASTP. As a result, we identified 80 homologue groups of non-reference ORFs, including 16 unique ORFs that appear only in single strains (S1 Table). Eight ORFs out of the 80 non-reference groups were already annotated as non-reference features in SGD: MEL1, RTM1, MPR1, BIO6, TAT3, XDH1, MAL64, and KHR1 (Fig. 4). Previous studies had shown the presence of the BIO6 gene in saké strains and the TAT3 gene in RM11; our AGAPE results recapitulate these results, showing BIO6 occurring in the saké strain K11, and TAT3 in RM11.
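The grouping step described above (pairwise BLASTP alignments collapsed into homologue groups, with some ORFs left as strain-unique singletons) can be illustrated with a minimal sketch. This is not the AGAPE implementation: the source does not state the clustering criterion, so single-linkage clustering via union-find, the E-value and identity cutoffs, the function names, and the ORF identifiers below are all assumptions made for illustration.

```python
from collections import defaultdict


def group_orfs(orf_ids, hits, max_evalue=1e-10, min_identity=50.0):
    """Single-linkage grouping of ORFs from pairwise BLASTP hits.

    hits: iterable of (query_id, subject_id, percent_identity, evalue).
    The thresholds are illustrative assumptions, not values from the paper.
    """
    parent = {orf: orf for orf in orf_ids}

    def find(x):
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for query, subject, identity, evalue in hits:
        if query == subject:
            continue  # ignore self-hits
        if query not in parent or subject not in parent:
            continue  # ignore hits to sequences outside the ORF set
        if evalue <= max_evalue and identity >= min_identity:
            root_q, root_s = find(query), find(subject)
            if root_q != root_s:
                parent[root_s] = root_q  # merge the two groups

    groups = defaultdict(list)
    for orf in orf_ids:
        groups[find(orf)].append(orf)
    # Largest groups first, ties broken alphabetically for stable output.
    return sorted((sorted(g) for g in groups.values()),
                  key=lambda g: (-len(g), g))


# Hypothetical ORF identifiers and hit values, for illustration only.
orfs = ["K11_orf1", "YS9_orf7", "RM11_orf3", "RM11_orf9"]
hits = [
    ("K11_orf1", "YS9_orf7", 98.2, 1e-120),
    ("YS9_orf7", "K11_orf1", 98.2, 1e-120),
    ("RM11_orf3", "RM11_orf9", 31.0, 0.5),  # too weak to link
]
groups = group_orfs(orfs, hits)
singletons = [g for g in groups if len(g) == 1]
```

Single-linkage is the simplest choice here and tends to merge groups aggressively; a stricter criterion (for example, requiring reciprocal hits or a minimum alignment coverage) would yield more, smaller groups.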

