AGAPE (Automated Genome Analysis PipelinE) for pan-genome analysis of Saccharomyces cerevisiae.

Song G, Dickins BJ, Demeter J, Engel S, Gallagher J, Choe K, Dunn B, Snyder M, Cherry JM - PLoS ONE (2015)

Bottom Line: To assign strain-specific functional annotations, we identified genes that were not present in the reference genome. The functional roles of novel genes not found in the reference genome and associated with strains or groups of strains appear to be consistent with anticipated adaptations in specific lineages. Our new tool set will enhance our understanding of genomic and functional evolution in S. cerevisiae, and will be available to the yeast genetics and molecular biology community.


Affiliation: Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America.

ABSTRACT
The characterization and public release of genome sequences from thousands of organisms is expanding the scope for genetic variation studies. However, understanding the phenotypic consequences of genetic variation remains a challenge in eukaryotes because of the complexity of the genotype-phenotype map. One approach is the intensive study of model systems, for which diverse sources of information can be accumulated and integrated. Saccharomyces cerevisiae is an extensively studied model organism, with well-characterized protein functions and thoroughly curated phenotype data. To develop and expand the available resources linking genomic variation with function in yeast, we aim to model the pan-genome of S. cerevisiae. To initiate the yeast pan-genome, we used advanced sequencing technology to sequence or re-sequence, at high quality, the genomes of 25 strains commonly used in the yeast research community. We also developed a pipeline for automated pan-genome analysis that integrates assembly, annotation, and variant calling. To assign strain-specific functional annotations, we identified genes that were not present in the reference genome. We classified these genes according to their presence or absence across strains and characterized each group with known functional and phenotypic features. The functional roles of novel genes that are absent from the reference genome and associated with particular strains or groups of strains appear consistent with anticipated adaptations in specific lineages. As more S. cerevisiae strain genomes are released, our analysis can be used to collate genome data and relate it to lineage-specific patterns of genome evolution. Our new tool set will enhance our understanding of genomic and functional evolution in S. cerevisiae and will be available to the yeast genetics and molecular biology community.
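The classification step described in the abstract, grouping non-reference genes by which strains carry them, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the AGAPE implementation: the input format (an ORF identifier mapped to the set of strains in which it was found), the three coarse category labels, and all ORF and strain names are hypothetical, and the paper's own grouping criteria may differ.

```python
from collections import defaultdict


def group_by_presence_pattern(presence):
    """Group ORFs that share exactly the same set of carrier strains.

    `presence` maps an ORF identifier to the set of strains in which it was found
    (hypothetical input format). Returns {frozenset(strains): sorted ORF ids}.
    """
    groups = defaultdict(list)
    for orf, strains in presence.items():
        groups[frozenset(strains)].append(orf)
    return {pattern: sorted(orfs) for pattern, orfs in groups.items()}


def label_pattern(pattern, all_strains):
    """Coarse, illustrative category for one presence pattern (labels are assumptions)."""
    if pattern == frozenset(all_strains):
        return "present in all strains"
    if len(pattern) == 1:
        return "strain-specific"
    return "shared by a subset of strains"


if __name__ == "__main__":
    # Hypothetical ORF identifiers and strain names, for illustration only.
    strains = {"S1", "S2", "S3"}
    presence = {
        "orfA": {"S1"},
        "orfB": {"S1", "S2"},
        "orfC": {"S2", "S1"},
        "orfD": {"S1", "S2", "S3"},
    }
    groups = group_by_presence_pattern(presence)
    # Print groups from the narrowest to the broadest distribution.
    for pattern, orfs in sorted(groups.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(pattern), label_pattern(pattern, strains), orfs)
```

Keying groups on a `frozenset` of strains means two ORFs land in the same group only when their presence/absence patterns match exactly; each group can then be annotated with functional and phenotypic features as a unit.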


pone.0120671.g005: Phylogenetic inferences and population structure of S. cerevisiae strains from variation. (A) A neighbor-joining tree based on non-reference ORFs among 18 S. cerevisiae strains. (B) A neighbor-joining tree based on SNPs relative to the reference among 25 S. cerevisiae strains. The origin of each strain is indicated by the color of the enclosing circle. Strains that originated from similar sources appear close to each other in both trees, but there are some differences (e.g., SK1, K11, and YJM339). (C) Population structure based on SNPs, inferred with the Genome diversity tool in Galaxy. The Galaxy tool also computed statistical scores to choose the most appropriate number of clusters (K). In our case, K = 2 and K = 3 gave the lowest cross-validation errors among the K values tested (0.90 and 0.95, respectively). Colors were generated automatically and do not correspond to the colors used in A and B.

Mentions: A binary matrix recording the presence or absence of each non-reference ORF group in the 18 “non-S288C” strains that contained non-reference ORFs was used to calculate pairwise distances, from which a tree of these 18 strains was constructed by the neighbor-joining method. This tree displays the relationships among the 18 strains based only on non-reference features (Fig. 5A). We also generated a tree based on the genome-wide SNPs found in each strain relative to the reference. Because it is restricted to reference-homologous regions, this tree reflects genomic distance in terms of each strain's divergence from the reference (Fig. 5B).
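The tree-building step described above (binary presence/absence matrix, pairwise distances, neighbor joining) can be sketched in plain Python. This is an illustrative sketch, not the authors' code: the passage does not name the distance metric, so the normalized Hamming distance used here is an assumption, and the function names and the toy strains S1 to S4 with their presence/absence flags are hypothetical. The joining step follows the standard Saitou-Nei neighbor-joining criterion.

```python
from itertools import combinations


def presence_absence_distances(matrix):
    """Pairwise distances between strains from a binary presence/absence matrix.

    `matrix` maps a strain name to a list of 0/1 flags, one per non-reference
    ORF group. The distance is the fraction of ORF groups whose state differs
    (normalized Hamming distance); this metric is an assumption.
    """
    names = list(matrix)
    dist = {a: {} for a in names}
    for a in names:
        for b in names:
            flags = list(zip(matrix[a], matrix[b]))
            dist[a][b] = sum(x != y for x, y in flags) / len(flags)
    return dist


def neighbor_joining(names, dist):
    """Saitou-Nei neighbor joining; returns an unrooted tree in Newick format."""
    if len(names) < 3:
        raise ValueError("neighbor joining needs at least three taxa")
    nodes = list(names)
    # Working copy of the distances, extended with new internal nodes as they form.
    dm = {a: dict(dist[a]) for a in nodes}
    while len(nodes) > 3:
        n = len(nodes)
        # Net divergence of each active node from all other active nodes.
        r = {i: sum(dm[i][k] for k in nodes if k != i) for i in nodes}
        # Join the pair minimizing Q(i, j) = (n - 2) d(i, j) - r(i) - r(j).
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * dm[p[0]][p[1]] - r[p[0]] - r[p[1]])
        # Branch lengths from i and j to their new parent node u.
        li = 0.5 * dm[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = dm[i][j] - li
        u = f"({i}:{li:.3f},{j}:{lj:.3f})"
        dm[u] = {}
        rest = [k for k in nodes if k not in (i, j)]
        for k in rest:
            dm[u][k] = dm[k][u] = 0.5 * (dm[i][k] + dm[j][k] - dm[i][j])
        nodes = rest + [u]
    # Resolve the last three nodes as one trifurcation (unrooted tree).
    a, b, c = nodes
    la = (dm[a][b] + dm[a][c] - dm[b][c]) / 2
    lb = (dm[a][b] + dm[b][c] - dm[a][c]) / 2
    lc = (dm[a][c] + dm[b][c] - dm[a][b]) / 2
    return f"({a}:{la:.3f},{b}:{lb:.3f},{c}:{lc:.3f});"


if __name__ == "__main__":
    # Hypothetical toy data: 4 strains x 6 non-reference ORF groups.
    toy = {
        "S1": [1, 1, 0, 0, 1, 0],
        "S2": [1, 1, 0, 0, 0, 0],
        "S3": [0, 0, 1, 1, 0, 1],
        "S4": [0, 0, 1, 0, 0, 1],
    }
    print(neighbor_joining(list(toy), presence_absence_distances(toy)))
```

Neighbor joining yields an unrooted tree, so the sketch joins the final three nodes at a single trifurcation; rooting a figure is a separate choice. With non-additive distances, such as those from sparse presence/absence data, branch lengths can come out negative, and the sketch leaves them as computed.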