A computational pipeline to discover highly phylogenetically informative genes in sequenced genomes: application to Saccharomyces cerevisiae natural strains.

Ramazzotti M, Berná L, Stefanini I, Cavalieri D - Nucleic Acids Res. (2012)

Bottom Line: The disadvantageous cost-benefit ratio of next-generation sequencing (NGS), that is, the amount of detail it discloses against the time-consuming and expertise-demanding data assembly process, still precludes the application of these techniques to the routine assignment of yeast strains, making the selection of the most reliable molecular markers greatly desirable. We found 13 genes whose variability can be used to recapitulate the phylogeny obtained from genome-wide sequences. The same approach that we prove to be successful in yeasts can be generalized to any other population of individuals given the availability of high-quality genomic sequences and of a clear population structure to be targeted.


Affiliation: Department of Preclinical and Clinical Pharmacology, University of Florence, Viale G. Pieraccini 6, 50139 Firenze, Italy.

ABSTRACT
The quest for genes representing genetic relationships of strains or individuals within populations and their evolutionary history is acquiring a novel dimension of complexity with the advancement of next-generation sequencing (NGS) technologies. In fact, sequencing an entire genome uncovers genetic variation in coding and non-coding regions and offers the possibility of studying Saccharomyces cerevisiae populations at the strain level. Nevertheless, the disadvantageous cost-benefit ratio (the amount of detail disclosed by NGS against the time-consuming and expertise-demanding data assembly process) still precludes the application of these techniques to the routine assignment of yeast strains, making the selection of the most reliable molecular markers greatly desirable. In this work we propose an original computational approach to discover genes that can be used as descriptors of the population structure. We found 13 genes whose variability can be used to recapitulate the phylogeny obtained from genome-wide sequences. The same approach that we prove to be successful in yeasts can be generalized to any other population of individuals given the availability of high-quality genomic sequences and of a clear population structure to be targeted.


Figure 3. Phylogenetic relationships of the eight validation strains with respect to the 39 learning strains. (A) Full SNPs/indels phylogenomic tree of the 47 strains. (B) Recapitulated tree obtained using the combination of SNPs/indels of the three genes YBR163W, YJL051W and YPR152C. Validation strains are marked in bold. Color scheme is the same as in Figure 1.
© Copyright Policy - creative-commons

Mentions: We first evaluated the genomic tree obtained using all the SNPs/indels of the 39 + 8 = 47 strains in order to map the phylogenetic relationships of the new genomes (Figure 3A). We found that seven of the eight proposed genes (marked with an asterisk in Table 2 and detailed in Validation Table V1 in Supplementary Data) satisfied this criterion. The complete phylogenomic tree including the 47 strains is presented in Figure 3A, and the trees obtained with the eight candidate genes are shown in Validation Figures V1–V8 in Supplementary Data. In this new phylogeny, four strains fall within specific Liti clusters: the W303 strain falls within the cluster of lab strains, the YJM789 strain falls at the root of the wine strains, and the RM11-1A strain and the EC1118 strain (a commercial wine starter) are positioned in the wine/European cluster. The UC5 strain (used to produce sake) falls near the three strains described by Liti as the sake group. In this case the branch of the tree indicates a significant difference, in agreement with the heterogeneity of the so-called sake strains, as also indicated by previous results. Finally, the laboratory strain Sigma1278b clusters close to the root of the group of laboratory strains S288C and W303, consistent with the fact that this strain derives from one of the crosses that led to the development of S288C. It is noteworthy that the genomes representing re-sequencings of strains already present in the original tree do not always cluster close to their corresponding genomes, suggesting that some uncertainties exist in the sequences obtained from SGD.
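As a rough illustration of what "recapitulating" a genome-wide phylogeny with a few genes can mean, the sketch below compares the pairwise strain distances computed from all variant sites with those computed from a small subset of sites. This is not the authors' pipeline: the excerpt does not state their scoring criterion, so the Pearson correlation of pairwise Hamming distances is used here as a stand-in assumption, and the strain names, variant strings, and function names (`distance_vector`, `recapitulation_score`) are hypothetical toy values chosen only for demonstration.

```python
from __future__ import annotations

from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Count the positions at which two aligned variant strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def distance_vector(alignment: dict[str, str]) -> list[int]:
    """Pairwise Hamming distances, with strain pairs in sorted-name order."""
    strains = sorted(alignment)
    return [hamming(alignment[s], alignment[t])
            for s, t in combinations(strains, 2)]


def pearson(xs: list[int], ys: list[int]) -> float:
    """Plain Pearson correlation coefficient of two equal-length vectors."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("distances are constant; correlation is undefined")
    return cov / (vx * vy) ** 0.5


def recapitulation_score(genome_wide: dict[str, str],
                         gene_subset: dict[str, str]) -> float:
    """Assumed score: how well subset distances track genome-wide distances."""
    if set(genome_wide) != set(gene_subset):
        raise ValueError("both alignments must cover the same strains")
    return pearson(distance_vector(genome_wide), distance_vector(gene_subset))


# Toy data (hypothetical): one character per variant site, one row per strain.
genome_wide = {
    "strain_A": "AAAAAAAA",
    "strain_B": "AAAAAATT",
    "strain_C": "TTTTAAAA",
    "strain_D": "TTTTTAAA",
}
# Variant sites from a hypothetical candidate gene set (columns 0, 4 and 6).
gene_subset = {name: seq[0] + seq[4] + seq[6]
               for name, seq in genome_wide.items()}

score = recapitulation_score(genome_wide, gene_subset)
print(f"distance correlation: {score:.3f}")
```

A score near 1 would suggest that the subset preserves the relative distances between strains. A tree-based comparison (for example, a topological distance between the gene tree and the genome-wide tree) would be closer in spirit to the figure, but it requires a tree-building step that is omitted here for brevity.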

