A computational pipeline to discover highly phylogenetically informative genes in sequenced genomes: application to Saccharomyces cerevisiae natural strains.

Ramazzotti M, Berná L, Stefanini I, Cavalieri D - Nucleic Acids Res. (2012)

Bottom Line: Nevertheless, the disadvantageous cost-benefit ratio (the amount of detail disclosed by NGS against the time-consuming and expertise-demanding data assembly process) still precludes the application of these techniques to the routine assignment of yeast strains, making the selection of the most reliable molecular markers greatly desirable. We found 13 genes whose variability can be used to recapitulate the phylogeny obtained from genome-wide sequences. The same approach that we prove to be successful in yeasts can be generalized to any other population of individuals, given the availability of high-quality genomic sequences and of a clear population structure to be targeted.


Affiliation: Department of Preclinical and Clinical Pharmacology, University of Florence, Viale G. Pieraccini 6, 50139 Firenze, Italy.

ABSTRACT
The quest for genes representing genetic relationships of strains or individuals within populations and their evolutionary history is acquiring a novel dimension of complexity with the advancement of next-generation sequencing (NGS) technologies. In fact, sequencing an entire genome uncovers genetic variation in coding and non-coding regions and offers the possibility of studying Saccharomyces cerevisiae populations at the strain level. Nevertheless, the disadvantageous cost-benefit ratio (the amount of detail disclosed by NGS against the time-consuming and expertise-demanding data assembly process) still precludes the application of these techniques to the routine assignment of yeast strains, making the selection of the most reliable molecular markers greatly desirable. In this work we propose an original computational approach to discover genes that can be used as descriptors of the population structure. We found 13 genes whose variability can be used to recapitulate the phylogeny obtained from genome-wide sequences. The same approach that we prove to be successful in yeasts can be generalized to any other population of individuals, given the availability of high-quality genomic sequences and of a clear population structure to be targeted.


Figure 2 (gks005-F2): Reproduction of the genome-wide phylogenetic tree with our analysis pipeline and using only SNPs/indels in coding sequences. Colors and legends reflect the criteria used in Liti et al. (12) to allow a direct comparison. (A) Tree reproduced using all genes shared by all strains. (B) Tree obtained with the gene YJL099W. (C) Tree obtained with the gene YPR152C. (D) Tree obtained with the gene YJL057C. (E) Tree obtained with the gene YNL161W (branches have been scaled in cladogram mode to appreciate strain resolution).

Mentions: We used all the genes to reproduce the phylogenies proposed by Liti and coworkers and to verify whether our procedure, which uses only coding sequences, could generate the same results. To this aim, the 5850 genes were used to generate a neighbor-joining tree based on pairwise SNP/indel distances. In this way, a total of 226 961 SNPs/indels were identified and used. As expected, the phylogenetic tree obtained was superimposable on that of Liti (Figure 2A), confirming that the procedure we developed was consistent and reliable.
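
The idea of building a neighbor-joining tree from pairwise SNP/indel distances can be illustrated with a minimal sketch. It is not the authors' pipeline: the strain names and aligned sequences are invented toy data, the distance is assumed to be a plain count of differing alignment columns (each gap column counted once), and the function names `pairwise_distance`, `distance_matrix` and `neighbor_joining` are hypothetical. The tree is written as a Newick string; because neighbor-joining trees are unrooted, the last edge is split in half purely so the string can be written.

```python
from itertools import combinations


def pairwise_distance(a, b):
    """Count aligned columns where two sequences differ (SNP or gap column)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def distance_matrix(seqs):
    """Build a symmetric distance matrix (dict of dicts) from {name: sequence}."""
    names = sorted(seqs)
    dist = {n: {m: 0.0 for m in names} for n in names}
    for a, b in combinations(names, 2):
        dist[a][b] = dist[b][a] = float(pairwise_distance(seqs[a], seqs[b]))
    return dist


def neighbor_joining(dist):
    """Standard neighbor-joining; returns a Newick string.

    Internal nodes are labelled by the Newick text of their subtree.
    Requires at least two taxa.
    """
    d = {a: dict(row) for a, row in dist.items()}  # work on a copy
    nodes = list(d)
    while len(nodes) > 2:
        n = len(nodes)
        # r[a] is the sum of distances from a to every other active node.
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}
        # Join the pair minimising Q(i, j) = (n - 2) * d(i, j) - r(i) - r(j).
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]],
        )
        # Branch lengths from i and j to the new internal node.
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        new = f"({i}:{li:g},{j}:{lj:g})"
        rest = [k for k in nodes if k not in (i, j)]
        d[new] = {new: 0.0}
        for k in rest:
            d[new][k] = d[k][new] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        nodes = rest + [new]
    a, b = nodes
    half = d[a][b] / 2  # arbitrary split of the final edge for Newick output
    return f"({a}:{half:g},{b}:{half:g});"


# Toy alignment of four hypothetical strains (not real yeast data).
toy_alignment = {
    "s1": "CAAAAAAAAAAAAAA",
    "s2": "AT-AAAAAAAAAAAA",
    "s3": "AAAGG-AAAAGGGGG",
    "s4": "AAAAAATTTTGGGGG",
}

print(neighbor_joining(distance_matrix(toy_alignment)))
# prints ((s1:1,s2:2):2.5,(s3:3,s4:4):2.5);
```

In this toy case the distances are exactly tree-like, so the sketch recovers the two pairs (s1, s2) and (s3, s4) with their branch lengths. On real data a multi-column indel would be counted once per column under this assumption, which may differ from how the original analysis scored indels.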

