Molecular taxonomy of phytopathogenic fungi: a case study in Peronospora.

Göker M, García-Blázquez G, Voglmayr H, Tellería MT, Martín MP - PLoS ONE (2009)

Bottom Line: The method determines the distance function and clustering setting that result in an optimal agreement with selected reference data. Clustering optimization appears to be broadly applicable in automated, sequence-based taxonomy. The method connects traditional and modern taxonomic disciplines by specifically addressing the issue of how to optimally account for both traditional species concepts and genetic divergence.


Affiliation: Organismic Botany, Eberhard Karls University of Tübingen, Tübingen, Germany. peronospora@goeker.org

ABSTRACT

Background: Inappropriate taxon definitions may have severe consequences in many areas. For instance, biologically sensible species delimitation of plant pathogens is crucial for measures such as plant protection or biological control and for comparative studies involving model organisms. However, delimiting species is challenging in the case of organisms for which often only molecular data are available, such as prokaryotes, fungi, and many unicellular eukaryotes. Even in the case of organisms with well-established morphological characteristics, molecular taxonomy is often necessary to emend current taxonomic concepts and to analyze DNA sequences directly sampled from the environment. Typically, clustering approaches that delineate molecular operational taxonomic units have been applied for this purpose, with arbitrary choices of distance threshold values and clustering algorithms.

Methodology: Here, we report on a clustering optimization method to establish a molecular taxonomy of Peronospora based on ITS nrDNA sequences. Peronospora is the largest genus within the downy mildews, which are obligate parasites of higher plants, and includes various economically important pathogens. The method determines the distance function and clustering setting that result in an optimal agreement with selected reference data. Optimization was based on both taxonomy-based and host-based reference information, yielding the same outcome. Resampling and permutation methods indicate that the method is robust regarding taxon sampling and errors in the reference data. Tests with newly obtained ITS sequences demonstrate the use of the re-classified dataset in molecular identification of downy mildews.

Conclusions: A corrected taxonomy is provided for all Peronospora ITS sequences contained in public databases. Clustering optimization appears to be broadly applicable in automated, sequence-based taxonomy. The method connects traditional and modern taxonomic disciplines by specifically addressing the issue of how to optimally account for both traditional species concepts and genetic divergence.

Fig. 1. Optimization plots. Modified Rand Index (MRI) plot based on the poa alignment, uncorrected distances, the globally optimal F value (1.0) and two suboptimal F values (0.0 and 0.5). Axes: x-axis, T values examined (values larger than 0.25 gave the same result because all sequences were assigned to a single cluster); y-axis, resulting MRI values for taxonomy-based optimization (thick lines) and host-based optimization (thin lines). Colours: black, F = 1.0; dark grey, F = 0.5; light grey, F = 0.0.

Mentions: The complete sequence dataset downloaded from GenBank included 439 ITS nrDNA sequences, 427 of which were sufficiently long (see above). Within the latter, 354 accessions contained a correctly formatted “organism” entry, and 388 contained a correctly formatted host name. The reference partition constructed from Peronospora/Pseudoperonospora species names comprised 86 distinct entries; the one constructed from the hosts comprised 141 distinct entries (from 72 distinct genera). The poa alignment had a length of 2118 bp, which was partly caused by some sequences comprising parts of the small subunit rDNA and by the long ITS1 insertions in the Trifolium parasites [40], [41]. Taxonomy-based optimization of the P distances inferred from the poa alignment resulted in an optimal modified Rand Index (MRI) value of 0.85485, corresponding to F = 1.0 and T = 0.0075. In host-based optimization, the best MRI value was 0.85204, which was obtained when exactly the same F and T values were used for clustering. The optimization plots for the poa alignment, F = 1.0 and both reference partitions are shown in Fig. 1. Plots for two suboptimal F values, 0.0 and 0.5, are also shown. When applied to the full alignment of 427 sequences, the optimal clustering parameters resulted in 117 clusters (taxonomic units or TU) and 199 distinct combinations of TU and GenBank “organism” entry. The effect of T and F on the resulting number of clusters (TU) is shown in Fig. 2. Twenty distinct “organism” entries appeared in more than one TU, whereas 23 TU were associated with more than one “organism” (supporting file S2). The best MRI values obtained for the poa alignment and all distance formulae, dependent on the tested F values, are shown in supporting file S3. While an additional local maximum is present in the case of taxonomy-based optimization for F = 0.25 and F = 0.30, F = 1.0 gives far higher MRI values than any other F value for both partitions. 
Using other alignment programs and/or distance formulae did not result in considerably higher MRI values; rather, improvements were restricted to the third position after the decimal point. All alignments, selected distance matrices and the original optimization results for all of them are included in supporting file S1.
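
The optimization described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. This excerpt does not define F and T, so the sketch assumes that T is a distance threshold and that F is a linkage fraction ranging from single linkage (0.0) to complete linkage (1.0). It also assumes that the MRI corresponds to the Hubert-Arabie adjusted Rand index. The function names, the greedy merge rule and the toy distance matrix are hypothetical and serve only to show the grid search over F and T against a reference partition.

```python
from collections import Counter
from itertools import combinations
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Hubert-Arabie adjusted Rand index (assumed here to match the MRI)."""
    total = comb(len(labels_a), 2)
    if total == 0:
        return 1.0
    sum_ab = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / total
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # both partitions are trivial and identical
    return (sum_ab - expected) / (max_index - expected)


def cluster(dist, t, f):
    """Hypothetical greedy linkage clustering.

    Two clusters may merge when at least one cross-cluster distance is <= t
    and the fraction of such distances is >= f. Among eligible pairs, the
    one with the smallest mean distance is merged first.
    """
    clusters = [[i] for i in range(len(dist))]
    while True:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            cross = [dist[i][j] for i in clusters[a] for j in clusters[b]]
            close = sum(d <= t for d in cross)
            if close and close / len(cross) >= f:
                mean = sum(cross) / len(cross)
                if best is None or mean < best[0]:
                    best = (mean, a, b)
        if best is None:
            break
        _, a, b = best
        clusters[a].extend(clusters.pop(b))  # a < b, so index a stays valid
    labels = [0] * len(dist)
    for k, members in enumerate(clusters):
        for i in members:
            labels[i] = k
    return labels


def optimize(dist, reference, t_values, f_values):
    """Grid search: return (score, F, T) with the best agreement."""
    best = None
    for f in f_values:
        for t in t_values:
            score = adjusted_rand_index(cluster(dist, t, f), reference)
            if best is None or score > best[0]:
                best = (score, f, t)
    return best


# Toy data (invented for illustration): three sequences of one species,
# two of another; within-species distances are small, between-species 0.05.
toy_dist = [
    [0.0, 0.004, 0.006, 0.05, 0.05],
    [0.004, 0.0, 0.005, 0.05, 0.05],
    [0.006, 0.005, 0.0, 0.05, 0.05],
    [0.05, 0.05, 0.05, 0.0, 0.003],
    [0.05, 0.05, 0.05, 0.003, 0.0],
]
toy_reference = ["A", "A", "A", "B", "B"]
best_score, best_f, best_t = optimize(
    toy_dist, toy_reference, [0.0025, 0.0075, 0.1], [0.0, 0.5, 1.0]
)
print(best_score, best_f, best_t)
```

In this toy case the lowest threshold leaves every sequence in its own cluster and the highest merges everything into one, so only the intermediate threshold reproduces the reference partition. This mirrors the shape of the optimization curves described for Fig. 1, where T values above a certain point all give the same single-cluster result.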

