A systematic computational analysis of biosynthetic gene cluster evolution: lessons for engineering biosynthesis.

Medema MH, Cimermancic P, Sali A, Takano E, Fischbach MA - PLoS Comput. Biol. (2014)

Affiliation: Department of Microbial Physiology, Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Groningen, The Netherlands; Groningen Bioinformatics Centre, Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Groningen, The Netherlands.

ABSTRACT
Bacterial secondary metabolites are widely used as antibiotics, anticancer drugs, insecticides and food additives. Attempts to engineer their biosynthetic gene clusters (BGCs) to produce unnatural metabolites with improved properties are often frustrated by the unpredictability and complexity of the enzymes that synthesize these molecules, suggesting that genetic changes within BGCs are limited by specific constraints. Here, by performing a systematic computational analysis of BGC evolution, we derive evidence for three findings that shed light on the ways in which, despite these constraints, nature successfully invents new molecules: 1) BGCs for complex molecules often evolve through the successive merger of smaller sub-clusters, which function as independent evolutionary entities. 2) An important subset of polyketide synthases and nonribosomal peptide synthetases evolve by concerted evolution, which generates sets of sequence-homogenized domains that may hold promise for engineering efforts since they exhibit a high degree of functional interoperability. 3) Individual BGC families evolve in distinct ways, suggesting that design strategies should take into account family-specific functional constraints. These findings suggest novel strategies for using synthetic biology to rationally engineer biosynthetic pathways.

Figure 5. Diverse and distinct modes of evolution for PKS and NRPS BGCs. a, Scatter plot showing the first two principal components resulting from a PCA analysis of different evolutionary characteristics of BGCs encoding different classes of NRPs and PKs. The first two principal components describe 63% of the variance. BGCs encoding members of the same family (e.g., lipopeptides, glycopeptides or macrolides) tend to cluster together, suggesting that their family members evolve in similar ways, while different families cluster apart from each other, suggesting distinct modes of evolution. Colors indicate distinct classes of BGCs. b, Scatter plot showing two features of BGCs – internal similarity index and vertical evolution index – that, of the 25 measured features, underlie most of the variation. The internal similarity index indicates how similar domains in a BGC are to other domains within the same BGC. The vertical evolution index indicates how closely related a BGC is to the BGCs harboring the closest relatives of its constituent domains (see Methods for more details). Colors indicate distinct classes of BGCs, as in panel a. c–f, Domain architecture plots of PKSs and NRPSs show distinct modes of evolution: c, Internal duplication with concerted evolution; d, N-terminal additions by module duplication and recombination; e, domain swapping with other BGCs; and f, mixed evolution. Geometric shapes indicate domain types (see legend); domain colors indicate the internal homology p-value of each domain to its closest relative within the same gene cluster, within the total distribution of all similarities between domains of the same type in the entire data set: hence, domains colored red are most similar, while domains colored blue are most dissimilar.
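
One way to read the internal homology p-value in the caption is as an empirical tail fraction: take each domain's best similarity to another domain of the same type in its own cluster, and ask what share of all same-type similarities in the data set are at least that high. The sketch below follows that reading only. The function name, the dictionary layout, the choice to assign 1.0 to a domain with no same-type partner, and the toy similarity values are all assumptions made for illustration; the paper's Methods define the actual procedure.

```python
from bisect import bisect_left


def internal_homology_p_values(cluster_similarities, background):
    """Empirical p-value per domain (illustrative reading, not the paper's code).

    cluster_similarities: {domain: {other_domain_in_same_cluster: similarity}}
    background: all pairwise similarities between domains of the same type
                across the whole data set (hypothetical input).
    """
    ranked = sorted(background)
    p_values = {}
    for domain, to_others in cluster_similarities.items():
        if not to_others:
            # Assumption: a domain with no same-type partner gets p = 1.0.
            p_values[domain] = 1.0
            continue
        best = max(to_others.values())  # closest relative in the cluster
        n_at_least = len(ranked) - bisect_left(ranked, best)
        p_values[domain] = n_at_least / len(ranked)
    return p_values


# Toy data: three KS domains in one cluster, ten background similarities.
background = [0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60, 0.70, 0.80, 0.95]
cluster = {
    "KS1": {"KS2": 0.95, "KS3": 0.50},
    "KS2": {"KS1": 0.95, "KS3": 0.45},
    "KS3": {"KS1": 0.50, "KS2": 0.45},
}
for name, p in internal_homology_p_values(cluster, background).items():
    print(name, p)
```

Under this reading, a low p-value corresponds to the domains the caption describes as most similar (red), and a high p-value to the most dissimilar ones (blue).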

Mentions: To understand more generally how PKS and NRPS BGCs evolve, we set out to measure the contributions of concerted evolution, duplication, and divergence to the evolution of all multimodular PKS and NRPS BGCs in both our known and predicted BGC data sets. We first collected and quantified 25 different features describing the nature of gene cluster sequences and the relationships among their constituent domains (see Methods for details). A principal component analysis (PCA) and hierarchical clustering using these features can distinguish many of the well-known gene cluster families from our data set of known BGCs (Figure S9, Fig. 5a). Two features in particular, the ‘internal similarity index’ and the ‘vertical evolution index’, explain much of the variation in terms of the modes of evolution of different classes of gene clusters (Fig. 5b). At the level of individual domains, we find that there are four primary mechanisms by which NRPS and PKS BGCs evolve (Fig. 5c–f, Figure S10). Firstly, gene clusters encoding glycopeptides, calcium-dependent lipopeptides and macrolides/polyethers appear to be most repetitive, pointing to a history of module duplications and/or a prominent influence of concerted evolution. The syringopeptin NRPS [48] and mycolactone PKS [49] are extreme examples of this: both are likely to have evolved recently by successive module duplications and concerted evolution. Secondly, we sometimes observed gradients of the internal homology p-values from the N- to C-termini of large synthases, suggesting that some gene clusters evolve to encode the synthesis of larger molecules by iterative duplication of their most N-terminal module, which would have the effect of extending an intermediate NRP or PK by the addition of a new starter unit.
Thirdly, a group of BGCs including the ones that encode the polyketides psymberin [42] and erythrochelin [50] show a ‘vertical’ type of evolution, in which the domains appear to evolve independently, with perhaps occasional domain swapping with related gene clusters, as has been suggested previously [40]. Finally, there are many gene clusters showing a ‘mixed’ mode of evolution, in which two or more of the above mechanisms are combined. For example, NRP siderophore gene clusters show some signs of internal recombination, but at the same time many domains show no high mutual similarity. Like the trans-AT PKS gene clusters, they seem to have a higher tendency to recruit domains from dissimilar gene clusters. This recruitment over larger evolutionary distances appears to be a general feature of NRPS gene clusters as opposed to PKS gene clusters, and might be related to the wider range of possible substrates for NRPSs, which often require BGC-specific sub-pathways for the synthesis of a dedicated monomer [51].
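
The N- to C-terminal gradient of internal homology p-values mentioned above could be screened for with a rank correlation between a module's position in the synthase and its p-value. The sketch below uses a hand-written Spearman correlation for that purpose. This is an assumed, illustrative test rather than the method used in the paper, and the function names and example p-values are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input has no variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def terminal_gradient(p_values_n_to_c):
    """Spearman correlation between module order (N to C) and p-value."""
    positions = list(range(len(p_values_n_to_c)))
    return pearson(average_ranks(positions), average_ranks(p_values_n_to_c))


# Hypothetical synthase whose p-values rise steadily from N- to C-terminus.
print(terminal_gradient([0.01, 0.05, 0.20, 0.60, 0.90]))
```

A coefficient near +1 or -1 indicates a monotonic gradient along the synthase, and the sign gives its direction; a value near 0 indicates no consistent trend. With only a handful of modules per synthase the coefficient is noisy, so it is better suited to flagging candidates for inspection than to drawing conclusions on its own.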

