A systematic computational analysis of biosynthetic gene cluster evolution: lessons for engineering biosynthesis.

Medema MH, Cimermancic P, Sali A, Takano E, Fischbach MA - PLoS Comput. Biol. (2014)

Bottom Line: Attempts to engineer the biosynthetic gene clusters (BGCs) of bacterial secondary metabolites to produce unnatural metabolites with improved properties are often frustrated by the unpredictability and complexity of the enzymes that synthesize these molecules, suggesting that genetic changes within BGCs are limited by specific constraints. Here, by performing a systematic computational analysis of BGC evolution, we derive evidence for three findings that shed light on the ways in which, despite these constraints, nature successfully invents new molecules: 1) BGCs for complex molecules often evolve through the successive merger of smaller sub-clusters, which function as independent evolutionary entities; 2) an important subset of polyketide synthases and nonribosomal peptide synthetases evolve by concerted evolution, which generates sets of sequence-homogenized domains that may hold promise for engineering efforts since they exhibit a high degree of functional interoperability; 3) individual BGC families evolve in distinct ways, suggesting that design strategies should take into account family-specific functional constraints. These findings suggest novel strategies for using synthetic biology to rationally engineer biosynthetic pathways.


Affiliation: Department of Microbial Physiology, Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Groningen, The Netherlands; Groningen Bioinformatics Centre, Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Groningen, The Netherlands.

ABSTRACT
Bacterial secondary metabolites are widely used as antibiotics, anticancer drugs, insecticides and food additives. Attempts to engineer their biosynthetic gene clusters (BGCs) to produce unnatural metabolites with improved properties are often frustrated by the unpredictability and complexity of the enzymes that synthesize these molecules, suggesting that genetic changes within BGCs are limited by specific constraints. Here, by performing a systematic computational analysis of BGC evolution, we derive evidence for three findings that shed light on the ways in which, despite these constraints, nature successfully invents new molecules: 1) BGCs for complex molecules often evolve through the successive merger of smaller sub-clusters, which function as independent evolutionary entities; 2) an important subset of polyketide synthases and nonribosomal peptide synthetases evolve by concerted evolution, which generates sets of sequence-homogenized domains that may hold promise for engineering efforts since they exhibit a high degree of functional interoperability; 3) individual BGC families evolve in distinct ways, suggesting that design strategies should take into account family-specific functional constraints. These findings suggest novel strategies for using synthetic biology to rationally engineer biosynthetic pathways.

Figure 3. Unexpected evolutionary relationships within the rapamycin family.
a, Distinct scaffolds produced by pathways from related BGCs. The scatter plot shows the relationship between the sequence homology of a pair of BGCs (x-axis) and the structural homology of their small molecule products (y-axis), compared to rapamycin and its BGC. Each circle represents a gene cluster and its small molecule product. Meridamycin and FK520 are closely related to rapamycin, as are their BGCs. While the pladienolide BGC is closely related to the rapamycin BGC, the structure of pladienolide itself is not very similar to that of rapamycin. In particular, pladienolide has a much smaller macrocycle and lacks shikimate- or pipecolate-derived moieties, and, as a result, binds to a distinct protein target. Structural similarity is estimated by the Tanimoto coefficient using linear-path fingerprints (FP2) from Open Babel [67], while sequence homology is represented as the Jaccard index defined on pairs of Pfam domains that share sequence identities within the top 10th percentile of all-pair sequence identities. The number of domain pairs that share sequence identities within the top 10th percentile and sequence identity of all domain pairs are shown as point sizes and colors, respectively.
b, The role of concerted evolution in homogenizing domains within a BGC. Phylogenetic trees of KS and AT domains from the rapamycin, FK520, meridamycin, and pladienolide BGCs are shown (for detailed trees with accession numbers and bootstrap values, see Figure S11). The KS and AT sequences largely cluster into BGC-specific clades; for the AT domains, this is even the case for two different clusters encoding the same compound (meridamycin), showing the ability of concerted evolution to homogenize domains within a BGC.
c, Chemical structures of rapamycin, meridamycin, FK520 and pladienolide. The sub-structure shared among rapamycin, meridamycin and FK520 is colored red, and the domains responsible for the biosynthesis of this sub-structure in each molecule are indicated with red circles in b.
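
The Tanimoto coefficient named in this caption can be illustrated with a minimal Python sketch. It assumes each fingerprint has already been computed elsewhere and is supplied as the set of its "on" bit positions. The function name `tanimoto`, the toy bit sets, and the convention of returning 0.0 for two empty fingerprints are illustrative assumptions rather than details from the paper, and no Open Babel call is made here.

```python
from typing import AbstractSet


def tanimoto(bits_a: AbstractSet[int], bits_b: AbstractSet[int]) -> float:
    """Tanimoto coefficient of two binary fingerprints.

    Each fingerprint is given as the set of bit positions that are set.
    The coefficient is |A intersect B| / |A union B|.
    """
    union = bits_a | bits_b
    if not union:
        # Assumed convention: two empty fingerprints score 0.0.
        return 0.0
    return len(bits_a & bits_b) / len(union)


# Hypothetical toy fingerprints (not real FP2 output).
fp_x = {1, 2, 3, 4}
fp_y = {3, 4, 5, 6}
score = tanimoto(fp_x, fp_y)  # 2 shared bits out of 6 distinct bits
```

A score of 1.0 means the two fingerprints set exactly the same bits, and values near 0 mean they share almost none, which is how the y-axis of panel a can be read.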


Mentions: Many chemical scaffold types of secondary metabolite classes are quite distinct, which raises the question of how BGC families encoding the synthesis of distinct scaffolds are related. To address this question, we calculated the proportion and similarity of Pfam domains shared between all pairs of BGCs within our data set of 732 known gene clusters using multiple sequence alignments for each Pfam domain (Fig. 3) and looked specifically for close homologues of BGCs just outside their immediate family. Although sequence similarity alone does not provide conclusive evidence of evolutionary history, the analysis suggested that unexpected evolutionary connections might exist between natural products of different scaffold types.
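
One possible reading of the domain-based Jaccard index used for this comparison can be sketched in Python as follows. The source does not spell out the exact counting rule, so everything here is an assumption made for illustration: the function names `percentile_cutoff` and `bgc_jaccard`, the nearest-rank percentile, the rule that a domain counts as shared when it has at least one partner in the other BGC at or above the cutoff, and the toy identities. In the paper the cutoff would come from all-pair identities across the whole data set, not from a single pair of clusters.

```python
import math
from typing import Dict, List, Tuple


def percentile_cutoff(values: List[float], percent: int = 90) -> float:
    """Nearest-rank percentile of `values` (assumed method, not the paper's)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(percent * len(ordered) / 100))
    return ordered[rank - 1]


def bgc_jaccard(
    domains_a: List[str],
    domains_b: List[str],
    identity: Dict[Tuple[str, str], float],
    cutoff: float,
) -> float:
    """Jaccard-style score over domains of two BGCs.

    A domain is treated as shared when it has a partner in the other
    cluster with sequence identity >= cutoff (an assumed rule).
    """
    hits = [
        (a, b)
        for a in domains_a
        for b in domains_b
        if identity.get((a, b), 0.0) >= cutoff
    ]
    matched_a = {a for a, _ in hits}
    matched_b = {b for _, b in hits}
    shared = min(len(matched_a), len(matched_b))
    union = len(domains_a) + len(domains_b) - shared
    return shared / union if union else 0.0


# Hypothetical domain identifiers and pairwise identities.
cluster_a = ["a1", "a2", "a3"]
cluster_b = ["b1", "b2"]
pair_identity = {
    ("a1", "b1"): 0.9, ("a1", "b2"): 0.2,
    ("a2", "b1"): 0.3, ("a2", "b2"): 0.8,
    ("a3", "b1"): 0.1, ("a3", "b2"): 0.25,
}
similarity = bgc_jaccard(cluster_a, cluster_b, pair_identity, cutoff=0.8)
```

Keeping the cutoff as an explicit argument separates the data-set-wide threshold (the top 10th percentile of all-pair identities) from the per-pair score, so the same threshold can be reused for every pair of BGCs.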

