A systematic computational analysis of biosynthetic gene cluster evolution: lessons for engineering biosynthesis.

Medema MH, Cimermancic P, Sali A, Takano E, Fischbach MA - PLoS Comput. Biol. (2014)



Affiliation: Department of Microbial Physiology, Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Groningen, The Netherlands; Groningen Bioinformatics Centre, Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Groningen, The Netherlands.

ABSTRACT
Bacterial secondary metabolites are widely used as antibiotics, anticancer drugs, insecticides and food additives. Attempts to engineer their biosynthetic gene clusters (BGCs) to produce unnatural metabolites with improved properties are often frustrated by the unpredictability and complexity of the enzymes that synthesize these molecules, suggesting that genetic changes within BGCs are limited by specific constraints. Here, by performing a systematic computational analysis of BGC evolution, we derive evidence for three findings that shed light on the ways in which, despite these constraints, nature successfully invents new molecules: 1) BGCs for complex molecules often evolve through the successive merger of smaller sub-clusters, which function as independent evolutionary entities. 2) An important subset of polyketide synthases and nonribosomal peptide synthetases evolve by concerted evolution, which generates sets of sequence-homogenized domains that may hold promise for engineering efforts since they exhibit a high degree of functional interoperability. 3) Individual BGC families evolve in distinct ways, suggesting that design strategies should take into account family-specific functional constraints. These findings suggest novel strategies for using synthetic biology to rationally engineer biosynthetic pathways.

Figure 1 (pcbi-1004016-g001): The rapid and dynamic evolution of BGCs differs from the evolution of ribosomal gene clusters and primary metabolism. a, Distributions of the best matching sequence homologs with respect to organism similarity (based on 16S rRNA) for predicted BGCs and histidine operons suggest significant differences in the ways they evolve. b, Number of detected rearrangements, indels and duplications plotted against the average percent identity in the aligned gene cluster pairs from which the events were deduced for predicted BGCs (top) and ribosomal gene clusters (bottom). Ribosomal gene clusters were selected for comparison based on their relatively large sizes (∼10–15 kb) compared to primary metabolic operons; to obtain a fair comparison with BGCs, only gene clusters of sizes 5–15 kb were taken into account. Counts are based on a systematic comparison of all gene clusters in our data set that share regions of >1000 bp with >70% identity, in which events were inferred from alignments of such 1000 bp blocks. Of the 10,096 BGC pairs meeting these criteria, 1,750 had a rearrangement, 1,140 had an indel, and 135 had a duplication, each of which was far more common than the corresponding evolutionary event in gene clusters encoding the translation apparatus. Interestingly, while indels and rearrangements could be detected in ∼16% and ∼19% of BGCs of all sizes, duplications are found far more commonly in gene clusters with sizes of >40 kb (7.6%) than in gene clusters with sizes of 10–20 kb (0.3%), suggesting a possible role for duplication and divergence in the evolution of large gene clusters. c, Size distribution of inserted/deleted fragments during recent gene cluster evolution, based on the indel analysis.
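The pair counts quoted in the caption can be turned into per-pair frequencies with a few lines of arithmetic. The sketch below is only an illustration: the variable and function names are hypothetical, and the resulting percentages are fractions of the 10,096 compared BGC pairs, which is a different denominator from the per-BGC percentages (∼16% and ∼19%) given later in the caption.

```python
# Counts quoted in the figure caption (BGC pairs sharing >1000 bp at >70% identity).
total_pairs = 10_096
event_counts = {"rearrangement": 1_750, "indel": 1_140, "duplication": 135}


def pair_percentages(counts, total):
    """Return the percentage of compared pairs showing each event type."""
    return {name: 100.0 * count / total for name, count in counts.items()}


percentages = pair_percentages(event_counts, total_pairs)
for name, pct in percentages.items():
    print(f"{name}: {pct:.1f}% of compared BGC pairs")

# Ratio of duplication frequencies quoted for large (>40 kb) versus
# mid-sized (10-20 kb) gene clusters: 7.6% versus 0.3%.
duplication_fold = 7.6 / 0.3
print(f"duplications are about {duplication_fold:.0f}-fold more frequent in >40 kb clusters")
```

Reporting both the fractions and the fold difference makes it easier to see why the caption singles out duplication as a feature of large gene clusters.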

Mentions: The large diversity of BGCs observed throughout the prokaryotic tree of life [8] suggests that BGCs evolve rapidly. Indeed, when we systematically quantified different evolutionary events by mutually comparing all gene clusters in our data set (Table S1), we found not only that they may have been transferred horizontally at high frequency (Fig. 1a and Figure S1), but also that they display exceptionally high rates of insertions, deletions, duplications and rearrangements (Fig. 1b). While the percentage of gene cluster pairs related by an indel is independent of gene cluster size, the distribution of indel sizes shows a long tail that includes 195 indels of 10 kb or more (Fig. 1c). As expected, these large indels are more commonly found in larger gene clusters, where they indicate either the merger of one gene cluster fragment with another or the loss of a gene cluster fragment from a larger cluster (see examples in Figure S2). Phylogenetic profiling [16] showed that many such BGC fragments (here termed sub-clusters) appear to evolve in a correlated fashion: 884 different motifs of adjacent Pfam domains (out of 7,641 found) were shown to co-evolve significantly more often than not (P<0.001), based on the χ² test. These motifs comprise 591 different Pfam domains and have an average length of 5.3 domains (Table S2). As expected, they include many well-known and widely conserved motifs that appear to be linked to specific sub-functionalities of gene clusters, such as precursor biosynthesis, transport or synthesis of a specific chemical moiety, and motifs belonging to modular BGC architectures of NRPSs and PKSs (e.g., C-A-T and KS-AT-T [17]).
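To make the co-evolution test more concrete, the following is a minimal sketch of a χ² test of independence applied to two presence/absence profiles. It is not the authors' pipeline: the text above does not say how profiles were constructed or whether any correction was applied, so the function name, the binary profile encoding and the uncorrected 2×2 test are all assumptions made for illustration.

```python
import math


def chi2_cooccurrence(profile_a, profile_b):
    """Uncorrected chi-squared test of independence (1 degree of freedom)
    on two binary presence/absence profiles of equal length.

    Returns (chi2 statistic, p-value). Hypothetical helper for illustration.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must have the same length")
    n = len(profile_a)
    pairs = list(zip(profile_a, profile_b))

    # Build the 2x2 contingency table of joint presence/absence.
    both = sum(1 for a, b in pairs if a and b)
    only_a = sum(1 for a, b in pairs if a and not b)
    only_b = sum(1 for a, b in pairs if b and not a)
    neither = n - both - only_a - only_b
    observed = [[both, only_a], [only_b, neither]]
    row_totals = [both + only_a, only_b + neither]
    col_totals = [both + only_b, only_a + neither]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            if expected == 0:
                # A domain present in all or none of the clusters carries no signal.
                return 0.0, 1.0
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, the chi-squared survival function
    # equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Toy profiles across 20 hypothetical gene clusters (1 = domain present).
domain_x = [1] * 10 + [0] * 10
domain_y = [1] * 10 + [0] * 10   # always co-occurs with domain_x
domain_z = [1, 1, 0, 0] * 5      # first compared against the profile below
domain_w = [1, 0, 1, 0] * 5      # statistically independent of domain_z

linked = chi2_cooccurrence(domain_x, domain_y)
unlinked = chi2_cooccurrence(domain_z, domain_w)
print("linked pair:", linked, "significant at P<0.001:", linked[1] < 0.001)
print("unlinked pair:", unlinked, "significant at P<0.001:", unlinked[1] < 0.001)
```

With many candidate motifs tested at once, a real analysis would also need to account for multiple testing; that step is left out here because the passage does not describe it.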

