Strategy to discover diverse optimal molecules in the small molecule universe.

Rupakheti C, Virshup A, Yang W, Beratan DN - J Chem Inf Model (2015)

Bottom Line: We take a stochastic approach and extend the ACSESS framework (Virshup et al. J. Am. Chem. Soc. 2013, 135, 7296-7303) to develop diversity-oriented molecular libraries that can generate a set of compounds that is representative of the small molecule universe and that also biases the library toward favorable physical property values. We show that the approach is efficient compared to exhaustive enumeration and to existing evolutionary algorithms for generating such libraries by testing in the NKp fitness landscape model and in the fully enumerated GDB-9 chemical universe containing 3 × 10^5 molecules.

Affiliation: †Program in Computational Biology and Bioinformatics, ‡Department of Chemistry, §Department of Physics, and ∥Department of Biochemistry, Duke University, Durham, North Carolina 27708, United States.

ABSTRACT
The small molecule universe (SMU) is defined as a set of over 10^60 synthetically feasible organic molecules with molecular weight less than ∼500 Da. Exhaustive enumeration and evaluation of all SMU molecules for the purpose of discovering favorable structures is impossible. We take a stochastic approach and extend the ACSESS framework (Virshup et al. J. Am. Chem. Soc. 2013, 135, 7296-7303) to develop diversity-oriented molecular libraries that can generate a set of compounds that is representative of the small molecule universe and that also biases the library toward favorable physical property values. We show that the approach is efficient compared to exhaustive enumeration and to existing evolutionary algorithms for generating such libraries by testing in the NKp fitness landscape model and in the fully enumerated GDB-9 chemical universe containing 3 × 10^5 molecules.

Figure 6. ACSESS and SGA runs that maximize the dipole moments (fitness) of diverse molecules. The two plots track the average fitness of libraries generated by each design algorithm (color coded differently), and the error bars represent one standard deviation from the mean for multiple runs. (A) compares the fitness of the library optimized using ACSESS with three other genetic algorithms. (B) compares the diversity of the fit libraries generated by ACSESS and SGAs.

Mentions: As summarized in Table 1, we found that, on average, ACSESS generates molecules with similar or more favorable dipole moments (fitness) compared to SGAs, but ACSESS generates a more diverse set of fit molecules (dipole moment ≥5.5 D). More specifically, ACSESS generates molecules with higher dipole moments (fitness) than SGA with roulette wheel selection, and solutions of similar fitness to SGA with tournament selection (Figure 6A). SGA with elitism (selecting the fittest solutions in every generation) performs marginally better than ACSESS. However, Figure 6B indicates that the diversity (measured using the nearest-neighbor Euclidean distance defined in eq 4) of the molecular library for ACSESS is much larger than that found with the SGAs. In fact, the nearest-neighbor Euclidean distance of the library generated by ACSESS (∼10) is similar to that found for the enumerated GDB-9 molecules with dipole moments ≥5.5 D (∼12). These results indicate that the diversity of the ACSESS-generated library is similar to the diversity of the subset of the enumerated GDB-9 universe that contains only the compounds above the fitness cutoff. These findings are similar to those from the model NKp landscape, where ACSESS generated multiple global optima without becoming trapped in local optima. In contrast, the SGAs became trapped in local optima in 40% of the runs (Figure 4B) and produced far fewer of the fittest solutions (Figure 4C). These calculations indicate that the diversity enforcement in ACSESS compares favorably with the SGAs in sampling different high-fitness regions of the GDB-9 space. It is important to note that, while ACSESS generates highly diverse solutions, the fitness is still comparable to, and in some cases better than, that found with popular simple genetic algorithms (Figures 4D and 6A).
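
The diversity comparison above relies on a nearest-neighbor Euclidean distance over the fit molecules (dipole moment ≥5.5 D). Eq 4 is not reproduced in this excerpt, so the following is only a minimal sketch under one plausible reading: the diversity of a library is taken as the mean distance from each molecule to its nearest neighbor in descriptor space. The function names, the two-dimensional descriptor vectors, and the toy dipole values are all hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

# A library entry is (descriptor vector, fitness). Both are toy values here;
# the paper's actual descriptors and eq 4 are not shown in this excerpt.
Molecule = Tuple[Sequence[float], float]


def fit_subset(library: List[Molecule], cutoff: float = 5.5) -> List[Sequence[float]]:
    """Keep the descriptor vectors of molecules whose fitness meets the cutoff."""
    return [descriptors for descriptors, fitness in library if fitness >= cutoff]


def mean_nearest_neighbor_distance(points: List[Sequence[float]]) -> float:
    """Average, over all points, of the Euclidean distance to the closest other point.

    This is an assumed form of the diversity measure, not necessarily eq 4.
    """
    if len(points) < 2:
        raise ValueError("need at least two points to define a nearest neighbor")
    total = 0.0
    for i, p in enumerate(points):
        total += min(math.dist(p, q) for j, q in enumerate(points) if j != i)
    return total / len(points)


# Two toy libraries with identical fitness values but different spread.
clustered = [((0.0, 0.0), 6.0), ((0.0, 1.0), 5.8), ((1.0, 0.0), 5.6), ((9.0, 9.0), 2.0)]
spread = [((0.0, 0.0), 6.0), ((0.0, 10.0), 5.8), ((10.0, 0.0), 5.6), ((9.0, 9.0), 2.0)]

print(mean_nearest_neighbor_distance(fit_subset(clustered)))  # 1.0
print(mean_nearest_neighbor_distance(fit_subset(spread)))     # 10.0
```

Both toy libraries contain three fit molecules with the same fitness values, yet the second scores ten times higher on this measure. That is the kind of difference Figure 6B is described as showing between ACSESS and the SGAs: comparable fitness, very different spread.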

