The environmental fate of organic pollutants through the global microbial metabolism.

Gómez MJ, Pazos F, Guijarro FJ, de Lorenzo V, Valencia A - Mol. Syst. Biol. (2007)

Bottom Line: A machine learning approach has been instrumental to expose a correlation between the frequency of 149 atomic triads (chemotopes) common in organo-chemical compounds and the global capacity of microorganisms to metabolise them. Depending on the type of environmental fate defined, the system can correctly predict the biodegradative outcome for 73-87% of compounds. The application of this predictive tool to chemical species released into the environment provides an early instrument for tentatively classifying the compounds as biodegradable or recalcitrant.


Affiliation: Centro de Astrobiología (INTA-CSIC), Ctra. Torrejón Ajalvir, Km 4. Torrejón de Ardoz, Madrid, Spain.

ABSTRACT
The production of new chemicals for industrial or therapeutic applications exceeds our ability to generate experimental data on their biological fate once they are released into the environment. Typically, mixtures of organic pollutants are released into a variety of sites inhabited by diverse microorganisms, which form complex multispecies metabolic networks. A machine learning approach has been instrumental in exposing a correlation between the frequency of 149 atomic triads (chemotopes) common in organo-chemical compounds and the global capacity of microorganisms to metabolise them. Depending on the type of environmental fate defined, the system can correctly predict the biodegradative outcome for 73-87% of compounds. This system is available to the community as a web server (http://www.pdg.cnb.uam.es/BDPSERVER). The application of this predictive tool to chemical species released into the environment provides an early instrument for tentatively classifying the compounds as biodegradable or recalcitrant. Automated surveys of lists of industrial chemicals currently employed in large quantities revealed that herbicides are the group of functional molecules most difficult to recycle into the biosphere through the inclusive microbial metabolism.

Figure 4: Comparison of the prediction accuracies in cross-validation tests with trained classifiers and random classifiers. Fivefold cross-validation tests were conducted, for each of the considered classification schemes, using both the original classifiers and the equivalent random classifiers, which assign compounds arbitrarily to classes with a probability proportional to the size of the classes (see Table II for the statistical significance of the differences between the predictors and their random counterparts). The accuracy averaged over the five iterations of the cross-validation experiment and the corresponding standard deviation are shown. Note that the dataset was extracted from UMBBD, in which biodegradable compounds are overrepresented. As a result, the accuracy of the predictive scheme reflected in the figure (as well as the related specificities and sensitivities; Supplementary Figure S3) underestimates the actual prognostic power of the system for new chemicals.
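
The random baseline described in the caption can be illustrated with a minimal sketch. It is not the authors' code: the function names, the fold construction, and the label values are assumptions made for illustration only. It draws labels with a probability proportional to the class sizes seen in the training folds and reports the mean and standard deviation of accuracy over five folds.

```python
import random
from collections import Counter
from statistics import mean, stdev


def proportional_random_predictions(train_labels, n, rng):
    """Draw n labels at random, weighted by class frequency in train_labels."""
    counts = Counter(train_labels)
    classes = sorted(counts)
    weights = [counts[c] for c in classes]
    return rng.choices(classes, weights=weights, k=n)


def expected_random_accuracy(labels):
    """Expected accuracy of such a random classifier: the sum of p_c squared,
    assuming training and test data share the same class proportions."""
    n = len(labels)
    return sum((k / n) ** 2 for k in Counter(labels).values())


def fivefold_random_baseline(labels, seed=0, k=5):
    """Mean and standard deviation of accuracy over k folds (hypothetical helper)."""
    rng = random.Random(seed)
    idx = list(range(len(labels)))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds of near-equal size
    accuracies = []
    for fold in folds:
        held_out = set(fold)
        train = [labels[i] for i in idx if i not in held_out]
        test = [labels[i] for i in fold]
        predicted = proportional_random_predictions(train, len(test), rng)
        accuracies.append(sum(p == t for p, t in zip(predicted, test)) / len(test))
    return mean(accuracies), stdev(accuracies)
```

With two classes in an 80:20 ratio, the expected accuracy of this baseline is 0.8² + 0.2² = 0.68, which shows why a class-proportional random predictor is a stricter reference than a coin flip when one class dominates the dataset.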


To verify the significance of the figures generated above and uncover possible biases of the dataset on the predictive value of the classifiers, we compared the performance of our C4.5-based system with one employing random predictors. Possible biases due to overrepresentation of classes were corrected by having such predictors assign compounds randomly to one of the two fates for each classification scheme, with a probability proportional to the population of each class. Figure 4 shows the average accuracy obtained for the different classes (versus their negated classes) for the real and the randomised dataset. The real dataset produces a considerably higher accuracy for all the classes, and the difference is most pronounced in the CM and NB groups. Details of the analyses of the randomised dataset (sensitivity and specificity) are shown in Supplementary Figure S3. To assess the statistical significance of the differences between the C4.5-based system and the random predictors, the dataset was subjected to a sign test. This analysis compares the performance of two methods based on the number of cases in which one of them provides a correct response whereas the other fails, and vice versa. A P(N) is thereby obtained, which can be interpreted as the probability of the null hypothesis, that is, that the observed differences arise by chance. These probabilities are shown in Table II. The result clearly demonstrates the superiority of the C4.5-based predictors over their equivalent random counterparts, with values of P(N) on the order of 10^-16 to 10^-39.
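
The sign test described above can be sketched as follows. This is a generic exact binomial sign test, not the authors' implementation: the two-sided formulation and the function name `sign_test_p_value` are assumptions, since the text does not state which variant was used. Only the discordant cases count, that is, compounds that one method classifies correctly and the other does not.

```python
from math import comb


def sign_test_p_value(correct_a, correct_b):
    """Two-sided exact sign test on paired per-compound correctness flags.

    correct_a[i] and correct_b[i] are True when method A (respectively B)
    classified compound i correctly. Cases where both methods agree are ignored.
    """
    wins_a = sum(a and not b for a, b in zip(correct_a, correct_b))
    wins_b = sum(b and not a for a, b in zip(correct_a, correct_b))
    n = wins_a + wins_b
    if n == 0:
        return 1.0  # no discordant cases, so no evidence of a difference
    k = min(wins_a, wins_b)
    # Under the null hypothesis each discordant case favours either method
    # with probability 1/2, so the smaller count follows Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

For example, if one method is correct on 10 compounds where the other fails and the reverse never happens, the two-sided probability is 2 × (1/2)^10, roughly 0.002. Many more discordant cases, nearly all favouring one method, would be needed to reach values as small as those reported in the text.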

