New public QSAR model for carcinogenicity.

Fjodorova N, Vracko M, Novic M, Roncaglioni A, Benfenati E - Chem Cent J (2010)

Bottom Line: In the present paper, models for the classification of carcinogenic compounds using MDL and Dragon descriptors were developed. The models could be used to set priorities among chemicals for further testing. The models at the CAESAR site were implemented in Java and are publicly accessible.


Affiliation: National Institute of Chemistry, Hajdrihova 19, SI-1001 Ljubljana, Slovenia. natalja.fjodorova@ki.si.

ABSTRACT

Background: One of the main goals of the new chemical regulation REACH (Registration, Evaluation and Authorization of Chemicals) is to fill the gaps in data on the properties of chemicals that affect human health. (Q)SAR models are accepted as a suitable source of information. The EU-funded CAESAR project aimed to develop models for the prediction of 5 endpoints for regulatory purposes. Carcinogenicity is one of the endpoints under consideration.

Results: Models for the prediction of carcinogenic potency according to the specific requirements of chemical regulation were developed. A dataset of 805 non-congeneric chemicals extracted from the Carcinogenic Potency Database (CPDBAS) was used. The Counter Propagation Artificial Neural Network (CP ANN) algorithm was implemented. The article describes two alternative models for the prediction of carcinogenicity. The first model employed eight MDL descriptors (model A) and the second twelve Dragon descriptors (model B). CAESAR's models have been assessed according to the OECD principles for the validation of QSAR models. A wide series of statistical checks was used to assess model validity. Models A and B yielded a training-set accuracy (644 compounds) of 91% and 89%, respectively; the test-set accuracy (161 compounds) was 73% and 69%, while the specificity was 69% and 61%, respectively. Sensitivity was 75% in both cases. The accuracy of the leave-20%-out cross-validation on the training set was 66% and 62% for models A and B, respectively. To verify whether the models perform correctly on new compounds, an external validation was carried out. The external test set was composed of 738 compounds. For models A and B, respectively, the external validation gave an accuracy of 61.4% and 60.0%, a sensitivity of 64.0% and 61.8%, and a specificity of 58.9% and 58.4%.
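The accuracy, sensitivity and specificity values quoted above follow the usual confusion-matrix definitions. The sketch below is only an illustration of those definitions, not the authors' code: the function name `classification_metrics`, the label encoding (1 = carcinogen, 0 = non-carcinogen) and the eight toy labels are assumptions made for this example and are not data from the paper.

```python
def classification_metrics(true_labels, predicted_labels):
    """Compute accuracy, sensitivity and specificity for a binary classifier.

    Assumed encoding: 1 = carcinogen (positive), 0 = non-carcinogen (negative).
    """
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # carcinogens found
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-carcinogens found
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed carcinogens

    def ratio(numerator, denominator):
        # Undefined ratios (empty class) are reported as NaN rather than raising.
        return numerator / denominator if denominator else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),   # share of carcinogens correctly flagged
        "specificity": ratio(tn, tn + fp),   # share of non-carcinogens correctly cleared
    }


# Toy labels, invented for illustration only.
toy_true = [1, 1, 1, 1, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 0, 1, 1]
toy_metrics = classification_metrics(toy_true, toy_pred)
```

With these toy labels three of four positives and two of four negatives are classified correctly, so sensitivity is 0.75, specificity is 0.5 and accuracy is 0.625.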

Conclusion: Carcinogenicity is a particularly important endpoint, and QSAR models are not expected to replace human experts' opinions and conventional methods. However, we believe that a combination of several methods will provide useful support for the overall evaluation of carcinogenicity. In the present paper, models for the classification of carcinogenic compounds using MDL and Dragon descriptors were developed. The models could be used to set priorities among chemicals for further testing. The models at the CAESAR site were implemented in Java and are publicly accessible.



Figure 1: Statistical performance of the model with 8 MDL descriptors (model A) and dimension 35*35 as a function of the number of learning epochs. The optimal model corresponds to 800 learning epochs (test-set accuracy equal to 0.73).

Mentions: A considerable number of models were built with dimensions 20*20, 25*25, 30*30, 35*35 and 40*40 and with the number of learning epochs ranging from 100 to 1800. The minimal correction factor was set at 0.01 and the maximum correction factor at 0.5. The highest predictive power was obtained for models with dimension 35*35. The statistical performance of models with dimension 35*35 as a function of the number of learning epochs is shown in Figure 1 and Figure 2 for the models with MDL (model A) and Dragon (model B) descriptors, respectively. For model A, the highest test-set accuracy, 0.73, was obtained with 800 learning epochs (see Figure 1). For model B, the highest test-set accuracy was 0.69, which corresponds to 200 learning epochs. Hence, the optimal dimension for the models with MDL and Dragon descriptors was set to 35*35 neurons, and the number of learning epochs was set to 800 and 200 for the optimal models with 8 MDL and 12 Dragon descriptors, respectively.
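The model selection described above amounts to a scan over network dimension and number of learning epochs, keeping the combination with the highest test-set accuracy. The following is a minimal sketch of such a scan under stated assumptions, not the authors' procedure: the step of 100 between epoch counts, the linear decay of the correction factor, and the `evaluate` callback (a hypothetical function that would train one network and return its test-set accuracy) are not given in the text.

```python
from itertools import product

# Values stated in the text.
MAP_SIDES = [20, 25, 30, 35, 40]   # networks of side*side neurons
MIN_CORRECTION = 0.01
MAX_CORRECTION = 0.5
# Assumption: epochs scanned in steps of 100; the text gives only the range 100-1800.
EPOCH_COUNTS = list(range(100, 1801, 100))


def correction_factor(epoch, total_epochs, high=MAX_CORRECTION, low=MIN_CORRECTION):
    """Correction factor for a 0-based epoch, assuming a linear decay.

    Starts at `high` in the first epoch and reaches `low` in the last one.
    The decay schedule is an assumption; the text states only the two limits.
    """
    if total_epochs <= 1:
        return high
    fraction_done = epoch / (total_epochs - 1)
    return high - (high - low) * fraction_done


def select_best(evaluate):
    """Return (accuracy, side, epochs) for the best-scoring configuration.

    `evaluate(side, epochs)` is a hypothetical callback that trains one
    network and returns its test-set accuracy. Ties keep the first hit.
    """
    best = None
    for side, epochs in product(MAP_SIDES, EPOCH_COUNTS):
        accuracy = evaluate(side, epochs)
        if best is None or accuracy > best[0]:
            best = (accuracy, side, epochs)
    return best
```

Passing a stand-in `evaluate` that peaks at a side of 35 and 800 epochs makes `select_best` return that configuration, which mirrors the choice reported for model A. Selecting hyperparameters on test-set accuracy, as the text describes, tends to give an optimistic estimate, which is presumably one reason the external validation set is reported separately.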

