An open source multistep model to predict mutagenicity from statistical analysis and relevant structural alerts.

Ferrari T, Gini G - Chem Cent J (2010)

Bottom Line: A cascade model has been developed and validated on a large public set of molecular structures and their associated Salmonella mutagenicity outcome. The first step consists in the derivation of a statistical model and mutagenicity prediction, followed by further checks for specific structural alerts in the "safe" subset of the prediction outcome space. The input is simply a file of molecular structures and the output is the classification result.

Affiliation: Department of Electronics and Information (DEI), Politecnico di Milano via Ponzio, 34/5 - 20133 Milano, Italy. tferrari@elet.polimi.it

ABSTRACT

Background: Mutagenicity is the capability of a substance to cause genetic mutations. This property is of high public concern because it is closely related to carcinogenicity and potentially to reproductive toxicity. Experimentally, mutagenicity can be assessed by the Ames test on Salmonella, with an estimated experimental reproducibility of 85%; this intrinsic limitation of the in vitro test, along with the need for faster and cheaper alternatives, opens the road to other types of assessment methods, such as in silico structure-activity prediction models. A widely used method checks for the presence of known structural alerts for mutagenicity. However, the presence of such alerts alone does not definitively prove the mutagenicity of a compound towards Salmonella, since other parts of the molecule can influence and potentially change the classification. Hence, statistically based methods are proposed, with the final objective of obtaining a cascade of modeling steps with custom-made properties, such as the reduction of false negatives.

Results: A cascade model has been developed and validated on a large public set of molecular structures and their associated Salmonella mutagenicity outcome. The first step consists of deriving a statistical model and predicting mutagenicity, followed by further checks for specific structural alerts in the "safe" subset of the prediction outcome space. In terms of accuracy (i.e., overall correct predictions of both negatives and positives), the resulting model approached the 85% reproducibility of the experimental Ames mutagenicity test.

Conclusions: The model and the documentation for regulatory purposes are freely available on the CAESAR website. The input is simply a file of molecular structures and the output is the classification result.

Figure 1: The architecture of the integrated mutagenicity model: cascading filters.

Mentions: To obtain a tool better suited to regulatory purposes, the mutagenicity classifier integrates two techniques: a machine learning algorithm from the Support Vector Machines (SVM) family, used to build an initial model with the best statistical accuracy, and an ad hoc expert system based on known structural alerts (SAs), tailored to refine the SVM predictions. The purpose is to prevent hazardous molecules that were misclassified in the first instance (false negatives) from being labelled as safe. The resulting classifier can be presented as a system of cascading filters (see Figure 1): compounds evaluated as positive by the SVM are immediately labelled mutagenic, whereas the presumed negatives are sifted through two consecutive SA checkpoints of increasing sensitivity. The first checkpoint (12 SAs) can improve prediction accuracy by attempting to isolate potential false negatives (FNs) precisely; the second checkpoint (4 SAs) removes FNs more drastically (but more prudently), provided that this does not noticeably degrade the original accuracy by generating too many false positives (FPs). To reinforce this distinction, compounds caught by the first checkpoint are labelled mutagenic, while those caught by the second checkpoint are labelled suspicious: this label is a warning that denotes a candidate mutagen, since the compound has fired an SA with low specificity. Compounds that pass through both checkpoints unaffected are finally labelled non-mutagenic.
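
The control flow of the cascade described above can be sketched in a few lines of Python. This is only an illustration of the filtering order, not the CAESAR implementation: the function name `cascade_classify`, the `Alert` type, and the toy SVM and alert predicates (which match placeholder tokens in plain strings, not real chemistry) are all assumptions made for this example.

```python
from typing import Callable, Sequence

# Hypothetical type: a structural alert is a predicate on a molecule
# representation that returns True when the alert fires.
Alert = Callable[[str], bool]


def cascade_classify(
    molecule: str,
    svm_predicts_mutagenic: Callable[[str], bool],
    checkpoint1_alerts: Sequence[Alert],
    checkpoint2_alerts: Sequence[Alert],
) -> str:
    """Return 'mutagenic', 'suspicious' or 'non-mutagenic' for one molecule."""
    # Step 1: statistical model. Positives are labelled mutagenic immediately.
    if svm_predicts_mutagenic(molecule):
        return "mutagenic"
    # Step 2: first checkpoint (12 SAs in the source), applied to presumed negatives.
    if any(alert(molecule) for alert in checkpoint1_alerts):
        return "mutagenic"
    # Step 3: second checkpoint (4 SAs in the source), less specific alerts.
    if any(alert(molecule) for alert in checkpoint2_alerts):
        return "suspicious"
    # No step fired: the compound passes through both checkpoints.
    return "non-mutagenic"


# Toy stand-ins (assumptions for illustration only, not real alerts or a real SVM).
def toy_svm(molecule: str) -> bool:
    return molecule.startswith("POS")


toy_checkpoint1 = [lambda m: "ALERT_A" in m]
toy_checkpoint2 = [lambda m: "ALERT_B" in m]

for mol in ["POS-1", "NEG-ALERT_A", "NEG-ALERT_B", "NEG-plain"]:
    print(mol, "->", cascade_classify(mol, toy_svm, toy_checkpoint1, toy_checkpoint2))
```

Because the checkpoints are only consulted for compounds the SVM calls negative, they can turn negatives into positives but never the reverse, which is what makes the cascade reduce false negatives at the cost of possible extra false positives.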

