Statistics

 

1. Statistics: ProTox-II acute toxicity training set

 

Our training set consists of an in-house database of approximately 40,000 compounds for which LD50 values have been determined in mouse or rat experiments. For many compounds, LD50 values have been determined in multiple experiments or in different species, so a single compound can have multiple LD50 values.

1.1 Acute toxicity classes

Because multiple measurements can be available for one compound, toxicity classes can be defined in different ways. In our project, we mainly focus on toxicity classes defined from the minimum of the available measurements, which is the most conservative choice and may assign a compound to a more toxic class than other definitions would. The distribution of toxicity classes in our training set is the following:

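As an illustration of the minimum-based class definition in section 1.1, the sketch below maps a list of LD50 measurements to one class. The cut-offs are an assumption: they follow the GHS-style oral acute toxicity categories (in mg/kg body weight), which the text above does not state explicitly. The function names are hypothetical and are not part of ProTox-II.

```python
from bisect import bisect_left
from statistics import median

# Assumed upper bounds (mg/kg body weight) of classes 1 to 5, GHS-style.
# Anything above the last bound falls into class 6.
CLASS_UPPER_BOUNDS = [5, 50, 300, 2000, 5000]


def toxicity_class(ld50_mg_per_kg):
    """Return a class from 1 (most toxic) to 6 (least toxic)."""
    if ld50_mg_per_kg <= 0:
        raise ValueError("LD50 must be positive")
    # bisect_left puts a value equal to a bound into the lower (more toxic) class.
    return bisect_left(CLASS_UPPER_BOUNDS, ld50_mg_per_kg) + 1


def class_from_measurements(ld50_values, aggregate=min):
    """Collapse several LD50 values for one compound, then classify."""
    return toxicity_class(aggregate(ld50_values))


# Hypothetical compound with three measurements (mg/kg).
measurements = [40, 320, 2500]
print(class_from_measurements(measurements))                    # 2
print(class_from_measurements(measurements, aggregate=median))  # 4
```

With these made-up values, the minimum places the compound two classes higher in toxicity than the median would, which is the conservative behaviour described above.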

1.2 Fragments

Molecules can be broken down into fragments using different methods. We have analyzed the fragmentation of the training set using RECAP and ROTBOND rules from an in-house method (Fragment Store). Fragmenting the molecules in our dataset shows that certain toxicity classes contain more distinct fragments than others:

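The per-class comparison in section 1.2 amounts to counting distinct fragments within each toxicity class. A minimal sketch follows, assuming the fragments have already been generated elsewhere (for example by RECAP or ROTBOND rules) and are available as strings; the fragment strings and the function name are made up for illustration.

```python
from collections import defaultdict


def distinct_fragments_per_class(compounds):
    """compounds: iterable of (toxicity_class, set_of_fragment_strings)."""
    fragments_by_class = defaultdict(set)
    for tox_class, fragments in compounds:
        # A set union keeps each fragment once per class.
        fragments_by_class[tox_class].update(fragments)
    return {c: len(f) for c, f in sorted(fragments_by_class.items())}


# Hypothetical toy data: two class-2 compounds and one class-4 compound.
toy = [
    (2, {"c1ccccc1", "C(=O)O"}),
    (2, {"c1ccccc1", "N"}),
    (4, {"C(=O)O"}),
]
print(distinct_fragments_per_class(toy))  # {2: 3, 4: 1}
```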

2. Statistics: Cross-validation results for Rodent Acute Toxicity

To validate the ProTox-II acute toxicity model, we performed leave-one-out cross-validation. The prediction parameters of ProTox-II were optimized to maximize the hit rates of the toxicity class and LD50 predictions. Furthermore, the hit rates were compared to the prediction rates of TOPKAT®, a commercial tool for toxicity prediction (Accelrys).
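The leave-one-out procedure can be sketched as follows: each compound is held out once, a prediction is made from all remaining compounds, and the hit rate is the fraction of held-out compounds whose class is predicted correctly. The nearest-neighbour predictor on a single numeric descriptor is only a placeholder and is not the ProTox-II method; all names and values are hypothetical.

```python
def leave_one_out_hit_rate(samples, predict):
    """samples: list of (features, label); predict(train, features) -> label."""
    hits = 0
    for i, (features, label) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]  # every sample except sample i
        if predict(train, features) == label:
            hits += 1
    return hits / len(samples)


def nearest_neighbour(train, features):
    """Placeholder predictor: label of the closest training sample (1-D)."""
    return min(train, key=lambda s: abs(s[0] - features))[1]


# Hypothetical data: (descriptor value, toxicity class).
toy = [(1.0, 2), (1.2, 2), (5.0, 4), (5.3, 4), (9.0, 6)]
print(leave_one_out_hit_rate(toy, nearest_neighbour))  # 0.8
```

The last compound is the only member of its class, so it cannot be predicted correctly once it is held out; this is why the toy hit rate is 4 out of 5.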

For toxicity classes based on the minimum dose (where multiple doses were available for a compound), ProTox-II outperforms TOPKAT® for all toxicity classes:


3. Statistics: Cross-validation results for Organ toxicity, Toxicity endpoints, Toxicological pathways

To validate the ProTox-II models for organ toxicity, toxicity endpoints (carcinogenicity, mutagenicity, cytotoxicity), and toxicological pathways, we performed fragment-propensity-based CLUSTER cross-validation together with a selective oversampling method. The chemical space of each training set was divided into 10 parts based on the fragment propensities for 10-fold cross-validation. The prediction parameters of the ProTox-II models were optimized to reduce the gap between sensitivity and specificity. Below, the balanced accuracy of the best models is reported along with AUC-ROC and Kappa values. Furthermore, the mean sensitivity and specificity are reported (mean values of the 10 models). Exception: the immunotoxicity model was validated using the methodology reported in Schrey et al. (2017).
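The idea behind cluster-based cross-validation is that whole clusters, rather than individual compounds, are assigned to folds, so that the test fold covers chemical space the model has not seen. The sketch below assumes that each compound already carries a cluster label; how clusters are derived from fragment propensities is not described here, and the greedy size balancing is an illustrative choice rather than the ProTox-II procedure.

```python
from collections import Counter


def cluster_folds(cluster_ids, n_folds=10):
    """Return a fold index per sample such that no cluster spans two folds."""
    sizes = Counter(cluster_ids)
    fold_sizes = [0] * n_folds
    fold_of_cluster = {}
    # Largest clusters first, each into the currently smallest fold.
    for cluster, size in sorted(sizes.items(), key=lambda kv: (-kv[1], str(kv[0]))):
        fold = fold_sizes.index(min(fold_sizes))
        fold_of_cluster[cluster] = fold
        fold_sizes[fold] += size
    return [fold_of_cluster[c] for c in cluster_ids]


def train_test_indices(folds, n_folds):
    """Yield (train, test) index lists, one pair per fold."""
    for k in range(n_folds):
        test = [i for i, f in enumerate(folds) if f == k]
        train = [i for i, f in enumerate(folds) if f != k]
        yield train, test


# Hypothetical cluster labels for eight compounds, split into three folds.
clusters = ["a", "a", "a", "b", "b", "c", "d", "d"]
folds = cluster_folds(clusters, n_folds=3)
print(folds)  # [0, 0, 0, 1, 1, 1, 2, 2]
```

Each `(train, test)` pair from `train_test_indices` can then be used to fit and evaluate one model, giving 10 models for 10 folds.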


4. Statistics: External validation results for Organ toxicity, Toxicity endpoints, Toxicological pathways

External validation of the ProTox-II models for organ toxicity, toxicity endpoints (carcinogenicity, mutagenicity, cytotoxicity), and toxicological pathways was performed with 10 different trained models, each trained on a relatively different region of chemical space obtained from the fragment-propensity-based CLUSTER cross-validation. Below, the balanced accuracy of the best model for each endpoint is reported along with AUC-ROC and Kappa values. Furthermore, the mean sensitivity and specificity are reported (mean values of the 10 predictive models).
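For reference, the reported quantities other than AUC-ROC can be computed from the confusion counts of a binary (active/inactive) model. The sketch below uses the standard definitions of sensitivity, specificity, balanced accuracy, and Cohen's kappa, then averages sensitivity and specificity over several models. The counts are invented for illustration, and both classes are assumed to be present. AUC-ROC is omitted because it requires ranked prediction scores rather than counts.

```python
from statistics import mean


def binary_metrics(tp, fn, tn, fp):
    """Metrics from true/false positive and negative counts."""
    n = tp + fn + tn + fp
    sensitivity = tp / (tp + fn)  # fraction of actives recovered
    specificity = tn / (tn + fp)  # fraction of inactives recovered
    observed = (tp + tn) / n      # plain accuracy
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "kappa": (observed - expected) / (1 - expected),
    }


# Hypothetical confusion counts (tp, fn, tn, fp) for two of the models.
model_counts = [(40, 10, 45, 5), (30, 20, 40, 10)]
per_model = [binary_metrics(*counts) for counts in model_counts]

mean_sensitivity = mean(m["sensitivity"] for m in per_model)
mean_specificity = mean(m["specificity"] for m in per_model)
best = max(per_model, key=lambda m: m["balanced_accuracy"])
print(round(mean_sensitivity, 3), round(mean_specificity, 3))
print({name: round(value, 3) for name, value in best.items()})
```

Balanced accuracy is the mean of sensitivity and specificity, so it is less affected by class imbalance than plain accuracy.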
