RESEARCH PRODUCT

Optimization of curation of the dataset with data on repeated dose toxicity

Falko Partosch, Ursula Gundert-Remy, A. Bitsch, Madeleine Seeland, Stefan Kramer, Monika Batke, Martin Gütlein

subject

business.industry; Toxicity; Medicine; General Medicine; Toxicology; Bioinformatics; business

description

Introduction: For some areas of risk assessment, the use of alternative methods is supported by current directives and guidance (e.g. REACH, Cosmetics, BPD, PPP). According to OECD principles, alternative methods need to be scientifically valid. Methods: Within a project on grouping and development of predictive models, supported by a grant from the Federal Ministry of Education and Research, we curated a dataset based on the RepDose and ELINCS databases. The final dataset consists of rat repeated dose toxicity studies for 1022 compounds, representing 28 endpoints as organ-effect combinations. Toxicological and modelling experts jointly carried out the curation and the selection of endpoints as an iterative process. Results: Missing values for endpoints were the main problem to be handled. Endpoints such as thyroid gland carry specific information, in contrast to unspecific endpoints such as liver/body weight. Unfortunately, for specific endpoints, data are often missing (>90%) in the dataset. Several attempts were made to fill the data gaps. Finally, a statistical imputation procedure gave the best results for grouping and modelling. We decided to include endpoints in the dataset only if the number of data points was sufficient to make precise predictions. The toxicological profile of a substance is determined not only by the affected endpoints but also by the potency. Hence, we decided to include information on potency as measured by LOELs. We explored several statistical procedures. The best results were obtained with an equal-frequency distribution of LOEL values per endpoint. SMILES codes for substructures and reactive groups were selected as the structural representation of the substances. The final evaluation showed that this structural description has a major impact on the outcome of the grouping; e.g. ethylene glycols and aliphatic alcohols were not separated because specific SMILES codes were missing. Grouping and modelling the dataset showed that specific physico-chemical parameters should be included in order to obtain toxicologically meaningful results. Discussion: Overall, within this project, we were able to show the impact of a carefully curated dataset on the usefulness and quality of grouping and predictive models built upon this dataset. For further information see http://mlc-reach.informatik.uni-mainz.de
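The abstract does not say how the equal-frequency distribution of LOEL values per endpoint was computed. As an illustration only, the following minimal sketch shows one common way to do rank-based equal-frequency binning while leaving missing values untouched. The function name `equal_frequency_bins`, the choice of three classes, and the example LOEL values are assumptions made for this sketch and are not taken from the project.

```python
from typing import List, Optional, Sequence


def equal_frequency_bins(
    values: Sequence[Optional[float]], n_bins: int = 3
) -> List[Optional[int]]:
    """Assign each observed value to one of n_bins classes of nearly equal size.

    Class 0 holds the lowest values (for LOELs, the most potent substances).
    Missing values (None) stay None, so data gaps are not silently filled.
    """
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    # Indices of observed values, ordered by value (stable for ties).
    observed = sorted(
        (i for i, v in enumerate(values) if v is not None),
        key=lambda i: values[i],
    )
    labels: List[Optional[int]] = [None] * len(values)
    n = len(observed)
    for rank, idx in enumerate(observed):
        # Map the rank 0..n-1 onto the class 0..n_bins-1.
        labels[idx] = rank * n_bins // n
    return labels


# Hypothetical LOELs for a single endpoint; None marks a missing value.
example_loels = [10.0, None, 300.0, 1.0, 1000.0, 30.0, 100.0]
example_classes = equal_frequency_bins(example_loels, n_bins=3)
```

Applied once per endpoint, this gives every endpoint the same number of potency classes with similar class sizes, regardless of the dose range covered. Because the assignment is purely rank-based, identical LOEL values can fall into different classes; a quantile-threshold variant would avoid that at the cost of less even class sizes.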

https://doi.org/10.1016/j.toxlet.2015.08.566