I am currently a PhD student in computational sciences, working more specifically on machine learning to predict the toxicity of chemical compounds.
I also teach computer science in the Informatics department at the University Côte d’Azur.
PhD work: Evaluation of the use of public toxicological data for chemical hazard prediction through computational methods.
Assessing the toxic risk that chemicals pose to humans and the environment is mandatory before new chemical compounds can be marketed. This evaluation is highly regulated and relies in particular on in vivo studies performed in different laboratory animal species and over different periods of exposure. However, these studies are costly in terms of time, money and animal use, and are therefore unsuitable for the evaluation of thousands of compounds. This is why alternative solutions are considered in order to assess the toxic potential of chemical compounds as early as possible. To do so, different types of data are considered, such as the chemical structure of compounds, in vitro assay results and data from in vivo studies.
The objective of my work is to develop and adapt machine learning methods to publicly available toxicology data in order to establish links between the different types of data. Ideally, one would predict the effects observed in long-term in vivo studies directly from the chemical structure of compounds. However, this direct prediction is too ambitious because of the high level of biological complexity and variability, and because toxicity can result from a long chain of causality involving multiple pathways at different biological levels. Therefore, the objective is split into two sub-tasks involving machine learning models:
- The prediction of in vitro bioactivity from compounds’ chemical structure;
- The prediction of in vivo effects from compounds’ in vitro bioactivity.
The resulting models could then be used to:
- Replace missing data with machine learning predictions, without performing in vitro and in vivo studies;
- Chain the two types of machine learning models to predict in vivo effects from the chemical structure, with an intermediate prediction of in vitro bioactivity.
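The chaining idea in the list above can be illustrated with a minimal sketch. It assumes a toy nearest-neighbour learner and invented data; the names `nearest_neighbour`, `chain`, `structure_to_vitro` and `vitro_to_vivo` are hypothetical and do not come from the studies described here.

```python
from typing import Callable, Sequence

Vector = Sequence[float]
Model = Callable[[Vector], Vector]


def nearest_neighbour(train_X: Sequence[Vector], train_Y: Sequence[Vector]) -> Model:
    """Return a model that copies the output of the closest training example."""
    def predict(x: Vector) -> Vector:
        closest = min(
            range(len(train_X)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
        )
        return train_Y[closest]
    return predict


def chain(first: Model, second: Model) -> Model:
    """Feed the prediction of the first model into the second one."""
    return lambda x: second(first(x))


# Toy data, invented for illustration only.
fingerprints = [(1, 0, 0), (0, 1, 1)]   # chemical structure descriptors
assay_profiles = [(0, 0), (1, 1)]       # in vitro outcomes (0 = inactive, 1 = active)
organ_effects = [(0,), (1,)]            # in vivo effect absent (0) or present (1)

structure_to_vitro = nearest_neighbour(fingerprints, assay_profiles)
vitro_to_vivo = nearest_neighbour(assay_profiles, organ_effects)
structure_to_vivo = chain(structure_to_vitro, vitro_to_vivo)

new_compound = (0, 1, 0)
predicted_profile = structure_to_vitro(new_compound)  # stands in for a missing assay result
predicted_effect = structure_to_vivo(new_compound)    # in vivo prediction via the in vitro step
```

In such a chain, any error made by the first model is passed on to the second, which is one reason to evaluate each stage separately before combining them.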
In a first study, we built machine learning models to predict the in vitro bioactivity of compounds based on their chemical structure. Since the training data were highly imbalanced in favor of inactive compounds, we showed that data augmentation techniques could improve model performance. Training on a larger number of observations also improved performance.
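The text does not say which data augmentation technique was used. As a minimal sketch of the general idea, the example below assumes the simplest option, random oversampling of the minority class; the function name `oversample_minority` and the toy data are hypothetical.

```python
import random


def oversample_minority(X, y, seed=0):
    """Duplicate randomly chosen minority-class rows until all classes have equal size."""
    rng = random.Random(seed)
    by_class = {}
    for features, label in zip(X, y):
        by_class.setdefault(label, []).append(features)
    target = max(len(rows) for rows in by_class.values())

    X_out, y_out = [], []
    for label, rows in by_class.items():
        # Draw (with replacement) as many extra rows as the class is missing.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for features in rows + extra:
            X_out.append(features)
            y_out.append(label)
    return X_out, y_out


# Toy data, invented for illustration: four inactive compounds (0), one active (1).
X = [(0.1,), (0.2,), (0.3,), (0.4,), (0.9,)]
y = [0, 0, 0, 0, 1]
X_bal, y_bal = oversample_minority(X, y)
```

Resampling of this kind is normally applied to the training split only, so that duplicated compounds do not leak into the data used for evaluation.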
Then, we performed a large-scale study to predict all the in vitro assays available in the ToxCast database in order to determine the most appropriate machine learning methods for this type of data. In particular, we showed that the ensemble method called “Stacked Generalization” led to markedly better performance than simple methods. In this study, we also showed that the reliability of predictions could be evaluated after estimating the applicability domain.
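To make the two ideas concrete, here is a minimal sketch of stacked generalization: base models are scored out-of-fold, and a meta-model is trained on those scores. It is followed by a simple distance-based applicability domain check. The base models, the logistic meta-model, the distance threshold and the toy data are all assumptions chosen for brevity, not the configuration used in the study.

```python
import math
import random


def knn_score(train_X, train_y, x, k=3):
    """Base model 1: fraction of actives among the k nearest training compounds."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def centroid_score(train_X, train_y, x):
    """Base model 2: relative closeness to the centroid of the active class."""
    def centroid(label):
        rows = [r for r, t in zip(train_X, train_y) if t == label]
        return [sum(col) / len(rows) for col in zip(*rows)]
    d_inactive, d_active = math.dist(centroid(0), x), math.dist(centroid(1), x)
    total = d_inactive + d_active
    return 0.5 if total == 0 else d_inactive / total


BASE_MODELS = [knn_score, centroid_score]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def out_of_fold_features(X, y, n_folds=4):
    """Score each compound with base models fitted on the other folds only."""
    features = [None] * len(X)
    for fold in range(n_folds):
        train = [i for i in range(len(X)) if i % n_folds != fold]
        tX, ty = [X[i] for i in train], [y[i] for i in train]
        for i in range(fold, len(X), n_folds):
            features[i] = [model(tX, ty, X[i]) for model in BASE_MODELS]
    return features


def fit_meta_model(Z, y, lr=0.5, epochs=500):
    """Meta-model: logistic regression fitted by batch gradient descent."""
    w, b = [0.0] * len(Z[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for z, t in zip(Z, y):
            error = t - sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b)
            grad_w = [g + error * zi for g, zi in zip(grad_w, z)]
            grad_b += error
        w = [wi + lr * g / len(Z) for wi, g in zip(w, grad_w)]
        b += lr * grad_b / len(Z)
    return w, b


def stacked_predict(X, y, meta, x):
    """Probability of activity: base models use all training data, meta-model combines them."""
    w, b = meta
    z = [model(X, y, x) for model in BASE_MODELS]
    return sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b)


def in_applicability_domain(train_X, x, max_distance=1.5):
    """Assumed rule: a query is in domain if it lies close to some training compound."""
    return min(math.dist(row, x) for row in train_X) <= max_distance


# Toy data, invented for illustration: inactives near (0, 0), actives near (5, 5).
rng = random.Random(0)
X, y = [], []
for i in range(40):
    label = i % 2
    X.append((rng.gauss(5.0 * label, 0.5), rng.gauss(5.0 * label, 0.5)))
    y.append(label)

meta = fit_meta_model(out_of_fold_features(X, y), y)
p_active = stacked_predict(X, y, meta, (5.0, 5.0))
p_inactive = stacked_predict(X, y, meta, (0.0, 0.0))
far_query_in_domain = in_applicability_domain(X, (20.0, 20.0))
```

The out-of-fold step matters: if the meta-model were trained on scores that the base models produced for their own training compounds, it would learn from overly optimistic inputs. A prediction for a query outside the applicability domain would be reported as less reliable rather than discarded silently.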
For the second task, we evaluated the relation between results from in vitro assays targeting pathways known to induce endocrine effects and toxic effects observed in several endocrine organs after long-term in vivo studies. We highlighted that, unexpectedly, these assays are not predictive of the in vivo effects, which raises the crucial question of the relevance of in vitro assays. This also demonstrated that machine learning cannot be used without a good understanding of biological phenomena.
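The metrics used to reach this conclusion are not stated here. As an illustration of how "not predictive" can be quantified on imbalanced data, the sketch below assumes balanced accuracy, for which 0.5 corresponds to chance level; the labels are invented.

```python
def accuracy(y_true, y_pred):
    """Fraction of compounds whose predicted label matches the observed one."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity; requires both classes in y_true."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Invented example: 2 compounds with an in vivo effect, 6 without,
# and a model that always predicts "no effect".
observed = [1, 1, 0, 0, 0, 0, 0, 0]
predicted = [0] * len(observed)

plain = accuracy(observed, predicted)
balanced = balanced_accuracy(observed, predicted)
```

Here plain accuracy is 0.75 although the model never detects an effect, while balanced accuracy is 0.5, which makes the absence of predictive value visible.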
Overall, our work points out the importance of developing computational methods adapted to the specificities of toxicological data. Moreover, it seems necessary to integrate other types of information into these methods, such as mechanistic knowledge of the biological pathways leading to toxicity and pharmacokinetic data, which describe the fate of compounds in the whole organism. Such information would then help algorithms find more relevant solutions.
I. Grenet; K. Merlo; J.-P. Comet; R. Tertiaux; D. Rouquié; F. Dayan. (2019). Stacked Generalization with Applicability Domain Outperforms Simple QSAR on in vitro Toxicological Data. Journal of Chemical Information and Modeling.
I. Grenet; J.-P. Comet; F. Schorsch; N. Rayan; J. Wichard; D. Chemical in vitro bioactivity profiles are not informative about the long-term in vivo endocrine mediated toxicity. (Under review)
Other selected communications
I. Grenet; J.-P. Comet. How in vitro data can contribute to in vivo toxicity prediction using Machine Learning? Poster - Journées Ouvertes en Biologie, Informatique et Mathématiques 2018 (Marseille - 03-06.07.18).
I. Grenet; J.-P. Comet; F. Porée; D. Rouquié. Toward In Vivo Toxicity Prediction from Molecular Structures: A Two-Stage Machine Learning Approach. Poster - Society of Toxicology 2018 (San Antonio, USA - 11-15.03.18).