RNTI

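Lexicographic study of subgraphs for building structure-activity models: the case of organic chemistry (original title: Étude lexicographique de sous-graphes pour l'élaboration de modèles structures à activité – cas de la chimie organique)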
In EGC 2019, vol. RNTI-E-35, pp. 303-308
Abstract
Building structure-activity models (QSAR) consists in extracting useful information from observations of molecular structures in order to associate structural elements with a macroscopic activity. A typical example is organic chemistry, where certain physico-chemical properties of a molecule depend on its internal arrangement (conformation). In particular, molecules contain characteristic substructures, called functional groups or fragments, which can be treated as subgraphs, together with structural links. In this paper we describe a distributional analysis of these fragments and show that they approximately follow power laws, close to Zipf's law, well known from natural languages. Pursuing this analogy, we develop the concept of "fragment embedding", which we evaluate on classification and regression tasks by comparing our results with traditional "bag-of-fragments" approaches. We show the value of this concept and outline some perspectives.
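As an illustration of the kind of distributional analysis mentioned above, the following minimal sketch estimates a Zipf-like exponent from fragment counts by fitting a straight line to log-frequency against log-rank. It is not the authors' procedure, which the abstract does not specify: the function name `zipf_exponent`, the fragment labels, and the toy data are all hypothetical, and an ordinary least-squares fit in log-log space is only a rough estimator of a power-law exponent.

```python
import math
from collections import Counter


def zipf_exponent(molecules):
    """Estimate s in frequency ~ C / rank**s from fragment counts.

    `molecules` is a list of fragment lists, one list per molecule
    (hypothetical input format). Least-squares fit in log-log space.
    """
    # Count every fragment occurrence over the whole corpus.
    counts = Counter(frag for frags in molecules for frag in frags)
    if len(counts) < 2:
        raise ValueError("need at least two distinct fragments")

    # Rank 1 is the most frequent fragment.
    freqs = sorted(counts.values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]

    # Slope of the regression line of log(frequency) on log(rank).
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx


# Toy corpus (illustrative labels only, not data from the paper).
toy_molecules = [
    ["C-C", "C-C", "C-O", "C=O"],
    ["C-C", "C-C", "C-N"],
    ["C-C", "C-O", "C-C"],
]
exponent = zipf_exponent(toy_molecules)
```

An exponent near 1 would correspond to the classical Zipf behaviour of word frequencies; on real fragment corpora a more careful estimator and a goodness-of-fit check would be needed before drawing conclusions.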