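Learning Code Embeddings for Programming Education: An Approach Based on Execution Trace Analysis (original French title: Apprentissage d'embeddings de codes pour l'enseignement de la programmation : une approche fondée sur l'analyse des traces d'exécution)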

Guillaume Cleuziou, Frédéric Flouvat

In EGC 2021, vol. RNTI-E-37, pp.107-118

Abstract

Improving the pedagogical effectiveness of programming training platforms is an active research topic that requires fine-grained, usable representations of learners' programs. This article presents a new approach for learning program embeddings. Starting from the hypothesis that both the functionality of a program and its "style" can be captured by analyzing its execution traces, the code2aes2vec method proceeds in two steps. The first step generates abstract execution sequences (AES) from test runs and the abstract syntax trees (AST) of the submitted programs. The doc2vec method is then used to learn compact vector representations (embeddings) of the programs from these AESs. This contribution also leads to the use and release of new real-world data sets. A first evaluation on these data sets suggests that the embeddings generated by code2aes2vec effectively capture the semantics, and even the style, of the programs.
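
As a rough illustration of the first step, the sketch below runs a small program on one test input and records, for each executed line, the AST node type of the statement on that line. This is a strongly simplified stand-in for an abstract execution sequence, not the construction used in the paper: the function name `abstract_execution_sequence`, the `<submission>` file label, the sample program `total`, and the choice of node type names as tokens are all assumptions made for this example.

```python
import ast
import sys


def abstract_execution_sequence(source, func_name, args):
    """Run func_name(*args) and return one AST node type per executed line.

    Hypothetical helper: a simplified illustration, not the paper's AES.
    """
    tree = ast.parse(source)

    # Map each line number to the type name of a statement starting there.
    line_to_node = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.stmt):
            line_to_node.setdefault(node.lineno, type(node).__name__)

    # Define the submitted function under a recognizable file label.
    namespace = {}
    exec(compile(tree, filename="<submission>", mode="exec"), namespace)

    tokens = []

    def tracer(frame, event, arg):
        # Only follow frames that belong to the submitted program.
        if frame.f_code.co_filename != "<submission>":
            return None
        if event == "line":
            tokens.append(line_to_node.get(frame.f_lineno, "Unknown"))
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        namespace[func_name](*args)
    finally:
        sys.settrace(previous)  # restore whatever tracer was active before
    return tokens


# Assumed sample submission and test input.
SOURCE = """def total(n):
    s = 0
    for i in range(n):
        s += i
    return s
"""

tokens = abstract_execution_sequence(SOURCE, "total", (2,))
print(" ".join(tokens))
```

Token sequences of this kind, one per program and test case, could then be treated as "documents" and passed to a doc2vec implementation to obtain one vector per program; that second step is left out here to keep the sketch dependency-free.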
