Learning code embeddings for programming education: an approach based on execution trace analysis
Abstract
Improving the pedagogical effectiveness of programming training platforms is an active area of work,
and it requires building fine-grained, usable representations of learners' programs.
This article presents a new approach to learning program embeddings. Starting from the hypothesis
that both the functionality of a program and its "style" can be captured by analyzing
its execution traces, the code2aes2vec method proceeds in two steps. The first step generates
abstract execution sequences (AESs) from test executions and the abstract syntax trees (ASTs)
of the submitted programs. The doc2vec method is then used to learn condensed vector representations
(embeddings) of the programs from these AESs. This contribution also involves
the use and dissemination of new real-world data sets. A first evaluation performed on these data
sets suggests that the embeddings generated by code2aes2vec capture the
semantics, and even the style, of the programs.
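
As a rough illustration of the first step, the sketch below runs a small submission on one test input and records, for each executed line, the type of the AST statement that starts on that line. It is a minimal sketch under stated assumptions, not the AES construction used by code2aes2vec: the token vocabulary (bare AST class names), the helper names, the "<submission>" file label, and the two sample programs are all hypothetical.

```python
import ast
import sys

FILENAME = "<submission>"  # hypothetical label used to recognise the learner's frames


def statement_types_by_line(source):
    """Map each line number to the type name of a statement starting on it."""
    mapping = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.stmt):
            # ast.walk is breadth-first, so the outermost statement wins.
            mapping.setdefault(node.lineno, type(node).__name__)
    return mapping


def abstract_execution_sequence(source, func_name, args):
    """Run func_name(*args) and return the executed statement types in order."""
    mapping = statement_types_by_line(source)
    namespace = {}
    exec(compile(source, FILENAME, "exec"), namespace)  # defines the function
    tokens = []

    def tracer(frame, event, arg):
        # Ignore frames that do not belong to the submitted program.
        if frame.f_code.co_filename != FILENAME:
            return None
        if event == "line":
            tokens.append(mapping.get(frame.f_lineno, "Other"))
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        namespace[func_name](*args)
    finally:
        sys.settrace(previous)  # restore whatever tracer was active before
    return tokens


# Two hypothetical submissions with the same functionality but different style.
FOR_VERSION = """def total(n):
    s = 0
    for i in range(n):
        s += i
    return s
"""

WHILE_VERSION = """def total(n):
    s = 0
    i = 0
    while i < n:
        s += i
        i += 1
    return s
"""

for_aes = abstract_execution_sequence(FOR_VERSION, "total", (3,))
while_aes = abstract_execution_sequence(WHILE_VERSION, "total", (3,))
print(for_aes)
print(while_aes)
```

The two programs compute the same sum, yet their sequences differ (one contains For tokens, the other While tokens), which is the kind of stylistic signal the hypothesis above refers to. For the second step, sequences like these could be treated as documents and passed to a doc2vec implementation; that part is not shown here.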