From document representations to programs: can the distributional hypothesis really be applied to programming languages?

Thibaut Martinet, Guillaume Cleuziou, Matthieu Exbrayat, Frédéric Flouvat

In EGC 2025, vol. RNTI-E-41, pp. 427-434

Abstract

Many deep learning models have been applied to programming languages, all of them relying on natural language models and their underlying distributional hypothesis, but never questioning the relevance of the latter. In this paper, we therefore explore whether this hypothesis still holds for programming languages. We use several methods, applied to variants of doc2vec, a well-known natural language processing model that is easy to understand and adapt. Among other contributions, we propose a set of short programs that allow both syntactic and semantic analogies to be observed.
