RNTI

High-Performance Computing in Python for Data Science: An Overview (original title: Calcul haute performance en Python pour la Science des Données: une vue d'ensemble)
In EGC 2024, vol. RNTI-E-40, pp. 385-392
Abstract
Python has become the leading language for application development in the Data Science and Machine Learning domains. However, data scientists are not necessarily experienced programmers. While Python lets them implement their algorithms quickly, computational efficiency becomes essential when moving to scale. Yet harnessing high-performance devices such as multicore processors and Graphics Processing Units (GPUs) to their full potential is generally not trivial. In this paper, we present the main findings of a recent survey, intended as a reference document to help such practitioners find their way through the wealth of tools and techniques available for the Python language. We focus in particular on delineating the main traits and distinctive features of the tools in this field. This document can support Data Science practitioners in choosing their tools, and tool developers in identifying potential gaps in existing work.