RNTI

Generation of clustered binary data with controlled partitioning and evaluation of the impact of dimension reduction methods on this partitioning
In EGC 2021, vol. RNTI-E-37, pp. 341-348
Abstract
Binary data, that is, data taking one of two possible values, are widely used in many research areas, such as protein modelling in bioinformatics. Some problems involve clustering binary data. Real data for studying the applicability of an algorithm to a given problem are not always readily available. This issue is even more pronounced in unsupervised learning, and in clustering problems in particular. To address it, this paper proposes a new algorithm for generating clustered binary data. The algorithm is driven by several parameters, which make it possible to generate data with known characteristics and controlled clusters. This article details the generation method and presents a comparison of dimension reduction algorithms, showing how the generated data can help in choosing a dimensionality reduction algorithm that preserves cluster separability.
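To make the idea of parameter-driven generation concrete, here is a minimal Python sketch. It is not the algorithm proposed in the paper, which the abstract does not specify. It assumes a simple prototype-plus-noise model: each cluster has a random binary prototype, and each sample copies its prototype with every bit flipped independently with a fixed probability. The function name `generate_clustered_binary` and the parameters `n_clusters`, `n_per_cluster`, `n_features`, and `flip_prob` are hypothetical names chosen for illustration.

```python
import random


def generate_clustered_binary(n_clusters, n_per_cluster, n_features,
                              flip_prob, seed=0):
    """Return (rows, labels) for a toy clustered binary data set.

    Assumed model (illustrative only): one random binary prototype per
    cluster; each sample is its prototype with bits flipped independently
    with probability flip_prob.
    """
    rng = random.Random(seed)
    rows, labels = [], []
    for k in range(n_clusters):
        # Draw the prototype that defines cluster k.
        prototype = [rng.randint(0, 1) for _ in range(n_features)]
        for _ in range(n_per_cluster):
            # XOR with a Bernoulli(flip_prob) draw flips the bit when it is True.
            row = [bit ^ (rng.random() < flip_prob) for bit in prototype]
            rows.append(row)
            labels.append(k)
    return rows, labels


def hamming(a, b):
    """Number of positions where two binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


# Example: 3 clusters of 50 samples each, 64 binary features, 5% bit noise.
rows, labels = generate_clustered_binary(3, 50, 64, flip_prob=0.05, seed=42)
```

Under this assumed model, `flip_prob` controls how tight the clusters are: at 0 every sample equals its prototype, and as it approaches 0.5 the cluster structure disappears. Because the labels are known, such data could in principle be projected with different dimension reduction methods and the separability of the known clusters compared afterwards.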