A Biclustering Model in a "MapReduce" Paradigm
Abstract
Biclustering is a central task in many areas of machine learning: it clusters observations and features simultaneously. Biclustering approaches are more complex than traditional clustering, particularly those that must handle large datasets on MapReduce platforms. We propose a new biclustering approach, based on the popular self-organizing maps, for cluster analysis of large datasets. We have designed scalable implementations of the new biclustering algorithm using MapReduce on the Spark platform. We report experiments that demonstrate its performance on public data using different numbers of cores. Using practical examples, we demonstrate that our algorithm works well in practice. The experimental results show scalable performance, with near-linear speedups across different datasets and 120 cores.