Mean shift: scalable and distributed clustering
Abstract
We introduce an efficient distributed implementation of nearest neighbour mean shift
clustering (NNMS). The computationally intensive nature of NNMS has so far restricted its
application to complex data sets where a flexible clustering with non-ellipsoidal clusters would
be beneficial. Parallelising the standard serial NNMS algorithm does not, on its own, yield
sufficient performance gains, so we introduce two further algorithmic improvements: a
normal scale (NS) choice of the optimal number of nearest neighbours, and locality-sensitive
hashing (LSH) to approximate nearest neighbour searches. Combining these improvements into
a single distributed algorithm, DNNMS, offers a potentially efficient method for clustering
Big Data.
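
To make the serial baseline concrete, the following is a minimal sketch of nearest neighbour mean shift. It assumes the common formulation in which each point is repeatedly moved to the mean of its k nearest neighbours in the original data. The function name nn_mean_shift, the fixed iteration count n_iter and the brute-force neighbour search are illustrative assumptions, not the DNNMS implementation. The NS choice of k and the LSH approximation are not shown: k is simply passed in, and an LSH index would replace the exact search marked in the comments.

```python
def nn_mean_shift(data, k, n_iter=20):
    """Serial nearest neighbour mean shift (illustrative sketch).

    data   : list of equal-length numeric tuples (the sample)
    k      : number of nearest neighbours (assumed given; no NS selector here)
    n_iter : fixed number of shift iterations (hypothetical stopping rule)
    """
    dim = len(data[0])
    points = [tuple(p) for p in data]
    for _ in range(n_iter):
        shifted = []
        for x in points:
            # Exact k-nearest-neighbour search by brute force: O(n log n) per point.
            # This is the step an approximate LSH search would replace.
            neighbours = sorted(
                data,
                key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
            )[:k]
            # Move the point to the mean of its k nearest data points.
            shifted.append(
                tuple(sum(p[d] for p in neighbours) / k for d in range(dim))
            )
        points = shifted
    return points


# Two well-separated groups of four points each.
sample = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
modes = nn_mean_shift(sample, k=4)
```

Points that converge to the same location are assigned to the same cluster. Each iteration searches the whole sample for every point, so the cost grows quadratically with the sample size, which is the bottleneck that motivates an approximate neighbour search.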