REDIRE: Extreme Dimension Reduction for Extractive Summarization (original title: REDIRE : Réduction Extrême de DImension pour le Résumé Extractif)
In EGC 2024, vol. RNTI-E-40, pp. 199-206
This paper presents an unsupervised automatic summarization model that extracts the most important sentences from a corpus. To select the sentences of a summary, we represent the documents with pre-trained word embeddings. To this dense cloud of word vectors, we apply an extreme dimension reduction that identifies the important words, which we then group by proximity. Sentences are extracted by linear optimization so as to maximize the information present in the summary. We evaluate the approach on large documents and present very encouraging initial results.
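The sentence-selection step can be illustrated with a small sketch. The abstract does not give the actual objective or constraints, so everything here is an assumption: the function `select_sentences`, the word budget `max_words`, and the hand-written `concept_weights` (standing in for the important words found by dimension reduction) are hypothetical. The sketch states a budgeted-coverage 0/1 linear program and, to stay dependency-free, solves it by exhaustive enumeration, which is only viable for toy inputs; a real system would hand the same program to an ILP solver.

```python
from itertools import combinations


def select_sentences(sentences, concept_weights, max_words):
    """Choose sentences that maximize the weight of covered concepts.

    Assumed 0/1 linear program (not taken from the paper):
        maximize    sum_j w_j * y_j
        subject to  sum_i len_i * x_i <= max_words
                    y_j <= sum_{i : concept j occurs in sentence i} x_i
    where x_i selects sentence i and y_j marks concept j as covered.
    Solved here by brute force over all subsets (toy sizes only).
    """
    tokens = [set(s.lower().split()) for s in sentences]
    lengths = [len(s.split()) for s in sentences]

    best_score, best_subset = 0.0, ()
    for k in range(1, len(sentences) + 1):
        for subset in combinations(range(len(sentences)), k):
            # Budget constraint: total summary length in words.
            if sum(lengths[i] for i in subset) > max_words:
                continue
            # A concept counts once, however many sentences contain it.
            covered = set().union(*(tokens[i] for i in subset))
            score = sum(w for c, w in concept_weights.items() if c in covered)
            if score > best_score:
                best_score, best_subset = score, subset
    return [sentences[i] for i in best_subset], best_score


# Hypothetical toy input: four short sentences and four weighted concepts.
sentences = [
    "embeddings represent documents",
    "dimension reduction identifies important words",
    "the weather was pleasant",
    "linear optimization selects sentences",
]
concept_weights = {
    "embeddings": 2.0,
    "reduction": 3.0,
    "optimization": 2.5,
    "words": 1.0,
}

summary, score = select_sentences(sentences, concept_weights, max_words=9)
print(summary, score)
```

With a budget of nine words, the second and fourth sentences are selected: together they cover three of the four weighted concepts, and no other subset within the budget scores higher. Counting each concept once is what discourages redundant sentences in this formulation.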