CATI: an interactive approach for discovering and classifying large document corpora
Abstract
In this paper we present CATI, an approach to multimodal document classification implemented
in a web application for assisted, interactive manipulation of document collections. The
application helps users who are not computer scientists discover, browse, and classify large document
collections whose documents contain text and may include images and metadata such as
timestamp, author, and geolocation. CATI provides classification assistants such as event detection,
and text- and image-based document clustering. Its interface helps users
select among several features, based on text and other information, for classifying the documents. Our
study shows that using the classification assistants and helping users choose the right features
yields good classification results for large document collections within a few clicks.