Extraction of text summary using latent semantic indexing and information retrieval technique : comparison of four strategies
In EGC 2004, vol. RNTI-E-2, pp. 453-464
In this paper, we present four generic text summarization techniques. Each technique produces an extractive summary by ranking the sentences of an original document and selecting the highest-ranked ones. The first method, Summarizer 1, uses standard information retrieval (IR) methods to rank sentences. The second method, Summarizer 2, uses Latent Semantic Analysis (LSA) to identify semantically important sentences for summary creation. The third method, Summarizer 3, combines LSA, reduction, and a relevance measure. The fourth method, Summarizer 4, simply uses the TF*IDF (term frequency * inverse document frequency) weighting scheme. The four methods are evaluated on Document Understanding Conference (DUC) datasets from NIST by comparing the summary produced by each method with manual summaries. Summarizer 4, which has the lowest overhead, performs comparably to Summarizer 1. The analysis shows that the combination of LSA and the relevance measure (Summarizer 3) performs best on average.
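To make the simplest of the four strategies concrete, the following is a minimal sketch of TF*IDF sentence ranking in the spirit of Summarizer 4. It is not the authors' implementation: the abstract does not specify tokenization, the IDF formula, or the corpus over which document frequency is computed. The sketch assumes that each sentence is treated as a "document" for the IDF term, that IDF is log(N / df), and that a sentence's score is the sum of its term weights. The names `tokenize` and `rank_sentences_tfidf` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into alphanumeric tokens (assumed scheme)."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def rank_sentences_tfidf(sentences, k=2):
    """Score each sentence by the sum of its TF*IDF term weights.

    Assumption: every sentence counts as one "document" when computing
    document frequency, and IDF is log(N / df). Returns the k best
    sentences in their original order, plus the raw scores.
    """
    tokenized = [tokenize(s) for s in sentences]
    n = len(tokenized)

    # Document frequency: number of sentences in which each term occurs.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        scores.append(sum(count * math.log(n / df[term])
                          for term, count in tf.items()))

    # Pick the k highest-scoring sentences, then restore document order.
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)], scores


if __name__ == "__main__":
    doc = [
        "The cat sat.",
        "The cat sat on the mat with a hat.",
        "The dog.",
    ]
    summary, scores = rank_sentences_tfidf(doc, k=2)
    print(summary)
```

Note that an unnormalized sum favors long sentences; dividing each score by the sentence length is one common variation, and whether the paper applies such normalization is not stated in the abstract.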