Recognition of sections and entities in court decisions: application of the probabilistic models HMM and CRF
Abstract
A court decision is a text document that summarizes the outcome of a court case.
Lawyers regularly use such decisions as a source for interpreting the law and for
understanding the opinions of judges. The sheer volume of available decisions calls for automated
solutions to assist legal practitioners. As part of a larger project, we propose to address some of the
challenges related to the search and analysis of the growing body of court decisions in France. The
first phase of this project focuses on extracting information from decisions in order to build a
jurisprudential knowledge base that structures and organizes decisions. Such a knowledge base facilitates the
descriptive and predictive analysis of decision corpora. This paper presents an application of
probabilistic models to the zoning of decisions (their segmentation into sections) and to the recognition of entities in their content
(location, date, participants, rules of law, ...). Our tests show the advantage of approaches
based on Conditional Random Fields (CRF) over simpler and faster approaches based on
Hidden Markov Models (HMM). We present the technical aspects of the selection and annotation
of the training corpus, and the definition of discriminative descriptors. The specificity of
the texts is important and should be taken into account when applying information extraction
methods to a specific domain.