Building a discourse parser for IMD in the context of a controlled natural language

Authors

Raúl Ernesto Gutiérrez de Piñerez Reyes, Juan Francisco Díaz Frías

Publication date

2013-03-24

Conference

International Conference on Intelligent Text Processing and Computational Linguistics, pages 533-544

Publisher

Springer, Berlin, Heidelberg

Abstract

The lack of specific data sets makes discourse parsing for Informal Mathematical Discourse (IMD) difficult. In this paper, we propose a data-driven approach to identify arguments and connectives in an IMD structure within the context of a Controlled Natural Language (CNL). Our approach performs low-level discourse parsing under the Penn Discourse TreeBank (PDTB) guidelines. Three classifiers have been trained: one that identifies Arg2, another that locates the relative position of Arg1, and a third that identifies the Arg1 and Arg2 arguments of each connective. These classifiers are instances of Support Vector Machines (SVMs), trained on our own Mathematical TreeBank. Finally, our approach defines an end-to-end discourse parser for IMD, whose results will be used to classify informal deductive proofs via low-level discourse processing in IMD.
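
The data flow described in the abstract (find a connective, decide where Arg1 sits relative to it, then extract both arguments) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the connective list, the function names, and the `SS`/`PS` (same sentence / previous sentence) labels are hypothetical, and the simple rules stand in for the trained SVM classifiers, which this sketch does not reproduce.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Relation:
    connective: str
    arg1: str
    arg2: str


# Hypothetical connective lexicon, for illustration only.
CONNECTIVES = ("then", "therefore", "because", "hence")


def tokenize(sentence: str) -> List[str]:
    """Lowercase the sentence and split it on whitespace, dropping '.' and ','."""
    return sentence.lower().replace(",", " ").replace(".", " ").split()


def find_connective(tokens: List[str]) -> Optional[str]:
    """Return the first token that is a known connective, or None."""
    for tok in tokens:
        if tok in CONNECTIVES:
            return tok
    return None


def arg1_position(tokens: List[str], connective: str) -> str:
    """Rule-based stand-in for the Arg1-position classifier.

    Returns 'PS' (Arg1 in the previous sentence) when the connective
    opens the sentence, otherwise 'SS' (Arg1 in the same sentence).
    """
    return "PS" if tokens[0] == connective else "SS"


def parse(sentences: List[str]) -> List[Relation]:
    """Extract one (connective, Arg1, Arg2) relation per sentence, if any."""
    relations = []
    for i, sentence in enumerate(sentences):
        tokens = tokenize(sentence)
        connective = find_connective(tokens)
        if connective is None:
            continue
        idx = tokens.index(connective)
        # Arg2 is taken to be the text that follows the connective.
        arg2 = " ".join(tokens[idx + 1:])
        if arg1_position(tokens, connective) == "PS":
            if i == 0:
                continue  # no previous sentence to serve as Arg1
            arg1 = " ".join(tokenize(sentences[i - 1]))
        else:
            arg1 = " ".join(tokens[:idx])
        relations.append(Relation(connective, arg1, arg2))
    return relations
```

In a trained system, each rule-based function would be replaced by a classifier that predicts the same kind of label from features of the sentence and its parse tree.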
