Defining discourse formulae: computational approach

9 pages. Published: March 18, 2019

Abstract

In this paper, we address the problem of automatically extracting discourse formulae. By discourse formulae (DF) we mean a special type of construction at the discourse level that has a fixed form and serves as a typical response in dialogue. Unlike traditional constructions [4, 5, 6], DF do not contain variables within the sequence; their slots are found in the left-hand or right-hand statements of the speech act. We have developed a system that extracts DF from drama texts. We compared token-based and clause-based approaches and found that the latter performs better. The clause-based model uses a uniform-weight vote of four classifiers and currently achieves a precision of 0.30 and a recall of 0.73 (F1-score 0.42). The module was used to extract a list of DF from 420 drama texts of the 19th-21st centuries [1, 7]. The final list contains 3000 DF, 1800 of which are unique. Further development of the project includes enhancing the module by extracting left-context features and applying other models, as well as exploring what the concept of DF looks like in other languages.
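
As an illustration only, the uniform-weight vote and the F1-score mentioned in the abstract can be sketched as follows. This is a minimal sketch, not the authors' implementation: the function names, the example votes, and in particular the 0.5 acceptance threshold (which decides how a 2-2 tie among four classifiers is resolved) are assumptions that the abstract does not specify.

```python
from typing import Sequence


def uniform_vote(predictions: Sequence[int], threshold: float = 0.5) -> int:
    """Combine binary classifier outputs (1 = DF, 0 = not DF) with equal weights.

    Assumption: a candidate is accepted when the share of positive votes
    is at least `threshold`; the source does not state the actual rule.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    share = sum(predictions) / len(predictions)
    return 1 if share >= threshold else 0


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical votes of four classifiers for three candidate clauses.
votes_per_clause = [
    [1, 1, 0, 1],  # three of four vote "DF"
    [0, 1, 0, 0],  # one of four votes "DF"
    [1, 0, 1, 0],  # tie, accepted under the assumed threshold of 0.5
]
labels = [uniform_vote(votes) for votes in votes_per_clause]
print(labels)

# F1 recomputed from the rounded precision and recall given in the abstract.
print(round(f1_score(0.30, 0.73), 3))
```

Recomputing F1 from the rounded figures gives roughly 0.425; the small difference from the reported 0.42 may be due to rounding of the published precision and recall.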

Keyphrases: construction grammar, discourse formulae, entity extraction, machine learning, natural language processing

In: Gerhard Wohlgenannt, Ruprecht von Waldenfels, Svetlana Toldova, Ekaterina Rakhilina, Denis Paperno, Olga Lyashevskaya, Natalia Loukachevitch, Sergei O. Kuznetsov, Olga Kultepina, Dmitry Ilvovsky, Boris Galitsky, Ekaterina Artemova and Elena Bolshakova (editors). Proceedings of Third Workshop "Computational linguistics and language science", vol 4, pages 61-69.

BibTeX entry
@inproceedings{CLLS2018:Defining_discourse_formulae_computational,
  author    = {Ekaterina Gerasimenko and Svetlana Puzhaeva and Elena Zakharova and Ekaterina Rakhilina},
  title     = {Defining discourse formulae: computational approach},
  booktitle = {Proceedings of Third Workshop "Computational linguistics and language science"},
  editor    = {Gerhard Wohlgenannt and Ruprecht von Waldenfels and Svetlana Toldova and Ekaterina Rakhilina and Denis Paperno and Olga Lyashevskaya and Natalia Loukachevitch and Sergei O. Kuznetsov and Olga Kultepina and Dmitry Ilvovsky and Boris Galitsky and Ekaterina Artemova and Elena Bolshakova},
  series    = {EPiC Series in Language and Linguistics},
  volume    = {4},
  publisher = {EasyChair},
  bibsource = {EasyChair, https://easychair.org},
  issn      = {2398-5283},
  url       = {/publications/paper/5tbr},
  doi       = {10.29007/k5q2},
  pages     = {61-69},
  year      = {2019}}