Semantic Change Detection for the Romanian Language

EasyChair Preprint 10846, version 3

Versions: 1, 2, 3

8 pages • Date: October 2, 2023

Ciprian-Octavian Truică, Victor Tudose and Elena-Simona Apostol

Abstract

Automatic semantic change detection methods try to identify changes in the meaning of words over time by analyzing their usage in diachronic corpora. In this paper, we analyze different strategies for creating static and contextual word embedding models, i.e., Word2Vec and ELMo, on real-world English and Romanian datasets. To test our pipeline and determine the performance of our models, we first evaluate both word embedding models on an English dataset (SEMEVAL-CCOHA). Afterward, we focus our experiments on a Romanian dataset and highlight different aspects of semantic change in this low-resource language, such as meaning acquisition and loss. The experimental results show that, depending on the corpus, the most important factors to consider are the choice of model and the distance measure used to compute the semantic change score.
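To illustrate what a distance-based change score can look like, here is a minimal Python sketch. It is not the authors' pipeline: the abstract does not say which distances are used, so cosine distance is an assumption, and `cosine_distance`, `change_scores`, and the toy vectors are hypothetical. It also assumes the two periods' vectors already live in a shared space, which separately trained static models would need an alignment step to achieve.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)

def change_scores(emb_old, emb_new):
    """Score every word present in both periods; higher means more change."""
    shared = emb_old.keys() & emb_new.keys()
    return {w: cosine_distance(emb_old[w], emb_new[w]) for w in shared}

# Toy, made-up vectors (assumption: already aligned to a shared space).
emb_old = {"mouse": [1.0, 0.0], "tree": [0.0, 1.0]}
emb_new = {"mouse": [0.0, 1.0], "tree": [0.0, 2.0]}

scores = change_scores(emb_old, emb_new)
# Rank words from most to least changed.
ranked = sorted(scores, key=scores.get, reverse=True)
```

Ranking words by such a score gives a graded view of change. A threshold on the score would turn it into a binary changed/unchanged decision, and the choice of threshold would itself need tuning per corpus.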

Keyphrases: low-resource language, semantic change, word embeddings

Links:

https://easychair.org/publications/preprint/dlp3

BibTeX entry

BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:

@booklet{EasyChair:10846,
  author    = {Ciprian-Octavian Truică and Victor Tudose and Elena-Simona Apostol},
  title     = {Semantic Change Detection for the Romanian Language},
  howpublished = {EasyChair Preprint 10846},
  year      = {EasyChair, 2023}}
