
PUNER - Parsi ULMFiT for Named-Entity Recognition in Persian Texts

EasyChair Preprint 4224

12 pages
Date: September 21, 2020

Abstract

Named Entity Recognition (NER) is an information extraction technique that automatically identifies and classifies named entities in natural language text. These entities may be predefined and generic, such as names of locations, organizations, dates, and times, or very specific to a domain, such as those found in a resume. NER is a key component of many Natural Language Processing systems, including question answering, information retrieval, and relation extraction. Its applications include extracting important named entities from academic, news, and medical documents, classifying content for news providers, and improving search algorithms. Most NER research has targeted high-resource languages such as English, German, and Spanish. Far less work has been done on low-resource languages such as Persian, Indian languages, and Vietnamese, because little or no annotated corpora are available for them. Among these, very few works have been reported for Persian NER to date. Hence, this paper presents PUNER, a Persian NER system built on a Transfer Learning (TL) model that uses Universal Language Model Fine-tuning (ULMFiT). This is accomplished by training a language model on Persian wiki text and using it to identify and extract named entities from given Persian texts. The proposed model is compared with conventional Machine Learning (ML) models and with Deep Learning (DL) models based on BiLSTM, applying five word embedding models: fastText, HPCA, Skip-gram, GloVe, and CBOW. All models are evaluated on two Persian NER datasets, and the results show that the TL model performs better than the ML and DL models.
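
To make the task definition concrete, the following minimal sketch shows how token-level NER output is commonly turned into labelled entities. It assumes the widely used BIO tagging scheme (B- begins an entity, I- continues it, O is outside any entity); the abstract does not state which scheme or label set PUNER uses, and the function name `extract_entities`, the transliterated example sentence, and the PER/LOC labels are all hypothetical illustrations rather than part of the paper.

```python
from typing import List, Optional, Tuple


def extract_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_text, entity_type) pairs.

    Assumption: tags follow the BIO scheme. An I- tag that does not
    continue an open entity of the same type is ignored.
    """
    entities: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None

    def flush() -> None:
        # Store the entity collected so far, if any, and reset the buffer.
        nonlocal current_tokens, current_type
        if current_tokens and current_type is not None:
            entities.append((" ".join(current_tokens), current_type))
        current_tokens, current_type = [], None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()  # a B- tag always starts a new entity
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)  # continue the open entity
        else:
            flush()  # an O tag or a stray I- tag closes the open entity

    flush()  # close an entity that runs to the end of the sentence
    return entities


# Hypothetical example sentence (transliterated) with hand-written tags.
tokens = ["Ali", "Daei", "was", "born", "in", "Ardabil", "."]
tags = ["B-PER", "I-PER", "O", "O", "O", "B-LOC", "O"]
print(extract_entities(tokens, tags))
```

In this framing, any of the compared models (ML, BiLSTM, or the ULMFiT-based TL model) would only need to supply the `tags` sequence; the grouping step stays the same.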

Keyphrases: Bidirectional LSTM, NLP, Named Entity Recognition, Persian, Transfer Learning, ULMFiT, deep learning, machine learning

BibTeX entry
BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:
@booklet{EasyChair:4224,
  author    = {Fazlourrahman Balouchzahi and H. L. Shashirekha},
  title     = {PUNER - Parsi ULMFiT for Named-Entity Recognition in Persian Texts},
  howpublished = {EasyChair Preprint 4224},
  year      = {EasyChair, 2020}}