Fake News Classification Using Morphological Tag and N-Grams

EasyChair Preprint 11050
11 pages • Date: October 9, 2023

Abstract

Learning to identify fake news in social networks effectively is an important and urgent task. Detection methods are studied in many research areas, including morphological analysis. Some NLP researchers argue that simple content-related n-grams and POS tagging are insufficient to classify fake news. However, no empirical results confirming these claims have been reported in the last decade. Given this contradiction, the main goal of the paper is to evaluate experimentally whether n-grams and POS tagging can be used in general to classify fake and real news correctly. The n-grams of the POS tags of the corpus texts were identified and analyzed. Three methods based on POS tagging of different groups of n-grams were proposed and applied in the preprocessing stage of fake news detection. To this end, the n-gram size was examined first. Based on the detected n-grams, the optimal depth of the decision trees was then determined to ensure sufficient generalization. Finally, the performance of the models based on the proposed methods was compared with models based on standardized TF-IDF values. Performance indicators such as precision, recall and F1-score were evaluated over several runs. The question of whether the TF-IDF method can be improved using POS tagging was also investigated in detail. The results showed that the newly proposed method was more accurate than the traditional TF-IDF technique. In conclusion, morphological analysis can improve the basic TF-IDF method.

Keyphrases: NLP, POS tagging, morphological analysis
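To make the preprocessing idea in the abstract concrete, the following is a minimal sketch of how n-grams over POS tags might be turned into TF-IDF features. It is not the authors' implementation. The toy corpus, the tag names, the helper names `pos_ngrams` and `tfidf`, and the choice of a plain unsmoothed TF-IDF formula (relative term frequency times log(N / df)) are all assumptions made for illustration; the abstract does not specify the tagger, the tag set, or the exact weighting used.

```python
import math
from collections import Counter


def pos_ngrams(tags, n):
    """Return all contiguous n-grams over a POS tag sequence, joined with '_'."""
    return ["_".join(tags[i:i + n]) for i in range(len(tags) - n + 1)]


def tfidf(docs):
    """Compute unsmoothed TF-IDF weights for a list of term lists.

    tf  = term count / number of terms in the document
    idf = log(number of documents / number of documents containing the term)
    This is one common variant, assumed here for illustration.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for terms in docs:
        doc_freq.update(set(terms))  # count each term once per document

    weights = []
    for terms in docs:
        counts = Counter(terms)
        total = len(terms)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical, hand-tagged toy corpus: each document is already a POS tag sequence.
tagged_docs = [
    ["DET", "NOUN", "VERB", "DET", "NOUN"],
    ["PRON", "VERB", "ADV", "ADJ"],
    ["DET", "NOUN", "VERB", "ADV"],
]

# Build POS-tag bigrams per document, then weight them with TF-IDF.
bigram_docs = [pos_ngrams(tags, 2) for tags in tagged_docs]
features = tfidf(bigram_docs)
```

Each entry of `features` maps a POS-tag bigram to its weight in one document. Feature dictionaries of this kind could then be vectorized and passed to a classifier such as a decision tree with a bounded depth, which is the kind of model the abstract mentions.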