Offensive Text Detection: Exploring Traditional Classifiers, Ensemble Models, and Kolmogorov Arnold Networks in Code-Mixed Tamil-English Text
EasyChair Preprint 15581 • 13 pages • Date: December 16, 2024

Abstract
Offensive content has become more common in the digital era because of the growth of social media and online communication, including in languages such as Tamil. Detecting such harmful content is difficult because large-scale labeled data is scarce and code-switching is intricate. This paper describes a hybrid architecture for offensive text identification that combines the strengths of Kolmogorov-Arnold Networks (KAN), traditional machine learning classifiers, and ensemble models. Our approach involves text preprocessing, extraction of several features, and hyperparameter tuning to improve model performance. We compare the performance of several classifiers: XGBoost, AdaBoost, Gradient Boosting, K-Nearest Neighbours (KNN), Random Forest, Support Vector Machine (SVM), and Logistic Regression. Extensive experiments show that our hybrid system, particularly its KAN component, performs best at identifying offensive content in code-mixed Tamil-English datasets. The results demonstrate the potential benefits of integrating conventional and cutting-edge machine learning techniques to address the challenges of offensive content identification in multilingual and code-mixed contexts.
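The classifier comparison summarized in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes scikit-learn, character n-gram TF-IDF features (a common choice for romanized, code-mixed text, though the paper's exact features are not given here), macro-F1 as the metric, and a hypothetical `preprocess` step. XGBoost (a separate library) and the KAN model are omitted, and none of the hyperparameters come from the paper.

```python
# Minimal sketch (assumed setup, not the paper's code): compare several
# scikit-learn classifiers on character n-gram TF-IDF features.
import re

from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def preprocess(text: str) -> str:
    """Hypothetical cleaning step: lowercase, drop URLs and @mentions,
    collapse whitespace. Tamil script characters are left untouched."""
    text = text.lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def compare_classifiers(texts, labels, folds=3):
    """Return the mean cross-validated macro-F1 for each classifier."""
    models = {
        "LogisticRegression": LogisticRegression(max_iter=1000),
        "LinearSVC": LinearSVC(),
        "KNN": KNeighborsClassifier(n_neighbors=3),
        "RandomForest": RandomForestClassifier(n_estimators=100, random_state=0),
        "AdaBoost": AdaBoostClassifier(random_state=0),
        "GradientBoosting": GradientBoostingClassifier(random_state=0),
    }
    cleaned = [preprocess(t) for t in texts]
    scores = {}
    for name, model in models.items():
        pipe = make_pipeline(
            # char_wb n-grams tolerate spelling variation in romanized Tamil
            TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
            model,
        )
        fold_scores = cross_val_score(pipe, cleaned, labels, cv=folds,
                                      scoring="f1_macro")
        scores[name] = float(fold_scores.mean())
    return scores
```

Calling `compare_classifiers(texts, labels)` on a list of comments with 0/1 offensive labels returns a dictionary of mean macro-F1 scores per classifier. A fuller setup would wrap each pipeline in a hyperparameter search such as scikit-learn's `GridSearchCV`, which corresponds to the tuning step the abstract mentions.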
Keyphrases: Code-Mixed Text, Key Attention Networks, Kolmogorov-Arnold Networks, Linear SVC, Offensive Content, Offensive Language Detection, Offensive Text Detection, Tamil-English, accuracy precision recall, code mixed communication, code mixed tamil, code mixed tamil english data, hate speech detection, kolmogorov arnold networks kan, machine learning classifiers and ensemble models, neural networks, preprocessing and feature extraction methods, sentiment analysis and offensive language identification dataset, social media, traditional classifiers