Explainable AI Approach towards Toxic Comment Classification

EasyChair Preprint 2773, version 2
24 pages • Date: February 27, 2020

Abstract

In this paper, we document our results and discuss our approach to a serious problem that people face online: encountering "toxic", "abusive", "inappropriate", and "offensive" content in the form of textual comments on social media. The purpose of this project is to curb cyberbullying and foster a safe online environment. Our methodology includes collecting data from online resources, preprocessing it, converting the text to vectors (TF-IDF, word embeddings), building machine learning and deep learning models, and comparing the models using both standard metrics and interpretability techniques in order to select the best model. After training and evaluating various models, we conclude that standard evaluation metrics (such as accuracy, precision, recall, and F1-score) can often be deceiving, since almost every model we trained achieved very good accuracy on the test set. However, after applying the model-interpretability technique LIME and examining the explanations the models generated on a common set of comments we created manually, we noticed that some models relied on incorrect words when predicting a sentence as toxic. Even with a high accuracy or other evaluation score, such models cannot be deployed in real-world scenarios. The combination that gave the best result in our study is a Gated Recurrent Unit (GRU) with word embeddings, which achieved a high accuracy score along with intuitive LIME explanations. This paper provides a comparative study of various machine learning and deep learning models for toxic comment classification. Through this project and study, we also want to emphasize that model-interpretability techniques (such as LIME) are pivotal both for selecting the best model in any ML/DL project and for establishing the end user's trust in the deployed model.

Keyphrases: Explainable AI, LIME, Toxic Comment Classification, deep learning, interpretability, machine learning
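The paper itself does not include code; the following is a minimal, illustrative sketch of how LIME can be applied to a text classifier to inspect which words drive a "toxic" prediction, as the abstract describes. It assumes a scikit-learn TF-IDF pipeline; the toy corpus, class names, and example comment are hypothetical stand-ins, not data from the paper's experiments.

```python
# Minimal sketch: LIME explanations for a TF-IDF text classifier.
# Assumes: pip install lime scikit-learn. The training texts, labels,
# and example comment below are illustrative placeholders only.
from lime.lime_text import LimeTextExplainer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy data standing in for a real toxic-comment corpus.
texts = ["you are an idiot", "have a great day",
         "I hate you", "thanks for the help"]
labels = [1, 0, 1, 0]  # 1 = toxic, 0 = non-toxic

# TF-IDF features + logistic regression, wrapped in a pipeline so that
# predict_proba accepts raw strings (the interface LIME expects).
pipeline = make_pipeline(TfidfVectorizer(), LogisticRegression())
pipeline.fit(texts, labels)

explainer = LimeTextExplainer(class_names=["non-toxic", "toxic"])
explanation = explainer.explain_instance(
    "you are a great idiot",   # comment to explain
    pipeline.predict_proba,    # black-box probability function
    num_features=5,            # top words to include in the explanation
)

# Each pair is (word, weight); positive weights push towards "toxic".
# Inspecting these pairs reveals whether the model keys on sensible
# words or, as the paper observed for some models, incorrect ones.
print(explanation.as_list())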
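For the best-performing combination the abstract reports (GRU plus word embeddings), a minimal Keras sketch might look as follows. The vocabulary size, sequence length, GRU width, and embedding dimension are assumptions, and the paper may use pretrained embeddings or a different output layer rather than the single sigmoid shown here.

```python
# Minimal sketch of a GRU + word-embedding classifier in Keras.
# Hyperparameters below are assumptions, not values from the paper.
from tensorflow.keras import layers, models

VOCAB_SIZE = 20000  # assumed vocabulary size
MAX_LEN = 200       # assumed padded comment length, in tokens
EMBED_DIM = 100     # assumed embedding dimension

model = models.Sequential([
    layers.Input(shape=(MAX_LEN,)),
    # Learned word embeddings; pretrained vectors (e.g. GloVe) could
    # instead be loaded into this layer as initial weights.
    layers.Embedding(VOCAB_SIZE, EMBED_DIM),
    # GRU reads the embedded sequence and returns its final hidden state.
    layers.GRU(64),
    # Single sigmoid unit for binary toxic / non-toxic prediction.
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()
```

A model like this takes padded sequences of token IDs as input; wrapping tokenization, padding, and `model.predict` into a single function over raw strings would let the same LIME workflow shown above be applied to it as well.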