
Explainable AI approach towards Toxic Comment Classification

EasyChair Preprint no. 2773, version 1

Versions: 1, 2
24 pages. Date: February 26, 2020


In this paper, we document our results and discuss our approach to a serious problem that people face online: encountering “toxic”, “abusive”, “inappropriate”, and “offensive” content in the form of textual comments on social media. The purpose of this project is to help stop cyberbullying and foster a safe online environment. Our methodology includes collecting data from online resources, preprocessing it, converting the text to vectors (TF-IDF, word embeddings), building machine learning and deep learning models, comparing the models using standard metrics as well as interpretability techniques, and then selecting the best model. After training and evaluating various models, we conclude that standard model evaluation metrics (such as accuracy, precision, recall, and F1-score) can often be deceiving, as almost every model we trained achieved very good accuracy on the test set. When we applied a model-interpretability technique, LIME, and examined the explanations the models generated for a common set of comments we created manually, we noticed that some of the models relied on the wrong words when predicting a sentence as toxic. So even with high accuracy or another strong evaluation score, such models cannot be deployed in real-world scenarios. The combination that gave the best result in our study is a Gated Recurrent Unit (GRU) with word embeddings, which achieved a high accuracy score along with intuitive LIME explanations. This paper provides a comparative study of various machine learning and deep learning models for toxic comment classification. Through this project and study, we also want to emphasize that model interpretability techniques (such as LIME) are pivotal both for selecting the best model for any ML/DL project and for establishing the end user's trust in the deployed model.
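To make the idea of checking which words drive a toxicity prediction more concrete, here is a minimal sketch. It does not reproduce the paper's models or LIME itself: it assumes a tiny bag-of-words Naive Bayes classifier as a stand-in, a hypothetical six-comment training set, and a simpler leave-one-word-out (occlusion) explanation in place of LIME's sampled local surrogate. All function names are illustrative.

```python
import math
from collections import Counter

# Hypothetical toy data (not from the paper): 1 = toxic, 0 = non-toxic.
TRAIN = [
    ("you are an idiot", 1),
    ("you are a stupid idiot", 1),
    ("shut up you moron", 1),
    ("you are a great friend", 0),
    ("thank you for the help", 0),
    ("what a great idea", 0),
]


def tokenize(text):
    """Lowercase and split on whitespace; a stand-in for real preprocessing."""
    return text.lower().split()


def train_naive_bayes(data):
    """Collect per-class word counts and document counts."""
    counts = {0: Counter(), 1: Counter()}
    docs = Counter()
    for text, label in data:
        counts[label].update(tokenize(text))
        docs[label] += 1
    vocab = set(counts[0]) | set(counts[1])
    return counts, docs, vocab


def predict_toxic_proba(model, text):
    """Return P(toxic | text) under Naive Bayes with add-one smoothing."""
    counts, docs, vocab = model
    total_docs = docs[0] + docs[1]
    log_scores = {}
    for label in (0, 1):
        score = math.log(docs[label] / total_docs)
        denom = sum(counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # ignore words never seen in training
                score += math.log((counts[label][tok] + 1) / denom)
        log_scores[label] = score
    # Normalize the two log scores into a probability.
    top = max(log_scores.values())
    e0 = math.exp(log_scores[0] - top)
    e1 = math.exp(log_scores[1] - top)
    return e1 / (e0 + e1)


def occlusion_explanation(model, text):
    """Score each word by how much P(toxic) drops when that word is removed."""
    tokens = tokenize(text)
    base = predict_toxic_proba(model, text)
    contributions = []
    for i, tok in enumerate(tokens):
        reduced = " ".join(tokens[:i] + tokens[i + 1:])
        contributions.append((tok, base - predict_toxic_proba(model, reduced)))
    return sorted(contributions, key=lambda pair: pair[1], reverse=True)


model = train_naive_bayes(TRAIN)
comment = "you are an idiot"
explanation = occlusion_explanation(model, comment)
for word, drop in explanation:
    print(f"{word:>6}: {drop:+.3f}")
```

On this toy data the largest drop comes from removing `idiot`, which matches intuition. A model that scored equally well on a test set but ranked a neutral word such as `you` first would be the kind of case the abstract warns about.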

Keyphrases: deep learning, Explainable AI, interpretability, LIME, machine learning, Toxic Comment Classification

BibTeX entry
BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:
@booklet{EasyChair:2773,
  author = {Aditya Mahajan and Divyank Shah and Gibraan Jafar},
  title = {Explainable AI approach towards Toxic Comment Classification},
  howpublished = {EasyChair Preprint no. 2773},
  year = {EasyChair, 2020}}