Twitter Sentiment Analysis

EasyChair Preprint 6065

17 pages•Date: July 13, 2021

Abstract

In this report, we address the problem of sentiment classification on a Twitter dataset. We used a number of machine learning and deep learning methods to perform sentiment analysis, comparing them on tweets as a binary classification problem. In the end, we used a majority vote ensemble of our 5 best models to achieve a classification accuracy of 83.58% on the Kaggle public leaderboard. The training dataset is expected to be a CSV file of type tweet_id, sentiment, tweet where tweet_id is a unique integer identifying the tweet, sentiment is either 1 (positive) or 0 (negative), and tweet is the tweet text enclosed in "". Similarly, the test dataset is a CSV file of type tweet_id, tweet. Please note that CSV headers are not expected and should be removed from the training and test datasets. We used the Anaconda distribution of Python. Some methods have specific library requirements: keras with the TensorFlow backend for Logistic Regression, MLP, RNN (LSTM), and CNN, and xgboost for XGBoost. Preprocessing, a baseline, Naive Bayes, Maximum Entropy, Decision Tree, Random Forest, Multi-Layer Perceptron, and other methods are implemented.
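The data format and the majority vote step described in the abstract can be illustrated with a minimal sketch. This is not the author's implementation: the function names `read_training_rows` and `majority_vote`, the sample tweets, and the per-model predictions are all hypothetical and made up for illustration. It assumes each model outputs one 0/1 label per tweet and that an odd number of models vote, so ties cannot occur.

```python
import csv
import io


def read_training_rows(handle):
    """Parse header-less rows of the form tweet_id,sentiment,tweet.

    The csv module handles tweets enclosed in double quotes, including
    tweets that contain commas.
    """
    rows = []
    for tweet_id, sentiment, tweet in csv.reader(handle):
        rows.append((int(tweet_id), int(sentiment), tweet))
    return rows


def majority_vote(predictions_per_model):
    """Combine binary (0/1) predictions from an odd number of models.

    predictions_per_model holds one list per model, all of equal length.
    A tweet is labeled 1 when more than half of the models predict 1.
    """
    voted = []
    for votes in zip(*predictions_per_model):
        voted.append(1 if sum(votes) * 2 > len(votes) else 0)
    return voted


# Hypothetical sample data in the training format (no header row).
sample = io.StringIO(
    '1,1,"great day, loving it"\n'
    '2,0,"worst service ever"\n'
)
rows = read_training_rows(sample)

# Made-up predictions from 5 models on 3 tweets (illustration only).
model_predictions = [
    [1, 0, 1],
    [1, 0, 0],
    [0, 0, 1],
    [1, 1, 1],
    [0, 0, 0],
]
ensemble_labels = majority_vote(model_predictions)
```

In practice the rows would be read from the training file on disk rather than an in-memory string, and the per-model predictions would come from the trained classifiers.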

Keyphrases: CNN, LSTM, deep learning, machine learning, sentiment classification

Links:

https://easychair.org/publications/preprint/26Xf

BibTeX entry

BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:

@booklet{EasyChair:6065,
  author    = {Vedurumudi Priyanka},
  title     = {Twitter Sentiment Analysis},
  howpublished = {EasyChair Preprint 6065},
  year      = {EasyChair, 2021}}
