Twitter Sentiment Analysis

EasyChair Preprint 6065
17 pages • Date: July 13, 2021

Abstract

In this report, we address the problem of sentiment classification on a Twitter dataset. We used a number of machine learning and deep learning methods to perform sentiment analysis. In the end, we used a majority-vote ensemble of our 5 best models to achieve a classification accuracy of 83.58% on the Kaggle public leaderboard. We compared various methods for sentiment analysis on tweets (a binary classification problem).

The training dataset is expected to be a CSV file of the form tweet_id, sentiment, tweet, where tweet_id is a unique integer identifying the tweet, sentiment is either 1 (positive) or 0 (negative), and tweet is the tweet text enclosed in "". Similarly, the test dataset is a CSV file of the form tweet_id, tweet. Please note that CSV headers are not expected and should be removed from the training and test datasets.

We used the Anaconda distribution of Python. Some methods have specific library requirements: keras with a TensorFlow backend for Logistic Regression, MLP, RNN (LSTM), and CNN, and xgboost for XGBoost. Preprocessing, a baseline, Naive Bayes, maximum entropy, decision tree, random forest, multi-layer perceptron, and other methods are implemented.

Keyphrases: CNN, LSTM, deep learning, machine learning, sentiment classification
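
The training-file layout described in the abstract (tweet_id, sentiment, tweet, with no header row and the tweet enclosed in double quotes) can be read with Python's standard csv module. The following is only a minimal sketch under that assumption; the function name read_training_rows and the sample rows are hypothetical and are not taken from the report.

```python
import csv
import io


def read_training_rows(handle):
    """Yield (tweet_id, sentiment, tweet) tuples from a header-less CSV.

    Assumes each row has exactly three fields and that the tweet text is
    quoted, so commas inside a tweet do not split the field.
    """
    for row in csv.reader(handle):
        if not row:  # skip blank lines
            continue
        tweet_id, sentiment, tweet = row
        label = int(sentiment)
        if label not in (0, 1):
            raise ValueError(f"sentiment must be 0 or 1, got {sentiment!r}")
        yield int(tweet_id), label, tweet


# Hypothetical two-row sample standing in for a real training file.
sample = io.StringIO('1,1,"I love this, really"\n2,0,"worst day ever"\n')
rows = list(read_training_rows(sample))
```

With a real file, the same function could be called on an open file object, for example one opened with newline="" as the csv module documentation recommends.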
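
The abstract mentions a majority-vote ensemble over the 5 best models. One simple way such a vote could be computed for binary labels is sketched below. This is an illustration under stated assumptions, not the report's implementation: the function name majority_vote and the example predictions are hypothetical, and each model is assumed to output one 0/1 label per tweet.

```python
def majority_vote(predictions):
    """Combine binary predictions from several models by majority vote.

    predictions: list of per-model label lists, all of the same length.
    An odd number of models is required so that ties cannot occur.
    """
    n_models = len(predictions)
    if n_models % 2 == 0:
        raise ValueError("use an odd number of models to avoid ties")
    # zip(*predictions) groups the votes that all models cast for one tweet.
    return [1 if 2 * sum(votes) > n_models else 0 for votes in zip(*predictions)]


# Hypothetical outputs of 5 models on 3 tweets (1 = positive, 0 = negative).
model_outputs = [
    [1, 0, 1],
    [1, 0, 0],
    [0, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
]
ensemble = majority_vote(model_outputs)
```

For the first tweet, three of the five hypothetical models vote positive, so the ensemble label is 1.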