Sentiment Classification of Chinese Financial Reviews

EasyChair Preprint 538
6 pages • Date: September 28, 2018

Abstract

To explore methods for sentiment classification of Chinese text, this paper takes Chinese comments from the financial sector as its main research object and carries out text sentiment classification tasks. It proposes initializing the word embedding with a sentiment dictionary, to address the problem that some words have opposite sentiment tendencies but similar distributed representations. The models are evaluated with three commonly used metrics (classifier accuracy, average recall, and MacroF1) together with their average value (AvgScore). To explore the factors that influence classifier performance through comparative analysis, the paper uses CNN, LSTM, and GRU as substructures and constructs a total of 9 models with different structures and depths. Based on the top three models by AvgScore, it then studies the method of initializing the word embedding together with sampling and random perturbation techniques. The results show that the sampling technique has the greatest impact on classifier performance: across the different sampling techniques, AvgScore differs by 1.1% to 38.3%. The best results are obtained by combining downsampling of the majority class with oversampling of the minority classes. The classifier whose word embedding is initialized with the sentiment dictionary outperforms those using other word vectors: its highest accuracy, MacroF1, and AvgScore are 82.37%, 77.26%, and 77.62% respectively, against highest values of 82.19%, 76.73%, and 77.08% for the others. Finally, the three classifiers with the highest AvgScore are combined using an ensemble approach. The final ensemble classifier reaches an accuracy of 84.00%, an average recall of 74.58%, a Macro F1 of 79.50%, and an AvgScore of 79.36%.
Keyphrases: ensemble methods, imbalanced sample classification, text classification, word embedding initialization method
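
The abstract defines AvgScore as the average of accuracy, average recall, and MacroF1. The following minimal sketch assumes that this average is an unweighted arithmetic mean, which is consistent with the figures reported for the final ensemble classifier. The function name `avg_score` is illustrative and does not come from the paper.

```python
def avg_score(accuracy: float, avg_recall: float, macro_f1: float) -> float:
    """Unweighted mean of the three metrics (assumed definition of AvgScore).

    All three inputs must use the same units, e.g. percent.
    """
    return (accuracy + avg_recall + macro_f1) / 3.0


# Metrics reported in the abstract for the final ensemble classifier (percent).
ensemble = avg_score(accuracy=84.00, avg_recall=74.58, macro_f1=79.50)
print(f"AvgScore = {ensemble:.2f}%")
```

Under this assumption the three reported ensemble metrics reproduce the reported AvgScore of 79.36%.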