Unveiling Text Mining Potential: A Comparative Analysis of Document Classification Algorithms

13 pages • Published: March 21, 2024

Abstract

The importance of document classification has grown significantly in recent years, mostly due to the rise in digital data volumes. Since textual documents often contain more than 80% of all information, text mining is perceived to have tremendous commercial potential. Extracting knowledge from these texts is essential for future uses, but the vast volume of files makes this information difficult to obtain. As a result, since text classification was introduced, the practice of classifying documents by text analysis has grown in significance. To assess the performance of various models, we employed three different algorithms and compared their metrics. For this, the dataset was created by extracting condensed information from a variety of textbook genres, including business, social science, and computer science textbooks. To classify textbooks within the same subject group, we used three supervised machine learning techniques in this study: decision trees, random forests, and neural networks. Among these three models, the multilayer perceptron neural network produced the best results.

Keyphrases: document categorization, machine learning, neural networks, text classification, text mining

In: Ajay Bandi, Mohammad Hossain and Ying Jin (editors). Proceedings of 39th International Conference on Computers and Their Applications, vol 98, pages 103-115.