A Machine Learning Approach for Diagnosing and Identifying Symptoms of COVID-19

EasyChair Preprint 9431
6 pages • Date: December 7, 2022

Abstract

Coronavirus disease 2019 (COVID-19) is a global pandemic that has caused millions of deaths and disrupted the lives of billions of people. One of the most critical challenges during the early outbreak of the disease was distinguishing its symptoms from those of colds, influenza, and other common infections. Despite considerable research effort, this challenge persists as new strains, variants, and mutations emerge. This work proposes a solution to this problem based on machine learning classification and variable importance algorithms. A public dataset of 274,957 cases was classified into typical (non-COVID-19) and COVID-19 cases based on the reported symptoms and other variables. The reported cases were classified using K-nearest neighbour (KNN), Naïve Bayes, and Decision Tree (DT) algorithms, and the symptoms most decisive in classifying the patients were identified using the Gini, Information Gain, and Information Gain Ratio algorithms. Naïve Bayes and Decision Trees performed best, with Classification Accuracy (CA) scores of 95.2% and 96.3%, respectively. The Naïve Bayes classifier achieved an Area Under the Curve (AUC) of 88.75%. In addition, the variable importance algorithms identified headache, fever, and sore throat as the most important symptoms.

Keyphrases: COVID-19, Data Mining, Health Informatics, Medical Diagnosis, SARS-CoV-2, machine learning
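The abstract names three variable importance criteria (Gini, Information Gain, and Information Gain Ratio) without defining them. The following is a minimal sketch of the textbook versions of these scores for binary symptom indicators, assuming each symptom is a 0/1 column and the label marks COVID-19 (1) versus non-COVID-19 (0). The records and field names (`fever`, `headache`, `cough`, `covid`) are hypothetical toy data, not drawn from the 274,957-case dataset, and the tooling used in the paper may compute these scores differently (for example, with another logarithm base or a different treatment of missing values).

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gini(values):
    """Gini impurity of a sequence of discrete values."""
    n = len(values)
    return 1.0 - sum((c / n) ** 2 for c in Counter(values).values())


def partition(feature, labels):
    """Group the labels by the value the feature takes in each record."""
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    return groups


def impurity_decrease(feature, labels, impurity):
    """Parent impurity minus the size-weighted impurity of the child groups."""
    n = len(labels)
    children = partition(feature, labels).values()
    weighted = sum(len(g) / n * impurity(g) for g in children)
    return impurity(labels) - weighted


def information_gain(feature, labels):
    return impurity_decrease(feature, labels, entropy)


def gini_decrease(feature, labels):
    return impurity_decrease(feature, labels, gini)


def gain_ratio(feature, labels):
    """Information gain normalised by the split information (entropy of the feature)."""
    split_info = entropy(feature)
    return information_gain(feature, labels) / split_info if split_info > 0 else 0.0


def rank_features(records, label_key, scorer):
    """Score every non-label column and sort from most to least important."""
    labels = [r[label_key] for r in records]
    names = [k for k in records[0] if k != label_key]
    scores = {k: scorer([r[k] for r in records], labels) for k in names}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy records for illustration only; not from the paper's dataset.
records = [
    {"fever": 1, "headache": 1, "cough": 1, "covid": 1},
    {"fever": 1, "headache": 1, "cough": 1, "covid": 1},
    {"fever": 0, "headache": 1, "cough": 0, "covid": 1},
    {"fever": 1, "headache": 0, "cough": 1, "covid": 0},
    {"fever": 0, "headache": 0, "cough": 1, "covid": 0},
    {"fever": 0, "headache": 0, "cough": 0, "covid": 0},
]

if __name__ == "__main__":
    for scorer in (gini_decrease, information_gain, gain_ratio):
        print(scorer.__name__, rank_features(records, "covid", scorer))
```

On these toy records `headache` separates the two classes perfectly, so it ranks first under all three criteria, while `cough` carries no information about the label and ranks last. Gain ratio divides by the split information to penalise features with many distinct values; for binary symptoms this correction is small, which is one plausible reason the criteria tend to agree.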