Diabetes Prediction Using Machine Learning Approach

EasyChair Preprint 10686
8 pages • Date: August 7, 2023

Abstract

Diabetes is one of the most common diseases in the world. When it is detected early, it is possible to stop the progression of the disease and prevent further complications. In this work, we design a predictive model that predicts whether a patient will develop diabetes, based on the diagnostic measures contained in the dataset, and we explore different techniques to improve performance and accuracy. Logistic regression is the main algorithm used in this article, and the analysis was carried out in Python. The study uses two datasets: the PIMA Indians Diabetes dataset, sourced from the National Institute of Diabetes and Digestive and Kidney Diseases, and a dataset from Vanderbilt based on a study of rural African Americans in Virginia. Feature selection is performed using two different methods. Ensemble (aggregation) techniques are also applied, which improve performance by producing better predictions than a single model. Accuracy and runtime are recorded for the original datasets and for the versions obtained after applying feature selection and ensemble techniques, and a comparison is presented for each case. The highest accuracy obtained is about 78% for dataset 1, after applying the max voting ensemble technique, and about 93% for dataset 2, after combining max voting and stacking. The results indicate that logistic regression is an effective algorithm for building predictive models. This study also shows that, in addition to algorithm selection, other factors can improve model accuracy and runtime, such as data preprocessing, removal of redundant and null values, normalization, cross-validation, feature selection, and the use of ensemble techniques.

Keyphrases: Diabetes, learning, machine, prediction
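As a rough illustration of the max voting idea mentioned in the abstract, the sketch below combines the class labels predicted by several models by taking the most common label for each sample. It is not the authors' implementation: the helper names `max_vote` and `accuracy` are hypothetical, and the labels are made-up toy values, not data from either dataset.

```python
from collections import Counter
from typing import List, Sequence


def max_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-model class labels by majority (max) voting.

    predictions[m][i] is the label that model m assigns to sample i.
    """
    if not predictions:
        raise ValueError("need predictions from at least one model")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all models must predict the same number of samples")

    combined = []
    # zip(*predictions) regroups the labels so each tuple holds one sample's votes.
    for sample_votes in zip(*predictions):
        label, _count = Counter(sample_votes).most_common(1)[0]
        combined.append(label)
    return combined


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Toy labels invented for illustration (1 = diabetic, 0 = not diabetic).
y_true = [1, 0, 1, 1, 0]
model_a = [1, 0, 1, 0, 0]  # hypothetical model, wrong on sample 3
model_b = [1, 1, 1, 1, 0]  # hypothetical model, wrong on sample 1
model_c = [0, 0, 1, 1, 0]  # hypothetical model, wrong on sample 0

ensemble = max_vote([model_a, model_b, model_c])
print("ensemble labels:", ensemble)
print("single-model accuracy:", accuracy(y_true, model_a))
print("ensemble accuracy:", accuracy(y_true, ensemble))
```

In this toy case each model errs on a different sample, so the vote corrects every individual mistake. With real models the errors are usually correlated, and the gain from voting is correspondingly smaller. An odd number of models avoids ties for binary labels.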