Comparative Analysis of Machine Learning Algorithms for Heart Disease Prediction Using CDC and BRFSS Data: a Focus on Oversampling and Ensemble Techniques.

EasyChair Preprint 15706
10 pages • Date: January 13, 2025

Abstract

Heart disease remains one of the leading causes of mortality globally, and early detection is critical for effective treatment and prevention. This study presents a comparative analysis of various machine learning algorithms for heart disease prediction using the Behavioral Risk Factor Surveillance System (BRFSS) dataset, sourced from the Centers for Disease Control and Prevention (CDC) via Kaggle. Several predictive models were developed and assessed, including Logistic Regression, Random Forest, Decision Trees, K-Nearest Neighbor, Support Vector Machines, and ensemble techniques such as Bagging and Stacking. To address the imbalance in heart disease classification, the Synthetic Minority Oversampling Technique (SMOTE) was applied, leading to improvements in model performance. Models were evaluated using metrics such as Accuracy, Precision, Recall, F1-Score, and Receiver Operating Characteristic (ROC) Area. Notably, the XGBoost model showed superior performance with an F1-Score of 0.80 and an ROC area of 0.92. The application of SMOTE further enhanced the detection of minority cases, contributing to more balanced and robust predictions across models. This study demonstrates the potential of machine learning and oversampling techniques in improving the early detection of heart disease, providing healthcare professionals with enhanced tools for timely diagnosis and intervention. The integration of multiple algorithms and the use of ensemble techniques present a strong framework for predictive modeling in healthcare.

Keyphrases: Ensemble Techniques, Predictive Modeling, Random Forest, XGBoost, classification algorithms, logistic regression
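
The oversampling idea named in the abstract can be illustrated with a small sketch. SMOTE creates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The code below is a simplified, standard-library-only illustration of that idea, not the implementation used in the study: the function name `smote`, its parameters, and the toy data are all hypothetical, and it assumes purely numeric features (survey data such as BRFSS typically also contains categorical fields, which plain interpolation does not handle).

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate n_synthetic points by interpolating between minority samples.

    minority: list of equal-length numeric feature lists (minority class only).
    k: number of nearest minority neighbours considered for each base point.
    This is an illustrative sketch, not a reference implementation.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the segment
        # between the base point and the chosen neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: three minority samples with two numeric features.
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2]]
new_points = smote(minority, n_synthetic=4, k=2)
```

In practice, oversampling of this kind is usually applied to the training split only, so that the evaluation metrics (Accuracy, Precision, Recall, F1-Score, ROC area) are computed on data with the original class distribution.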