Exploring Feature Selection Techniques and Property Tax Impact on Housing Prices: A Case Study

EasyChair Preprint 13805 • 5 pages • Date: July 3, 2024

Abstract

In this study, we aim to improve the predictive performance of house price estimation on the Boston housing dataset by applying feature engineering techniques. We introduce several new features, including interaction terms and polynomial transformations, to capture non-linear relationships and interactions among the original variables. Specifically, we create the interaction between nitric oxides concentration and weighted distance to employment centers (NOX_DIS), the square of the average number of rooms per dwelling (RM2), the logarithm of the per-capita crime rate (LOG_CRIM), and the ratio of the property-tax rate to the number of rooms (TAX_RM). These features are added to a multiple linear regression model that predicts the median value of owner-occupied homes (MEDV). The regression model is evaluated with root mean squared error (RMSE) and the coefficient of determination (R²) on both the training and test sets. We also recast the regression problem as a classification task by binning MEDV into three categories (low, medium, and high), train a logistic regression classifier, and assess it with a confusion matrix and a classification report. The results show that incorporating the new features significantly improves the accuracy and robustness of the house price predictions, highlighting the importance of feature engineering in predictive modeling.

Keyphrases: feature engineering, linear regression, logistic regression
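The pipeline described in the abstract (engineered features, linear regression scored by RMSE and R², then MEDV binned into three classes for a logistic classifier) can be sketched as follows. This is a minimal, hypothetical sketch, not the authors' code. Because `load_boston` was removed from scikit-learn in version 1.2, it generates a synthetic stand-in that uses the Boston column names CRIM, NOX, DIS, RM and TAX with an invented MEDV, so its numbers say nothing about the study's results. The 80/20 split, the tertile binning rule, and all helper names (`make_synthetic_data`, `engineer_features`, `fit_ols`, `fit_softmax`, `run_pipeline`) are assumptions; the classifier is a NumPy multinomial (softmax) logistic regression trained by gradient descent rather than a library implementation.

```python
import numpy as np

BASE = ["CRIM", "NOX", "DIS", "RM", "TAX"]             # subset of Boston column names used here
ENGINEERED = ["NOX_DIS", "RM2", "LOG_CRIM", "TAX_RM"]  # features named in the abstract

def make_synthetic_data(n=506, seed=0):
    """HYPOTHETICAL stand-in for the Boston data: plausible ranges, invented MEDV."""
    rng = np.random.default_rng(seed)
    cols = {
        "CRIM": rng.lognormal(-1.0, 1.5, n),               # strictly positive, so log is defined
        "NOX": rng.uniform(0.38, 0.87, n),
        "DIS": rng.uniform(1.1, 12.1, n),
        "RM": np.clip(rng.normal(6.3, 0.7, n), 3.5, 8.8),
        "TAX": rng.uniform(187.0, 711.0, n),
    }
    medv = (1.2 * cols["RM"] ** 2 - np.log(cols["CRIM"]) - 0.01 * cols["TAX"]
            + 3.0 * cols["NOX"] * cols["DIS"] - 25.0
            + rng.normal(0.0, 2.0, n))                     # invented relationship plus noise
    return cols, medv

def engineer_features(cols):
    """Append NOX_DIS, RM2, LOG_CRIM and TAX_RM; return an (n, 9) design matrix."""
    feats = dict(cols)
    feats["NOX_DIS"] = cols["NOX"] * cols["DIS"]   # interaction term
    feats["RM2"] = cols["RM"] ** 2                 # polynomial term
    feats["LOG_CRIM"] = np.log(cols["CRIM"])       # log transform; assumes CRIM > 0
    feats["TAX_RM"] = cols["TAX"] / cols["RM"]     # ratio term
    return np.column_stack([feats[name] for name in BASE + ENGINEERED])

def fit_ols(X, y):
    """Ordinary least squares with an intercept column; returns [b0, b1, ...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def regression_metrics(y, y_hat):
    """Return (RMSE, R^2) for one split."""
    resid = y - y_hat
    rmse = float(np.sqrt(np.mean(resid ** 2)))
    r2 = float(1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2))
    return rmse, r2

def fit_softmax(X, y, k=3, lr=0.5, epochs=2000, l2=1e-3):
    """Multinomial logistic regression trained by full-batch gradient descent."""
    n, d = X.shape
    W, b = np.zeros((d, k)), np.zeros(k)
    Y = np.eye(k)[y]                                   # one-hot targets
    for _ in range(epochs):
        Z = X @ W + b
        Z -= Z.max(axis=1, keepdims=True)              # shift logits for numerical stability
        P = np.exp(Z)
        P /= P.sum(axis=1, keepdims=True)              # class probabilities
        G = (P - Y) / n                                # gradient of mean cross-entropy w.r.t. logits
        W -= lr * (X.T @ G + l2 * W)                   # small L2 penalty on W
        b -= lr * G.sum(axis=0)
    return W, b

def confusion_matrix(y_true, y_pred, k=3):
    cm = np.zeros((k, k), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)                 # rows = true class, cols = predicted
    return cm

def run_pipeline(seed=0, test_frac=0.2):
    """Feature engineering -> OLS regression -> MEDV binning -> logistic classifier."""
    cols, medv = make_synthetic_data(seed=seed)
    X = engineer_features(cols)
    idx = np.random.default_rng(seed + 1).permutation(len(medv))
    n_test = int(len(medv) * test_frac)
    test, train = idx[:n_test], idx[n_test:]

    # Regression: OLS on raw + engineered features, scored with RMSE and R^2 per split.
    coef = fit_ols(X[train], medv[train])
    reg = {name: regression_metrics(medv[s], coef[0] + X[s] @ coef[1:])
           for name, s in (("train", train), ("test", test))}

    # Classification: bin MEDV at training-set tertiles (an assumed binning rule).
    edges = np.quantile(medv[train], [1 / 3, 2 / 3])
    labels = np.digitize(medv, edges)                  # 0 = low, 1 = medium, 2 = high
    mu, sd = X[train].mean(axis=0), X[train].std(axis=0)
    Xs = (X - mu) / sd                                 # standardize with training statistics
    W, b = fit_softmax(Xs[train], labels[train])
    pred = np.argmax(Xs[test] @ W + b, axis=1)
    cm = confusion_matrix(labels[test], pred)
    return {"regression": reg, "confusion": cm, "accuracy": float(np.trace(cm) / cm.sum())}

if __name__ == "__main__":
    out = run_pipeline()
    for split, (err, r2) in out["regression"].items():
        print(f"{split}: RMSE={err:.2f}  R^2={r2:.3f}")
    print(out["confusion"])
    print(f"test accuracy={out['accuracy']:.3f}")
```

Running the script prints train and test RMSE and R², the 3×3 confusion matrix (rows are true classes, columns are predictions), and test accuracy; per-class precision and recall, as reported in a classification report, follow from the column and row sums of that matrix. To try the sketch on real data, `make_synthetic_data` would be replaced by a loader returning the same column dictionary and MEDV array.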