Healthcare Cost Patterns and Prediction: Investigating Personal Datasets Using Data Analytics
Abstract
This study presents a health insurance cost prediction system based on machine learning. Research interest in the problem has grown markedly since the pandemic, which raised the significance of health insurance as a research topic. The dataset, available on the Kaggle platform, comprises 1338 observations and 7 columns describing individual medical expenditures in the United States. Its variables, which are used to predict insurance prices, include age, gender, BMI, smoking status, and number of children. We applied machine learning approaches, including neural networks, explainable AI (XAI), and auto modeling, to determine how pricing relates to these attributes. The dataset was partitioned in an 80-20 ratio for training and evaluation. Gradient Boosting initially reached an accuracy of 97%; after encoding and hyperparameter tuning, the Gradient Boosting Regressor gave a corrected accuracy of 92%. Among the predictive machine learning models, Random Forest had the best accuracy, at 83.44%.
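The 80-20 evaluation protocol described above can be illustrated with a minimal sketch. It rests on several assumptions that are not stated in the abstract: that the reported "accuracy" of a regression model refers to the coefficient of determination (R²), that the split is a random shuffle, and that rounding sets the evaluation-set size. The rows are synthetic placeholders rather than the Kaggle data, a one-variable least-squares line stands in for the actual models, and the helper names `split_80_20` and `r2_score` are hypothetical.

```python
import random


def split_80_20(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and return (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Synthetic placeholder rows of (age, charge); NOT the Kaggle dataset.
rng = random.Random(42)
rows = []
for _ in range(1338):
    age = rng.randint(18, 64)
    charge = 250.0 * age + rng.gauss(0, 2000)  # made-up linear trend plus noise
    rows.append((age, charge))

train, test = split_80_20(rows)

# Fit a one-variable least-squares line on the training rows only.
n = len(train)
mean_x = sum(x for x, _ in train) / n
mean_y = sum(y for _, y in train) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in train) / sum(
    (x - mean_x) ** 2 for x, _ in train
)
intercept = mean_y - slope * mean_x

# Score on the held-out rows, which the fit never saw.
predictions = [slope * x + intercept for x, _ in test]
score = r2_score([y for _, y in test], predictions)

print(len(train), len(test))
print(round(score, 3))
```

Under these assumptions, 1338 rows split into 1070 training rows and 268 evaluation rows. Scoring only on the held-out rows is what makes the reported figure an estimate of performance on unseen data rather than a measure of fit to the training set.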