Healthcare Cost Patterns and Prediction: Investigating Personal Datasets Using Data Analytics
  • Md Aminul Islam,
  • Pretam Chandra,
  • Bhupesh Kumar Mishra,
  • S M Firoz,
  • Ahmed Fahim,
  • Mozammel Hoque
Md Aminul Islam
School of Engineering, Computing, and Mathematics, Oxford Brookes University
Pretam Chandra
Department of Computer Science & Engineering, Adamas University
Bhupesh Kumar Mishra
Centre of Excellence for Data Science, AI and Modelling, University of Hull
S M Firoz
Ahmed Fahim
Department of Computer Science & Engineering, American International University
Mozammel Hoque
Cyber Security Unit, Gannon University


This study presents a health insurance cost prediction system based on machine learning. Interest in the problem has grown markedly since the pandemic, which raised the profile of health insurance as a research topic. The dataset, available on Kaggle, comprises 1,338 observations and 7 columns describing individual medical expenditures in the United States. Its variables, used to predict insurance prices, include age, gender, BMI, smoking status, and number of children. We applied machine learning approaches, including neural networks, explainable AI (XAI), and auto modeling, to determine the correlation between pricing and these attributes. The dataset was split 80-20 into training and evaluation sets. Gradient Boosting initially reported an accuracy of 97%; after encoding and hyperparameter tuning, the corrected figure for the Gradient Boosting Regressor was 92%. Among the predictive machine learning models, Random Forest achieved the best accuracy, 83.44%.
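
The preprocessing and evaluation steps mentioned in the abstract (encoding, an 80-20 split, and scoring) can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the rows are synthetic, the target field name `charges` and the cost formula are hypothetical, and R² is assumed as the score because the abstract does not say which metric "accuracy" refers to.

```python
import random


def make_synthetic_rows(n, seed=0):
    """Generate made-up records with the fields named in the abstract."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        age = rng.randint(18, 64)
        bmi = round(rng.uniform(16.0, 50.0), 1)
        smoker = rng.choice(["yes", "no"])
        # Arbitrary illustrative cost formula; NOT derived from the real data.
        base = 250.0 * age + 300.0 * bmi + (20000.0 if smoker == "yes" else 0.0)
        rows.append({
            "age": age,
            "gender": rng.choice(["female", "male"]),
            "bmi": bmi,
            "smoker": smoker,
            "children": rng.randint(0, 5),
            "charges": base + rng.gauss(0.0, 1000.0),  # hypothetical target name
        })
    return rows


def encode(row):
    """Map the categorical fields to 0/1 and return (features, target)."""
    features = [
        float(row["age"]),
        1.0 if row["gender"] == "male" else 0.0,
        row["bmi"],
        1.0 if row["smoker"] == "yes" else 0.0,
        float(row["children"]),
    ]
    return features, row["charges"]


def train_test_split_80_20(samples, seed=0):
    """Shuffle a copy of the samples and split it 80/20."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


samples = [encode(r) for r in make_synthetic_rows(1338)]
train, test = train_test_split_80_20(samples)

# Baseline: always predict the mean training charge. Any fitted model
# should score above this reference on the held-out 20%.
mean_charge = sum(y for _, y in train) / len(train)
y_test = [y for _, y in test]
baseline_r2 = r2_score(y_test, [mean_charge] * len(y_test))
print(len(train), len(test), round(baseline_r2, 3))
```

In practice the encoded feature lists could be passed to a library model, for example scikit-learn's `GradientBoostingRegressor` or `RandomForestRegressor`, and the held-out score compared against the mean-prediction baseline computed here.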