
Using Machine Learning and Administrative Data to Predict Premature Births
  • Yongjin Choi, London School of Hygiene and Tropical Medicine, Department of Infectious Disease Epidemiology. Corresponding author: [email protected]
  • J. Ramon Gil-Garcia, University at Albany, Department of Public Administration and Policy


Objective. To assess the potential of using machine learning (ML) and administrative birth data for predicting premature births.

Design. The performance of ordinary least squares (OLS) and deep neural network (DNN) classifiers for predicting low birth weight (LBW) and preterm birth (PTB) was compared using two million randomly selected birth records from the US CDC for 2016 to 2018. One million records from 2016 and 2017 were used to train the classifiers, and another million records from 2018 were used to test them. For hyperparameter tuning, a grid search was conducted over the number of hidden layers, the class weight on positive cases, and the classification threshold.

Setting and Population. All births in the US.

Methods. Ordinary least squares regression and deep neural networks.

Main Outcome Measures. LBW (<2,500 g) and PTB (<37 weeks).

Results. The classifiers generally showed high accuracy and specificity; however, the DNN classifiers substantially improved sensitivity. The highest sensitivity achieved with comparable specificity was 0.71 for LBW and 0.65 for PTB.

Conclusion. These findings suggest that an ML approach could benefit PCHV programs by helping identify mothers at high risk of premature birth. In particular, DNN classifiers trained on administrative data can offer accessible solutions for public agencies and nonprofit organizations that provide PCHV services but are unlikely to possess large clinical datasets or highly accurate genetic testing equipment.
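The abstract gives no implementation details, so the following is only a rough illustration of three ideas it mentions: an OLS classifier (here assumed to be a linear probability model whose fitted scores are cut at a threshold), a grid search over positive-class weights and thresholds judged by sensitivity and specificity, and a train-on-earlier-years, test-on-2018 split. Everything in the sketch is an assumption: the function names are hypothetical, the data are synthetic rather than CDC records, the class weight is applied to OLS through weighted least squares (the abstract lists class weights as a tuning parameter without saying how they enter each model), and the specificity floor `min_spec` is a hypothetical stand-in for "comparable specificity". The DNN and the hidden-layer dimension of the search are not reproduced.

```python
import numpy as np

def fit_linear_probability(X, y, pos_weight=1.0):
    """Least-squares fit of a 0/1 outcome y on X with an intercept.

    Hypothetical weighting: pos_weight scales the squared-error weight of
    positive cases (y == 1); pos_weight=1.0 is ordinary least squares.
    """
    Xd = np.column_stack([np.ones(len(X)), X])
    sw = np.sqrt(np.where(y == 1, pos_weight, 1.0))  # weighted LS via row rescaling
    beta, *_ = np.linalg.lstsq(Xd * sw[:, None], y * sw, rcond=None)
    return beta

def predict_scores(beta, X):
    """Linear scores; unlike probabilities, they are not confined to [0, 1]."""
    return np.column_stack([np.ones(len(X)), X]) @ beta

def sensitivity_specificity(y_true, y_pred):
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    return tp / (tp + fn), tn / (tn + fp)

def grid_search(X_fit, y_fit, X_val, y_val, pos_weights, thresholds, min_spec):
    """Return (sensitivity, specificity, pos_weight, threshold) with the highest
    validation sensitivity among settings whose specificity is >= min_spec."""
    best = None
    for pw in pos_weights:
        scores = predict_scores(fit_linear_probability(X_fit, y_fit, pw), X_val)
        for t in thresholds:
            sens, spec = sensitivity_specificity(y_val, (scores >= t).astype(int))
            if spec >= min_spec and (best is None or sens > best[0]):
                best = (sens, spec, pw, t)
    return best

# Hypothetical synthetic data standing in for birth records (not CDC data).
rng = np.random.default_rng(0)
n = 30_000
X = rng.normal(size=(n, 3))
p = 1 / (1 + np.exp(-(-2.5 + X @ np.array([1.0, 0.5, -0.8]))))
y = (rng.random(n) < p).astype(int)
year = rng.choice([2016, 2017, 2018], size=n)

# Temporal split: earlier years for fitting and tuning, the last year for testing.
train, test = year <= 2017, year == 2018
idx = np.flatnonzero(train)
cut = len(idx) * 4 // 5
fit_idx, val_idx = idx[:cut], idx[cut:]

best = grid_search(X[fit_idx], y[fit_idx], X[val_idx], y[val_idx],
                   pos_weights=[1.0, 2.0, 5.0, 10.0],
                   thresholds=np.linspace(0.0, 0.6, 61),
                   min_spec=0.7)
sens_val, spec_val, pw, t = best

# Refit on all training years with the chosen settings, then score the test year once.
beta = fit_linear_probability(X[train], y[train], pw)
sens_test, spec_test = sensitivity_specificity(
    y[test], (predict_scores(beta, X[test]) >= t).astype(int))
print(f"pos_weight={pw}, threshold={t:.2f}: "
      f"test sensitivity={sens_test:.2f}, specificity={spec_test:.2f}")
```

Tuning on a validation slice of the training years keeps the held-out test year untouched until the final evaluation. Raising `pos_weight` or lowering the threshold trades specificity for sensitivity, which is the trade-off the Results paragraph describes.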