Objective
The objective of this work is to implement a data processing and classification algorithm for monitoring and predicting hypertension from the features recorded in public electronic health records.
Hypothesis
· Is it feasible to build a reliable algorithm that predicts hypertension using signal processing and machine learning techniques?
· Which predictors or features are relevant for predicting hypertension?
Background and state of the art.
In the current state of the art, the prediction or monitoring of hypertension has been studied mainly through two approaches. The first approach predicts systolic and diastolic blood pressure values by processing photoplethysmography (PPG) and electrocardiography (ECG) signals. Both signals are often used together as predictors in a regression model that estimates blood pressure, although in some studies the authors used only the PPG signal. The second approach classifies potentially hypertensive patients using health conditions that could be related to high blood pressure as predictors. Commonly used predictors include age, gender, body mass index, cholesterol level, weight, and height. Below is a review of the literature in which both alternatives have been explored.
Classification approaches for the prediction of hypertension
Classic machine learning techniques have proven useful for classifying hypertensive subjects. For instance, \cite{L_pez_Mart_nez_2018} proposes a logistic regression classifier to determine whether a person is likely to present hypertension. The study also seeks to identify which factors are statistically significant in classifying a person as hypertensive or non-hypertensive. The model was built using data from the National Health and Nutrition Examination Survey (NHANES) from 2007 to 2016. The reported results were a sensitivity of 77%, a specificity of 68%, and an Area Under the Curve (AUC) of 73%.
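To make the three reported metrics concrete, the following minimal sketch shows how sensitivity, specificity, and AUC can be computed from a classifier's output. The labels, scores, and the 0.5 decision threshold are invented toy values for illustration only; they are not data or settings from the cited study, and the function names are hypothetical.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 = hypertensive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (assumed values, not taken from NHANES).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.35, 0.6, 0.3, 0.2, 0.1]  # predicted probabilities
y_pred = [1 if s >= 0.5 else 0 for s in scores]      # assumed 0.5 threshold

sens, spec = sensitivity_specificity(y_true, y_pred)
area = auc(y_true, scores)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} AUC={area:.3f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why studies typically report all three.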
Predicting the occurrence of essential hypertension using annual health records.
This paper compares the performance of the naïve Bayes classifier, Support Vector Machine, Logistic Regression, Random Forest, and Multilayer Perceptron. The data were taken from the Korean National Health Insurance Corporation (NHIC) and contain electronic medical records (EMR) of patients who had annual general health checkups between 2002 and 2013 (a 12-year period). The model considers population density, health checkup results, demographic details, and income quantile. Of the classifiers tested, the support vector machine achieved the highest accuracy, with an F1 score of 0.8002 and an accuracy of 0.8023.
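As a reminder of how the two reported figures relate, the sketch below computes accuracy and the F1 score from confusion-matrix counts. The counts are hypothetical and chosen only to show that the two metrics can diverge on imbalanced data; they are not taken from the NHIC study, and the function name is an assumption.

```python
def accuracy_and_f1(tp, fp, fn, tn):
    """Return (accuracy, F1) from confusion-matrix counts.

    F1 is the harmonic mean of precision and recall, which simplifies to
    2*TP / (2*TP + FP + FN). It ignores true negatives, unlike accuracy.
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0  # define F1 as 0 when no positives exist
    return accuracy, f1


# Hypothetical imbalanced sample: 20 hypertensive and 80 non-hypertensive subjects.
acc, f1 = accuracy_and_f1(tp=10, fp=5, fn=10, tn=75)
print(f"accuracy={acc:.4f} F1={f1:.4f}")
```

In this toy case accuracy is high because most subjects are correctly labeled as non-hypertensive, while F1 is considerably lower because half of the hypertensive subjects are missed.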
Predicting increased blood pressure using machine learning.
doi:10.1155/2014/637635
This work develops a decision tree model to demonstrate the feasibility of classifying prehypertensive or hypertensive patients based on features such as body mass index (BMI), waist circumference (WC), hip circumference (HC), and waist-hip ratio (WHR). The data were collected and published by the same authors in the figshare repository. According to the authors, the decision tree model achieved a sensitivity of 72%, a specificity of 86.25%, and an AUC of 0.688.
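Two of the features named above are derived from raw measurements using standard definitions: BMI is weight in kilograms divided by height in metres squared, and WHR is waist circumference divided by hip circumference. The sketch below derives both for a single hypothetical subject. The record layout, field names, and values are assumptions for illustration and do not reflect the structure of the authors' figshare dataset.

```python
def body_mass_index(weight_kg, height_m):
    """BMI = weight (kg) / height (m) squared."""
    return weight_kg / height_m ** 2


def waist_hip_ratio(waist_cm, hip_cm):
    """WHR = waist circumference / hip circumference (same units)."""
    return waist_cm / hip_cm


# Hypothetical subject record (invented values).
subject = {"weight_kg": 81.0, "height_m": 1.80, "waist_cm": 90.0, "hip_cm": 100.0}

# Feature vector in the spirit of the predictors listed above.
features = {
    "BMI": body_mass_index(subject["weight_kg"], subject["height_m"]),
    "WC": subject["waist_cm"],
    "HC": subject["hip_cm"],
    "WHR": waist_hip_ratio(subject["waist_cm"], subject["hip_cm"]),
}
print({name: round(value, 2) for name, value in features.items()})
```

A feature dictionary like this could then be passed to any classifier; the choice of decision tree parameters is left out here because the source does not state them.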