Methods and Materials
Population
All data were collected at the Centre for Human Drug Research in Leiden, the Netherlands, a clinical research organization specialized in early phase drug development studies. The present analysis included data collected between 2010 and 2019 during the mandatory medical screening that verifies the eligibility of volunteers for enrolment in these studies. Ethical approval for the included studies was obtained from the Medical Ethical Review Committee, and volunteers signed informed consent documents prior to any data collection. The present study was performed in accordance with local regulations. All activities were performed in accordance with applicable standard operating procedures.
The medical screening consisted of a single visit to the clinical unit where a detailed history, a physical examination, vital signs including blood pressure, temperature, weight and height measurement, body mass index (BMI) calculation, and a 12-lead ECG were recorded. Additionally, haematology and chemistry blood panels, urine dipstick, and a urine drug test were analysed.
Data collection for the model
ECG parameters of 6228 subjects aged between 18 and 75 years were included in the present study. From each subject's ECG, 574 features were extracted by the MUSE system. Additionally, gender was used as a feature. The age of the subjects was rounded to whole years. At least ten ECGs were available for each age.
Data pre-processing and selection
Two subjects of each age were set aside as the final test set. The rest of the data was used as the training set.
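The hold-out step can be illustrated with a minimal sketch. The function name, the toy cohort, and the random seed are assumptions made for illustration; the original splitting code is not described in more detail than above.

```python
import random
from collections import defaultdict

def split_two_per_age(ages, n_test_per_age=2, seed=0):
    """Return (train_idx, test_idx), holding out n_test_per_age subjects of each age."""
    rng = random.Random(seed)
    by_age = defaultdict(list)
    for idx, age in enumerate(ages):
        by_age[age].append(idx)
    test_idx = []
    for age in sorted(by_age):
        # Draw the held-out subjects for this age without replacement.
        test_idx.extend(rng.sample(by_age[age], n_test_per_age))
    held_out = set(test_idx)
    train_idx = [i for i in range(len(ages)) if i not in held_out]
    return train_idx, test_idx

# Toy cohort (hypothetical): ten subjects for every whole-year age from 18 to 75.
toy_ages = [age for age in range(18, 76) for _ in range(10)]
train_idx, test_idx = split_two_per_age(toy_ages)
print(len(train_idx), len(test_idx))  # 464 116
```

With 58 whole-year ages and two held-out subjects per age, each test set contains 116 subjects, which is the test-set size reported in the Results.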
To create a balanced training set, the Synthetic Minority Oversampling Technique (SMOTE) algorithm was applied to the training set to create ‘synthetic’ subjects for the less populated age groups, based on the values in the respective age groups [18].
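The interpolation idea behind SMOTE can be sketched as follows. This is a simplified illustration and not the implementation used in the study: the function name, the number of neighbours k, and the toy feature values are assumptions.

```python
import math
import random

def smote_like(samples, n_new, k=3, seed=0):
    """Create n_new synthetic rows by interpolating between a sample
    and one of its k nearest neighbours within the same group."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(samples)
        # Rank the other samples by Euclidean distance to the base sample.
        others = [s for s in samples if s is not base]
        others.sort(key=lambda s: math.dist(base, s))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

# Hypothetical sparse age group: four subjects, two features each.
sparse_group = [[60.0, 0.10], [64.0, 0.12], [58.0, 0.09], [70.0, 0.15]]
new_rows = smote_like(sparse_group, n_new=6)
```

Because every synthetic subject lies on a line segment between two real subjects of the same group, its feature values stay within the range observed in that group.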
Machine learning
A neural network was used as the machine learning model. The model was built with the Keras module v. 2.4.3 in Python 3.8.5. Before training, internal three-fold cross validation within the training set was used to optimize the model. The network was optimized for the number of layers, the number of nodes per layer, the activation function of each layer, and the learning rate. A batch size of 300 was used. The number of epochs (defined as the number of cycles through the full training dataset) for internal validation was determined based on the performance in the internal validation set. The number of epochs for final validation was the median of the optimal number of epochs across the internal cross validations. This process of optimization, training, and validation was repeated 10 times with different training and test sets. The optimal models were evaluated on the test set with the R2 score and the mean absolute error. We also evaluated the model performance with respect to gender.
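The epoch selection and the two evaluation metrics can be sketched in a few lines. The validation-loss curves below are hypothetical, and taking the epoch with the lowest validation loss as the optimum of a fold is an assumption; the text only states that validation performance was used.

```python
from statistics import mean, median

def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    avg = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - avg) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted age."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))

def best_epoch(val_losses):
    """1-based epoch with the lowest validation loss in one fold (assumed criterion)."""
    return min(range(len(val_losses)), key=val_losses.__getitem__) + 1

# Hypothetical validation-loss curves from the three internal folds.
fold_losses = [
    [9.1, 7.4, 6.8, 6.9, 7.3],
    [9.0, 7.8, 7.1, 6.7, 6.9],
    [8.7, 7.2, 6.9, 7.0, 7.4],
]
# The final model is trained for the median of the per-fold optimal epochs.
final_epochs = int(median(best_epoch(losses) for losses in fold_losses))
```

In this toy example the per-fold optima are epochs 3, 4, and 3, so the final model would be trained for 3 epochs.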
To investigate how such a model could perform in groups of subjects or patients, the mean absolute error in a group of 1 to 50 randomly selected subjects in the test sets was evaluated. This was repeated 10 times for each fold (i.e. 100 times for each number of subjects).
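A minimal sketch of this group analysis is given below. It assumes that the group error is the absolute difference between the mean predicted age and the mean chronological age of the group; the simulated predictions and their noise level are illustrative only and do not come from the study data.

```python
import random
from statistics import mean

def group_abs_error(actual, predicted, group_size, repeats=10, seed=0):
    """Average |mean predicted age - mean actual age| over randomly drawn groups."""
    rng = random.Random(seed)
    indices = range(len(actual))
    errors = []
    for _ in range(repeats):
        group = rng.sample(indices, group_size)  # subjects drawn without replacement
        group_pred = mean(predicted[i] for i in group)
        group_true = mean(actual[i] for i in group)
        errors.append(abs(group_pred - group_true))
    return mean(errors)

# Simulated test set of 116 subjects (two per age); Gaussian noise is an assumption.
rng = random.Random(1)
actual = [age for age in range(18, 76) for _ in range(2)]
predicted = [a + rng.gauss(0, 8.5) for a in actual]
curve = {n: group_abs_error(actual, predicted, n, repeats=100) for n in (1, 10, 30, 50)}
```

Individual errors of opposite sign partly cancel when predictions are averaged, so the group error is expected to shrink as the group grows.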
To gain insight into the impact of the individual features on the predicted age, SHapley Additive exPlanations (SHAP) values were calculated for each fold based on the training set [19]. The importance of the features was validated by means of permutation importance (defined as the decrease in a model score when the values of a single feature are randomly shuffled) [20].
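Permutation importance can be sketched as follows. The stand-in model, its coefficients, and the simulated features are hypothetical, and the importance is expressed here as the increase in mean absolute error, which is equivalent to a decrease in model score.

```python
import random
from statistics import mean

def mae(model, rows, targets):
    """Mean absolute error of the model on the given rows."""
    return mean(abs(model(r) - t) for r, t in zip(rows, targets))

def permutation_importance(model, rows, targets, feature, repeats=20, seed=0):
    """Mean increase in MAE after shuffling the values of one feature column."""
    rng = random.Random(seed)
    baseline = mae(model, rows, targets)
    increases = []
    for _ in range(repeats):
        column = [r[feature] for r in rows]
        rng.shuffle(column)
        shuffled = [r[:feature] + [v] + r[feature + 1:] for r, v in zip(rows, column)]
        increases.append(mae(model, shuffled, targets) - baseline)
    return mean(increases)

# Hypothetical stand-in model: the output depends on feature 0 only; feature 1 is ignored.
def toy_model(row):
    return 80.0 - 0.5 * row[0]

rng = random.Random(2)
rows = [[rng.uniform(50, 100), rng.uniform(0, 1)] for _ in range(200)]
targets = [toy_model(r) for r in rows]
importances = [permutation_importance(toy_model, rows, targets, f) for f in (0, 1)]
```

Shuffling the feature that the model relies on degrades the predictions, whereas shuffling the ignored feature leaves the error unchanged.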
Results
Table 1 shows the clinical characteristics of the 6228 included subjects. The study population was divided into ten chronological age groups of 6 years, starting from the age of 18 years. Each age group contained at least 194 subjects, and younger age groups comprised up to 2282 subjects. A total of 1808 (29 %) volunteers were female.
Figure 1 shows ECG examples of a young 18-year-old male (1A) and an elderly 74-year-old male (1B). Figure 1C shows an ECG of a young 19-year-old female and Figure 1D shows an ECG of an elderly 74-year-old female subject. Several differences between the young and older healthy subjects were discernible. In elderly persons the heart rate was lower, the T wave had a lower (absolute) amplitude in leads I, II, III, aVR, and aVL, and the P-wave duration seemed shorter. However, these ECG differences showed considerable variation in the healthy population.
Supplementary Tables 1 and 2 show, respectively, the 54 features present in most leads and the other ECG features used for the machine learning model. The gender of each subject was also included in the model.
The relation between the (predicted) physiologic age and the chronological age was assessed in 10 sets of 116 subjects. In Figure 2a, the relation between predicted physiologic age and chronological age of all 10 test sets is shown. The average relationship of the models showed an R2 of 0.72 ± 0.04 (mean ± SD). The mean absolute error of all predictions was 6.9 ± 5.5 years.
On average, the predicted physiologic age was 0.3 years younger than the chronological age of the subjects. The median absolute deviation of all predicted ages from the actual age was 5.6 years, indicating that half of the predictions were within 5.6 years of the chronological age.
The average prediction line is presented in figure 2b. The average predicted age of the 20 subjects per chronological age had a mean absolute error of 3.4 ± 3.0 years (R2= 0.93). For subjects between 30 and 60 years old the mean absolute error of the average predicted age per chronological age was 1.6 ± 1.1 years.
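The difference between the error of individual predictions and the error of the averaged predictions per chronological age can be illustrated with a small sketch. The predictions below are hypothetical and serve only to show the calculation.

```python
from collections import defaultdict
from statistics import mean

def mae_of_age_averages(actual, predicted):
    """Average the predictions per chronological age, then take the MAE of those averages."""
    by_age = defaultdict(list)
    for a, p in zip(actual, predicted):
        by_age[a].append(p)
    return mean(abs(mean(preds) - age) for age, preds in by_age.items())

# Hypothetical predictions for two chronological ages, three subjects each.
actual = [30, 30, 30, 50, 50, 50]
predicted = [24.0, 33.0, 36.0, 44.0, 52.0, 57.0]

individual_mae = mean(abs(p - a) for a, p in zip(actual, predicted))
averaged_mae = mae_of_age_averages(actual, predicted)
print(individual_mae, averaged_mae)  # 5.0 1.0
```

Over- and underestimations within one age partly cancel in the average, which is why the error of the average prediction line is smaller than the error of the individual predictions.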
Figure 3 shows how such models could perform in new patient groups. The average absolute prediction error declines rapidly when multiple subjects are tested. For example, a cohort of 10 healthy subjects with ages ranging from 18 to 75 years would have an average absolute error of 2.7 ± 2.1 years. The mean absolute error of a test group of 30 subjects would be only 1.7 ± 1.2 years.
In order to study gender differences, the predicted physiological ages of the male and female subjects in the test sets were separated and are presented in Figure 4. The predicted ages of the male subjects were more accurate (R2= 0.74) than the predictions of the female subjects (R2= 0.66).
Figure 5 shows the SHAP values of the 40 most important ECG features used in the prediction model, which visualizes the impact of each individual feature on the model output and thus on physiologic aging. Some of the most important features for the prediction of physiologic age were T peak abnormalities in leads V4 and V5, P peak amplitude in leads aVR and II, and atrial rate.
An increase of P peak amplitude in lead II, for example, indicates a younger physiologic age (a long red bar to the left). A longer PR interval indicates an older physiologic age (a long red bar to the right). A higher atrial rate indicates a younger physiologic age (a long red bar to the left). The impact of gender was only of minor importance, with SHAP values ranging from -1.2 to 0.9. The order of the feature permutation importance was similar to the order of the SHAP values, confirming the impact of the features.
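The sign convention of these values can be illustrated with an exact Shapley calculation on a toy model. The linear stand-in model, its coefficients, and the feature values are hypothetical, and the full enumeration of feature orderings shown here is only feasible for a handful of features; it is not the approximation applied to the neural network in this study.

```python
from itertools import permutations
from statistics import mean

def shapley_values(model, x, baseline):
    """Exact Shapley values: average marginal contribution over all feature orderings."""
    n = len(x)
    contributions = [[] for _ in range(n)]
    for order in permutations(range(n)):
        current = list(baseline)
        previous = model(current)
        for i in order:
            current[i] = x[i]  # reveal feature i of the subject
            value = model(current)
            contributions[i].append(value - previous)
            previous = value
    return [mean(c) for c in contributions]

# Hypothetical stand-in model with features:
# [P amplitude in lead II (microvolts), PR interval (ms), atrial rate (bpm)].
def toy_model(f):
    return 45.0 - 0.05 * f[0] + 0.10 * f[1] - 0.20 * f[2]

baseline = [100.0, 160.0, 65.0]  # assumed reference subject
subject = [140.0, 190.0, 80.0]   # higher P amplitude, longer PR interval, faster atrial rate
phi = shapley_values(toy_model, subject, baseline)
```

A negative value shifts the predicted age of this subject below the reference prediction and a positive value shifts it above, and the values add up to the difference between the prediction for the subject and the prediction for the reference.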
Discussion
In this study we developed machine learning models that allow prediction of the physiologic cardiac age of healthy subjects based on 12-lead surface ECG parameters. Using a neural network, we were able to estimate the age of a healthy subject with a mean absolute error of approximately 7 years and to analyze the impact of the ECG features. The models created in the present study may serve as a benchmark for testing the effects of new pharmacological drugs on a potential decline or improvement of the physiologic health of the heart.

Application of Machine Learning

Attia et al. recently sought to determine whether the application of machine learning algorithms, including convolutional neural networks, to a large ECG patient data set would be capable of predicting the age and sex reported by patients, independent of additional clinical data [17]. They further investigated whether discrepancies between ECG age and chronological age might be a marker of physiological health. When the convolutional neural network-predicted age exceeded a patient’s actual age by at least 7 years, there was a higher incidence of cardiovascular comorbidities, potentially suggesting that the convolutional neural network-predicted age from 12-lead ECGs may correlate with physiological health. Their findings suggested that physiological age is distinct from chronological age, and may have useful clinical applications. For example, if a patient’s chronological age is 60 but their ECG predicts an age of 70, this may indicate underlying cardiovascular disease and potential risk. A limitation of their study, as also recognized by the authors, was that all individuals included were patients, and thus each ECG was obtained for a certain clinical indication. The authors noted that it is unknown whether their results are similarly accurate in an ostensibly healthy population, and that revalidation in such a cohort will therefore be critical.
The same holds true for the study by Hirota et al., who studied biological age, physiological age, and all-cause mortality by 12-lead ECG in patients without structural heart disease [21]. Their data showed that the gap between ECG-predicted physiological and biological age allowed estimation of an increased risk of all-cause mortality. Although their study subjects were assumed to have no structural heart disease, the authors stated that it will be necessary to validate the results of their study in populations of healthy subjects. In our study, we only studied healthy individuals, which gives it the advantage of being a much-needed benchmark study against which future studies in patients can be validated.

Performance of the model

The relation between chronological and predicted physiologic age was associated with an R2 of 0.72. Although our dataset was smaller than the one used by Attia et al., our predictions showed a similar performance, probably because of the healthy population in our study, which we expect reduces the variability of the association. Given the large number of factors that can affect ECG parameters, the R2 of 0.72 of our models seems sufficient to detect a pharmacodynamic effect in a cohort of subjects. Use of the entire dataset with a larger number of subjects may improve future performance of the model.
In the present study, the impact of physiologic aging on the various ECG features was analyzed using SHAP values. Several changes are clearly visible in the ECG figures. Some of these are already well known in clinical practice, such as prolongation of the PR and QT intervals and deceleration of the heart rate [12]. Other changes, however, could only be recognized by using machine learning, although these may be equally important. Moreover, when multiple features change at the same time, it becomes difficult to judge whether the change in the ECG is good or bad without using machine learning. By means of machine learning techniques, a combination of various ECG changes allows a more accurate insight into the physiologic health changes of the heart.

Gender differences

The accuracy of predicting physiologic age was found to be higher in male than in female subjects. This may be due to the somewhat smaller female study population, but it may also reflect the atypical ECG repolarization patterns which are known to occur frequently in women [22]. The SHAP values show that the impact of gender on physiologic age prediction was only of minor importance.

Pharmaceutical drug testing and potential implications

The prediction of the physiologic age for one single person is less relevant in this model. However, for larger groups or cohorts of multiple subjects, the prediction is more accurate. For example, for a group of 30 test subjects, the average absolute error is less than two years. Therefore, our models could be particularly suitable as a benchmark for testing new pharmaceutical drugs or other interventions which may have an impact on cardiac health in the near future. Differences between physiologic ECG age and chronological age have been shown to predict all-cause and cardiovascular mortality and to reflect physiologic age, cardiovascular health, and long-term outcomes [23].
The proper use of a model trained on the entire dataset in early drug development can provide important information that can be used to make a go/no-go decision regarding further development of new drugs. Similarly, this can be used to guide the decision-making process regarding the dosage range to be used in phase II studies, determining a therapeutic window, and even identifying the target study population [24]. In this way, novel pharmacological drugs could be tested for their effect on cardiac physiologic aging in the early phase of development.
Limitations
Our population consisted of only 29% female subjects. This may have influenced the accuracy of the model, but SHAP value analysis showed that gender only had a minimal impact on the predictions of physiologic age.
ECG changes do not need to have a purely cardiac cause, but may also be caused by effects of age on the position of the heart in the thorax, the presence of fat layers around the heart, and the shape of the thorax. Therefore, the relationship found does not necessarily indicate an older heart per se, but can also indicate an older body.
Conclusion
The application of machine learning to the ECG using a neural network regression model allows estimation of physiologic cardiac age. This technique could be used to detect subtle age-related cardiac changes, and also to estimate the reversal of these age-associated effects by administered treatments.
References
1. van Dam, P.M., et al., The relation of 12 lead ECG to the cardiac anatomy: The normal CineECG. Journal of Electrocardiology, 2021.
2. Biernacka, A. and N.G. Frangogiannis, Aging and cardiac fibrosis. Aging and disease, 2011. 2 (2): p. 158.
3. Hayashi, H., et al., Aging‐related increase to inducible atrial fibrillation in the rat model. Journal of cardiovascular electrophysiology, 2002. 13 (8): p. 801-808.
4. Wang, F., T. Syeda-Mahmood, and D. Beymer. Information extraction from multimodal ECG documents. in 2009 10th International Conference on Document Analysis and Recognition. 2009. IEEE.
5. Roetker, N.S., et al., Prospective study of epigenetic age acceleration and incidence of cardiovascular disease outcomes in the ARIC study (Atherosclerosis Risk in Communities). Circulation: Genomic and Precision Medicine, 2018. 11 (3): p. e001937.
6. Kistler, P.M., et al., Electrophysiologic and electroanatomic changes in the human atrium associated with age. Journal of the American College of Cardiology, 2004. 44 (1): p. 109-116.
7. Breitling, L.P., et al., Frailty is associated with the epigenetic clock but not with telomere length in a German cohort. Clinical epigenetics, 2016. 8 (1): p. 21.
8. Perna, L., et al., Epigenetic age acceleration predicts cancer, cardiovascular, and all-cause mortality in a German case cohort. Clinical epigenetics, 2016. 8 (1): p. 64.
9. Wang, Z., et al., Predicting age by mining electronic medical records with deep learning characterizes differences between chronological and physiological age. Journal of biomedical informatics, 2017. 76 : p. 59-68.
10. Horvath, S., et al., Obesity accelerates epigenetic aging of human liver. Proceedings of the National Academy of Sciences, 2014. 111 (43): p. 15538-15543.
11. Levine, M.E., et al., Menopause accelerates biological aging. Proceedings of the National Academy of Sciences, 2016. 113 (33): p. 9327-9332.
12. Rijnbeek, P.R., et al., Normal values of the electrocardiogram for ages 16–90 years. Journal of electrocardiology, 2014. 47 (6): p. 914-921.
13. Macfarlane, P., et al., Effects of age, sex, and race on ECG interval measurements. Journal of electrocardiology, 1994. 27 : p. 14-19.
14. Mason, J.W., E.W. Hancock, and L.S. Gettes, Recommendations for the standardization and interpretation of the electrocardiogram: part II: Electrocardiography diagnostic statement list: a scientific statement from the American Heart Association Electrocardiography and Arrhythmias Committee, Council on Clinical Cardiology; the American College of Cardiology Foundation; and the Heart Rhythm Society: endorsed by the International Society for Computerized Electrocardiology. Circulation, 2007. 115 (10): p. 1325-1332.
15. Kligfield, P., et al., Recommendations for the standardization and interpretation of the electrocardiogram: part I: the electrocardiogram and its technology a scientific statement from the American Heart Association Electrocardiography and Arrhythmias Committee, Council on Clinical Cardiology; the American College of Cardiology Foundation; and the Heart Rhythm Society endorsed by the International Society for Computerized Electrocardiology. Journal of the American College of Cardiology, 2007. 49 (10): p. 1109-1127.
16. Khane, R.S., A.D. Surdi, and R.S. Bhatkar, Changes in ECG pattern with advancing age. 2011.
17. Attia, Z.I., et al., Age and sex estimation using artificial intelligence from standard 12-lead ECGs. Circulation: Arrhythmia and Electrophysiology, 2019. 12 (9): p. e007284.
18. Chawla, N.V., et al., SMOTE: synthetic minority over-sampling technique. Journal of artificial intelligence research, 2002. 16 : p. 321-357.
19. Lundberg, S.M. and S.-I. Lee. A unified approach to interpreting model predictions. in Advances in neural information processing systems. 2017.
20. Altmann, A., et al., Permutation importance: a corrected feature importance measure. Bioinformatics, 2010. 26 (10): p. 1340-1347.
21. Hirota, N., et al., Prediction of biological age and all-cause mortality by 12-lead electrocardiogram in patients without structural heart disease. BMC Geriatrics, 2020. 21 (460).
22. Okin, P.M., Electrocardiography in women: taking the initiative. 2006, Am Heart Assoc.
23. Ladejobi, A., et al., ECG-derived age and survival: validating the concept of physiologic age detected by ECG using artificial intelligence. Journal of the American College of Cardiology, 2020. 75 (11 Supplement 1): p. 3469.
24. Groeneveld, G.J., J.L. Hay, and J.M. van Gerven, Measuring blood–brain barrier penetration using the NeuroCart, a CNS test battery. Drug Discovery Today: Technologies, 2016. 20 : p. 27-34.