TABLE 1: A meta-review of statements considering (a) the utility of LF HRV as an index of sympathetic cardiac outflow and (b) the utility of LF/HF HRV as an index of sympathovagal balance. Ordered by year, then alphabetically.
In light of the above, it is notable that both hypotheses remain central to applied physiological research, and thus we reach an impasse. How should we understand this tension, in which a theory that has been so explicitly undermined maintains its popularity and retains evidentiary value? This review explores why this might be the case – how does a theory whose physiological basis has been so comprehensively undermined remain in use within applied physiology? In the following, we address eight potential factors which allow this tension to be maintained.
Maintenance Factors
Technical divergence / Siloing
Consider the following from a hypothetical introduction:
The sympathetic (SNS) and parasympathetic (PNS) nervous systems have discrete projections to the heart; the SNS has several projections to the whole myocardium, the PNS exclusively to the sino-atrial node (SAN). The SAN is a discrete area of tissue over the right atrium which acts as the ‘pacemaker’ of the heart. On a measured electrocardiogram (ECG), the initiation of the electrocardiographic cycle manifests as the P-wave, a small, positive, and regular wave corresponding to depolarisation/contraction of the atria. After propagation through the atrioventricular node (corresponding to the PR segment), the large typical ‘spike’ of the ECG is seen with depolarisation of the ventricles (the R-wave). The distance between R-waves, chosen because of their size and functional significance (i.e., they are the immediate precursor to the ventricular contraction which is the functional instantiation of the ‘heart beat’), is considered the best representation of the cardiac cycle. Thus, the admixture of excitatory SNS and inhibitory PNS influences on the SAN determines the cardiac cycle.
HRV is frequently used as a correlate of mental state or illness \citep{22355326}, and readers of papers within biological psychology or psychiatry will find the above a reasonably familiar paragraph-length outline of the relevant physiology, although perhaps more detailed than usual. It is reasonably well reflected in introductory textbooks on cardiac physiology and electrophysiology (see \citealp{Hall_2011}, p.123). However, the above does not refer to a static body of knowledge, and almost the entire paragraph can be heavily qualified or contradicted by the most recent available evidence:
- PNS and SNS influences on the heart are not discrete; they interact at both pre- and post-junctional levels. Likewise, PNS and SNS influences on the myocardium do not represent monolithic, unipolar forces which always act in opposition \cite{Coote_2013}.
- The PNS has projections not just to the SAN, but to the whole myocardium \citep{12644879}.
- The SAN is an area, not a discrete point \cite{19592411,21538926}
- The myocardium is not a ‘dumb pump’ run linearly by the co-influence of para/sympathetic inputs, but rather ‘the little brain on the heart’ – a ganglionated plexus which itself integrates information \cite{Armour_2008,17455544}.
- The P-wave is a complicated signal, with both (a) characteristic differences between participants and (b) a non-stereotypical conduction pathway whose apparently stereotypical electrocardiographic signal is more a consequence of filter settings than of the underlying physiology \cite{Peper_1985,Potse_2016}.
- R-waves are only one representation of the cardiac cycle, which begins with the P-wave; changes in the PR interval can be observed under many circumstances \citep{Carruthers_1987,Shouldiceb}.
The point here is not that more recent conceptions of the relevant cardiac and autonomic physiology invalidate the measurement of HRV, but rather that all of the foundational assumptions listed above have been overwritten, and the effect of this on applied research is presently unknown. The above body of knowledge is no secret; rather, much of it is well established, drawn from active research areas, and can be found in curricula from around the advanced undergraduate level. However, it has only the most occasional bearing on the work which falls under the rubric of applied physiology. In other words, even relatively well-connected fields maintain a parochialism; HRV is a tool derived from an earlier body of knowledge, and retains its underlying assumptions in perpetuity even while its background is qualified or invalidated. Within psychophysiology, it is possible for applied researchers to be entirely unaware of research within cardiac or circulatory physiology which invalidates the tools they commonly deploy. That is, the research cited within Table 1 is considered of only peripheral relevance to most of those using HRV as a research tool, and is siloed or confined to a separate and largely unread body of literature.
Practical Utility (i.e., Ease of Access)
There are several ways in which various instantiations of SNS status can be assessed. Measuring directly from the cardiac nerves is a destructive technique that is not possible in humans, and has only been attempted in large-animal models \citep{11748052}. Among minimally invasive methods, the measurement of plasma noradrenaline kinetics and noradrenaline spillover \citep{3948363,Esler_1989} requires the use of injected radiotracers during a PET scan. Measuring muscle or skin sympathetic outflow directly requires specialised equipment and the insertion of a microelectrode \citep{Macefield_1996}. Biomarker methods include the measurement of salivary alpha-amylase \citep*{19155141} and the quantification of sweat pore reactivity \citep{Familoni_2016}; while these are non-invasive, neither method is widely established at present. Finally, the most widely used non-invasive method is the measurement of skin resistance/conductance \cite{Boucsein_1992}. This method, while well established \citep*{herman1878uber}, is difficult to interpret, displays a surprisingly complex sub-structure which is not well understood \citep{Bach_2011,Bach_2012,Bach_2013}, and has an unresolved relationship with the directly measured firing of skin sympathetic nerves \cite{Henderson_2012,Bach_2014}. In contrast to all of the above, HRV measurement is inexpensive, presents few barriers to collection (i.e., measurement is possible during movement and activities of daily living, across multiple daily measurements, and in long-term recordings), and is well served by available software and hardware platforms. Moreover, per the original demonstrations, the method of using the LF/HF ratio to index sympathetic outflow is claimed to be accurate. A seminal demonstration by \citet{7923668} showed correlations of between 0.68 and 0.78 between various instantiations of Hypothesis B and orthostatic tilt angle; this is an intervention in which a supine experimental participant is gradually tilted upright, and tilt angle is itself a well-constrained correlate of sympathetic outflow.
While objections to the meaning of this demonstration are well established \citep{9401419,9386196}, the accuracy of the measurements themselves is not in dispute, and the results have been replicated \citep{11686627}. In other words, if we take the underlying theory as read, then Hypotheses A/B are attractive research tools compared to other methods of measuring sympathetic outflow in any context. It is reasonable to expect that any experimental model will be prioritised if it can be presented as such: cheap, convenient, accurate, and computationally straightforward.
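To make the ‘computationally straightforward’ point concrete, the following is a minimal sketch in Python (using NumPy and SciPy; the 4 Hz resampling rate, the Welch segment length, and the function name are illustrative choices rather than a prescribed implementation) of how LF power, HF power, and the LF/HF ratio of Hypothesis B are commonly derived from a series of RR intervals.

\begin{verbatim}
import numpy as np
from scipy.interpolate import interp1d
from scipy.signal import welch

def lf_hf_ratio(rr_ms, fs=4.0):
    """Estimate LF power, HF power, and LF/HF from RR intervals (ms)."""
    rr = np.asarray(rr_ms, dtype=float)
    t = np.cumsum(rr) / 1000.0              # beat times in seconds
    t = t - t[0]
    grid = np.arange(0.0, t[-1], 1.0 / fs)  # evenly sampled time base
    rr_even = interp1d(t, rr, kind='cubic')(grid)
    rr_even = rr_even - rr_even.mean()      # remove the mean (DC) level
    freqs, psd = welch(rr_even, fs=fs, nperseg=min(256, len(rr_even)))
    lf_band = (freqs >= 0.04) & (freqs < 0.15)   # conventional LF band
    hf_band = (freqs >= 0.15) & (freqs < 0.40)   # conventional HF band
    lf = np.trapz(psd[lf_band], freqs[lf_band])
    hf = np.trapz(psd[hf_band], freqs[hf_band])
    return lf, hf, lf / hf
\end{verbatim}

Nothing in this pipeline is computationally demanding, and it runs comfortably on consumer hardware over recordings of essentially any length; this convenience is itself a large part of the method's appeal.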
Conceptual Utility
Researchers are faced with ever-increasing analytical and theoretical complications as the number of available scientific publications grows, and this is compounded in interdisciplinary fields. Within a discussion of HRV, we may deem relevant fully formed and separate programs of research on the mathematical properties of heart rate, the expanding format and accuracy of measurement devices, an increasingly complicated picture of the autonomic and cardiovascular environment, a growing number of available HRV calculations, a profound spread of interest within personal and lay scientific publications, and, finally, the application of all of the above not just to questions of behaviour but variously within biomedical and exercise science, public health, and so on. Consequent to this breadth, assumptions which provide a simple experimental interface to biological systems are attractive in their own right, because they contain an inherent answer to the question of how measurement and modelling should be performed.
In other words, the idea that HR can return a single and straightforward measure of ‘autonomic status’ is attractive, as it simplifies the measurement procedure, the analysis, and the conception of the nervous system necessary to perform experiments or observations. This simplicity is maintained by our hypotheses of interest. Throughout the use of these hypotheses, we occasionally see the continuing and tacit acceptance of an assumption descended from the original work on the ‘non-specific stress response’ \citep{Selye_1956,SELYE_1936}: the idea that spreading arousal of the SNS can be seen in all systems and subsystems. That is, if a general arousal response can be successfully induced, it is presumed to be present non-specifically across sympathetic influences on the heart, skin, viscera, vasculature, and so on. The legacy of this theory is mirrored in the idea of a uniaxial HRV measure which reports the status of a single sympathetically mediated system. It should be noted, however, that this is contradicted by a broad array of evidence \citep*{19809584}; instead, we observe inter-coordinated responses between different sympathetically mediated sub-systems during provocation, and specific patterns of autonomic sequelae associated with different ‘arousing’ stimuli \citep{BERNTSON_1993,Kreibig_2007}.
Experimental Success
Hypotheses with any longevity are not derived in a vacuum, but from observations built on layers of previous reasoning. The outflow of the autonomic nervous system is unambiguously central to the regulation of HR over time, and has unambiguous effects on low frequency (LF) and high frequency (HF) HRV power. Double blockade experiments \cite{Goldberger_2001} have long established the co-contributions of the SNS and PNS to HRV. The consequence of this is that if an experiment is known to affect the state of the nervous system, then HRV changes are often observed. That is, observations can be made, but interpretations differ – the ‘sympathetic’ HRV indices of Hypotheses A and B may accurately predict other independent variables, but not for reasons that are presently understood. Measurements may continue to provide predictive capacity in the absence of our ability to explain them. A related reason is that the specific value provided in Hypothesis B shows a significantly higher relative standard error than other measurements \cite{Heathers_2014}; in other words, it is generally observed to be volatile. If experiments are inadequately constrained, it is trivial to observe ‘significance’ within pairs of comparisons. This is the reason some authors recommend against the use of spectral analyses altogether, due to their potential lack of stability \citep{Ng_2009}.
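To illustrate the volatility point in isolation, the following is a minimal simulation sketch (the log-normal distributions and their parameters are arbitrary assumptions for illustration, not estimates from real data) showing that the ratio of two variable band powers is more dispersed, relative to its mean, than either band power alone.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(seed=1)
n = 10_000

# Arbitrary, illustrative "band power" distributions; a log-normal is used
# only to mimic the right-skewed variability typical of spectral power.
lf = rng.lognormal(mean=6.0, sigma=0.5, size=n)
hf = rng.lognormal(mean=6.0, sigma=0.5, size=n)

def cv(x):
    """Coefficient of variation (%), a simple index of relative dispersion."""
    return 100.0 * np.std(x) / np.mean(x)

print(f"relative dispersion of LF:    {cv(lf):.1f}%")
print(f"relative dispersion of HF:    {cv(hf):.1f}%")
print(f"relative dispersion of LF/HF: {cv(lf / hf):.1f}%")
# The ratio is reliably more dispersed than either band power alone.
\end{verbatim}

This is not a model of the underlying physiology, only an arithmetic reminder that dividing one noisy quantity by another compounds their variability.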
Explicit Authority / Guidelines
The most definitive statement on how HRV research should be conducted is the paper by a task force of senior researchers in electrocardiology, physiology, neurology, and similar fields, drawn from the European Society of Cardiology and the North American Society of Pacing and Electrophysiology. It was published in parallel in Annals of Noninvasive Electrocardiology, Circulation, and European Heart Journal. It states:
"Disagreement exists in respect to the LF component. Some studies suggest that LF, when expressed in normalized units, is a quantitative marker for sympathetic modulations, other studies view LF as reflecting both sympathetic and vagal activity. Consequently, the LF/HF ratio is considered by some investigators to mirror sympatho/vagal balance or to reflect sympathetic modulations." EHJ version, p.366
\cite{1996}
This paper, a comprehensive statement of how HRV should be understood, measured, and utilised, goes on to list the clinical utility of the LF/HF ratio, results of interest, and the expected values the ratio might take. This provides an ongoing justification for Hypotheses A and B. Definitive statements concerning how research should be conducted, typically established by senior committees or working groups within formal scientific societies, are enormously influential. The Task Force paper (1996) has, across its multiple outlets of publication, more than 11,000 citations, and is easily the most highly cited and enduring reference work on HRV.
While it has been and remains commonly cited, it is also notable that it was published prior to the objections recorded in this review (Table 1). The nature of scientific publication at present does not allow for the dynamic updating of standards – the paper instead remains precisely as published, and continues to be cited.
Implicit Authority / Precedence
The continual citation of this method, and of the standards which recommend it, becomes self-generative – it can continue to be acceptable because it has previously been acceptable. Consider the following scenario, in which a peer reviewer interacts with authors: a reviewer objecting to a technical point within a document (e.g. a detail in manuscript Y is not supported by evidence, due to theory A) may simply be told that the theory used was as per a previously published protocol, seen within manuscript Z. If the reviewer is correct, this should also invalidate the previous paper. Instead, the precedence of previous publications is accorded its own weight of evidence, due to its previously having been peer-reviewed, which is only a partial statement of its quality; problems within the peer review process are by now well outlined \citep{12038911,Smith_2006,Henderson_2010,Balietti_2016}. Most researchers who perform peer review will have had an experience similar to the hypothetical scenario above.
Precedence in the scientific literature is also often considered simply from the perspective of saving space and maintaining readability. At some point, the commonly accepted assumptions behind a method, technique, or tradition become too detailed or too repetitive to restate, so a shorthand emerges, and the assumptions are instead inferred by citation. The obvious problem is that precedence is fully capable of reproducing ideas of any quality. A poor or absent evidentiary basis may be no barrier – if an idea is published, especially in multiple locations, it has all that is presently required to become memetic. This process is particularly powerful if the idea in contention has explanatory value, offers theoretical convenience, and so on, as outlined under the subheadings above. Publication at present forms a permanent badge of acceptability, and precedence confers an authority that is resistant to being updated, questioned, or invalidated.
Dilution
A huge number of methods exist to calculate HRV, well over 100 (see, for instance, \citealp{23674071,ramshur2010design}). These broadly include: (a) different categories of time- and frequency-domain measurements; (b) variations on otherwise identical methods, such as the choice of spectral analysis (Fast Fourier Transform, auto-regressive, Lomb-Scargle periodogram, pseudo-smoothed Wigner-Ville transform, etc.); (c) alterations in the parameters of existing measures, such as the frequency bands within spectral analyses (e.g., 0.12 to 0.5 Hz as a substitute for 0.15 to 0.5 Hz) or thresholds within time-domain measures (e.g., pNN20 as a substitute for pNN50); (d) ‘non-linear’ measurements (a catch-all term for sample entropy, fuzzy entropy, hidden words analysis, etc.); and so on. It is thus common for research to report multiple indices simultaneously. These indices, often various related transforms of variance, are usually highly correlated with each other. The most obvious example illustrating the similarity between HRV measurements is that of HF (high frequency power, as above) and RMSSD (the root mean square of successive differences): r = 0.91 between individuals \cite{10449882}, r = 0.883 to 0.977 within individuals \cite{9724299}. Dilution reduces the importance of any individual reported measure, which in turn compounds the problem posed by the hypotheses discussed here.
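To illustrate how readily near-redundant indices accumulate, the following is a minimal sketch in Python (the function name and the particular selection of indices are illustrative) computing several common time-domain measures from a single RR series; all are transforms of beat-to-beat variability, which is why they tend to be highly intercorrelated, as in the HF/RMSSD example above.

\begin{verbatim}
import numpy as np

def time_domain_hrv(rr_ms):
    """Several common time-domain HRV indices from one RR series (ms)."""
    rr = np.asarray(rr_ms, dtype=float)
    diffs = np.diff(rr)                                # successive differences
    return {
        "SDNN":  np.std(rr, ddof=1),                   # SD of all intervals
        "RMSSD": np.sqrt(np.mean(diffs ** 2)),         # RMS of successive diffs
        "pNN50": 100.0 * np.mean(np.abs(diffs) > 50),  # % of diffs > 50 ms
        "pNN20": 100.0 * np.mean(np.abs(diffs) > 20),  # % of diffs > 20 ms
    }
\end{verbatim}

Reporting a battery of such indices alongside LF, HF, and LF/HF costs a researcher essentially nothing, which is how a contested index can travel in the company of many uncontroversial ones.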
Say, for instance, we agree that the LF/HF ratio (i.e., Hypothesis B) has poor evidentiary capacity, as outlined in Table 1, but that this method is reported in parallel with nine other methods which are less controversial. The company of more easily interpretable results undoubtedly softens the ‘problematic’ conclusion drawn from Hypothesis B. In this situation, a question arises for a peer reviewer to which there is no formal answer: is the inclusion of a method that is not trustworthy or presently supported a diminished problem if it is accompanied by analyses which have better evidentiary support? Moreover, if multiple HRV metrics were assessed, but only LF/HF or a related metric showed a significant difference, would that render the result unpublishable? The evidence (e.g. \citealt{Heathers_2014}) indicates it would not.
Personal Investment
It is entirely human to defend deeply held convictions, and science may fail to progress when these convictions concern scientific assertions which are not supported by evidence. The most extreme conception of the consequences of this is often referred to as ‘Planck’s Principle’ ("new scientific truth does not triumph by convincing its opponents … but rather because its opponents eventually die", pp.33-34, \citealp{1950}). Planck’s statement was not intended to be axiomatic, and was certainly not true of either Planck himself or of his close contemporaries (Helmholtz, Heisenberg), who displayed fundamental shifts in thinking during their working lives \cite{blackmore1978planck}. A modern conception, however, was found to be somewhat accurate by \citet{Azoulay_2008}, where the death of scientific ‘superstars’ led to a noticeable change in the pattern of publication in favour of their non-collaborators. Within context, as recently as 2012, the authors and collaborators of the original papers have offered a limited defence of the hypotheses in question, consisting primarily of a restatement of the results of the original experiments and a renewed call for further research \citep{Pagani_2012}. This defence is maintained without addressing the substantive objections referenced in Table 1.
CONCLUSIONS
The above outlines many of the specific reasons why the calculation of sympathetic indices from HRV may be simultaneously strongly refuted and widely used. In doing so, it approaches a broader thesis of how parallel considerations of a theory can be established between fields, and thus how a body of knowledge can simultaneously be regarded as acceptable yet outdated. The hypotheses addressed here were not, at their inception, scientifically problematic. When originally proposed, on the basis of evidence gathered from sympathetic outflow inferred from orthostasis (e.g. \citealt{6599685,2874900}), these hypotheses were reasonable, offered in good faith, grounded in standard physiological observations, capable of being falsified, and generated specific, testable predictions. Since then, a broad base of further evidence has been amassed demonstrating that it is overwhelmingly likely that human cardiac, circulatory and autonomic physiology do not function in a manner that would allow the theory to work as described. A wide variety of authors have aggregated the evidence to say so, and these specific refutations have not been avoided or overlooked in the public scientific record. The above reflects well on the scientific process, but the ensuing period of ‘life after death’ does not.
How to proceed from this point is unclear. If providing the information collected in Table 1 were sufficient to change research practice, then the present discussion would be of historical interest only. Also unclear is what urgency should exist around this issue – should a meta-scientific framework be established to attempt to alter publication behaviour? What should be done? To retract the original papers would be inappropriate: there is no suggestion of wrongdoing, and retraction itself is also unlikely to be effective, as it does not appear to present a barrier to future citation. In one recent example, not even the retraction of papers by a researcher convicted of data fabrication appears to have changed citation habits dramatically \cite{Teixeira_da_Silva_2016,Bornemann_Cimenti_2015}. In light of the above, it seems far more appropriate that the onus to recognise the problem lies with present practice and not with the original theory.
There are some other possibilities. PubMed Commons and PubPeer both offer platforms for the scientific public to visibly annotate published articles. It may also be possible to automate responses on such platforms \cite{Nuijten_2015} – a strong statement could be made by automatically annotating all published articles which base conclusions on superseded methods. Journals (or the scientific societies which maintain journals) might consider adopting analytical or editorial standards which are applied pre-publication, and compel researchers to include alternative methods. This sets a precedent, however – if an idea exists along a balance of evidence, at what point should a community of scientists compel others to use (or avoid) a certain theory? What represents a sufficient standard of proof in such a situation? Such a standard would invariably be high. Does the evidence presented here, as extensive as it is, meet such a standard?
This situation is also a consequence of how scientific information is structured. Scientific knowledge, held as unconnected, paywalled, discrete investigative or informational units, cannot display immediate cross-field connections between papers or citations. Crises in and around research bring with them their own specific urgencies. The failure of a drug already brought to market, due to unacceptable death rates (Cerivastatin, Vioxx, Mibefradil, etc.), demands an immediate and coordinated research, regulatory, and legal response. However, the failure of a well-accepted social psychological effect (see recently, for instance, the treatment of various priming effects, e.g. \citealp{26501730}) is unlikely to kill anyone – or, indeed, to affect anyone outside of social psychology. This is not to say that continued interest in strongly refuted topics does not waste time and resources which could be spent, at a minimum, on the production of knowledge within a plausible topic area.
The danger of the present status of HRV within psychophysiology presumably lies somewhere between these examples. LF/HF is a common dependent variable compared between independent variables such as patient groups, but it is not widely regarded as having diagnostic capacity. A formal field-wide updating of publication standards would involve an extraordinary number of associated fields, each with separate conventions for publication standards and received wisdom. Irrespective of this, maintaining a central base of scientific knowledge as bedrock for further investigation is important in any area of study. The alternative, if not actively dangerous, means that researchers’ time and public money will be poorly deployed.
Having previously written an obituary for the hypotheses outlined here \cite{Heathers_2012}, we find it a curious task five years later to consider their ‘life after death’. But, as any consumer of popular media will be aware, it is very hard to kill a zombie.
[3] Readers with an interest in the status of Hypothesis A will find it directly addressed in Goldstein et al. (2011); comprehensive overviews of the internal consistency and physiology of Hypothesis B can be found in Billman (2013) and Karemaker (2017), respectively.