Methods

Ethics

The study was approved by the Danish National Committee on Health Research Ethics (VEK journal ID 74188). Data management and privacy policies were approved by the Danish Data Protection Agency (journal ID REG-131-2020). All participants provided informed consent and received an honorary fee of 500 Danish kroner per EEG session.

Participants

Patients (N = 50) of both sexes, aged 18 to 59 and about to start UP transdiagnostic group cognitive behavior therapy, were recruited from three tertiary free-of-charge public Mental Health Service outpatient clinics in Denmark, as described in detail in Reinholt et al. (2021). All patients had a primary diagnosis of agoraphobia, depression, generalized anxiety disorder (GAD), obsessive-compulsive disorder (OCD), social anxiety disorder (SAD) or panic disorder (PD), with or without comorbidities including another emotional disorder, attention-deficit hyperactivity disorder and personality disorder. Patients were referred to the clinic after two previous failed treatment attempts in primary care.
Exclusion followed the criteria for receiving treatment in the participating outpatient clinics: an ICD-10 F20 diagnosis, bipolar disorder or autism, alcohol or substance use disorder, increased risk of suicide, recent (<4 weeks) onset or alteration of psychotropic medication, and previous traumatic brain injury or organic brain disorder as assessed by medical history. Normal mental capabilities, estimated by having completed Danish primary school, were also required.
Healthy comparison subjects (HC, N = 37), matched with the patient group on age and sex, were recruited from the local community through posters and online advertisements. Exclusion criteria were the same as for patients, with the addition of any prior or present psychiatric diagnosis or psychotropic medication. All participants had normal or corrected-to-normal vision.

Clinical measures

Current medication status, including type of medication, dosage, treatment duration and changes thereof, was extracted from the electronic health record. Information on handedness and hearing status (normal/impaired) was obtained by interview. All participants were assessed with the Mini-International Neuropsychiatric Interview version 7 (M.I.N.I.; Sheehan, 1998), administered by a psychiatry trainee (MR). For patients, a primary diagnosis within the emotional disorder spectrum was confirmed, and up to three concurrent secondary diagnoses were noted. Healthy comparison subjects were screened for the absence of symptoms fulfilling criteria for any psychiatric diagnosis.

Psychometrics

The battery of psychopathology measures consisted of several validated self-report questionnaires. While rating-scales derived directly from the HiTOP are under development and require local translation and validation, the items selected in this study were deemed to be sufficiently consistent with the HiTOP (Wendt et al., 2021).
The Multidimensional Emotional Disorder Inventory (MEDI; 49 items, each rated 0 to 8) assesses nine empirically supported transdiagnostic symptom dimensions within the Internalizing spectrum: autonomic arousal, avoidance, depression, intrusive cognitions, neurotic temperament, positive temperament, somatic anxiety, social anxiety and traumatic re-experiencing (Rosellini & Brown, 2019). Note that when calculating the MEDI total score, positive temperament is subtracted rather than added.
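As a minimal sketch of the scoring rule just described, the following Python snippet sums subscale scores and subtracts positive temperament. The subscale keys and the example values are illustrative assumptions, not variable names or data from this study.

```python
# Subscale labels follow the nine MEDI dimensions listed in the text;
# the key spellings are hypothetical.
MEDI_SUBSCALES = (
    "autonomic_arousal", "avoidance", "depression", "intrusive_cognitions",
    "neurotic_temperament", "positive_temperament", "somatic_anxiety",
    "social_anxiety", "traumatic_reexperiencing",
)

def medi_total(scores: dict) -> float:
    """Sum all MEDI subscale scores, subtracting positive temperament."""
    missing = [name for name in MEDI_SUBSCALES if name not in scores]
    if missing:
        raise ValueError(f"missing subscales: {missing}")
    total = 0.0
    for name in MEDI_SUBSCALES:
        sign = -1.0 if name == "positive_temperament" else 1.0
        total += sign * scores[name]
    return total

# Made-up scores: eight subscales at 10, positive temperament at 25.
example = {name: 10 for name in MEDI_SUBSCALES}
example["positive_temperament"] = 25
print(medi_total(example))  # 55.0
```

With this sign convention, a higher positive temperament score lowers the total, so the total can be read as overall symptom burden.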
The Modified Personality Inventory for DSM-5 and ICD-11 – Brief Form Plus (PID5BF+M, here referred to as PID36; 36 items, each rated 0 to 3) assesses six personality trait domains in the Internalizing and Thought disorder spectra: anankastia, antagonism, detachment, disinhibition, negative affectivity and psychoticism (Bach et al., 2020). Note that this version was developed in accordance with the coming ICD-11, which is moving toward a dimensional understanding of personality disorders (Bach & Mulder, 2022).
In addition, two shorter self-report questionnaires were administered to assess the severity of the transdiagnostic dimensions of personality pathology and psychological distress, respectively: the Level of Personality Functioning Scale-Brief Form (LPFS; 12 items, each rated 1 to 4) (Hutsebaut et al., 2016) and the K10 distress scale (K10; 10 items, each rated 1 to 5) (Kessler et al., 2003).
For both HC and patients, this battery was administered in conjunction with the respective EEG recordings. Participants were instructed to complete the questionnaires within ±1 week of the EEG recording. All psychopathology measures were obtained using the online survey and database management web application REDCap licensed to Region Zealand, Denmark (Harris et al., 2019).

Procedures

EEG laboratory setup

EEG was recorded at two psychiatric hospitals in Region Zealand, Denmark. Each session took place either in the morning or early afternoon. The first session, baseline, lasted approximately three hours including information, electrode and cap application, EEG recording (~1 hour), the M.I.N.I. diagnostic interview and breaks. Sessions two and three, for patients after 10 and 14 weeks, respectively, lasted at most two hours and consisted only of the EEG recording. In order to account for normal variation in the statistical models, some HCs were invited to a second recording after at least 8 weeks. Participants were instructed to show up rested and to avoid coffee and nicotine for 2 hours before the recording. Patients were also instructed to avoid, if possible, medication prescribed “as needed” on the night before and day of recording.
During EEG recording, participants were seated in a comfortable armchair in a secluded room and instructed to sit as still as possible. Visual stimuli were presented on a 17” LCD monitor situated 1.5 meters from the participant. Audio stimuli were presented with airtube stereo insert earphones (C and H Distributors Inc., Milwaukee, WI, USA, 2021). Similar room luminosity at the two sites was ensured with blackout curtains but was not objectively measured.

EEG recording

EEG at the two sites was recorded with identical BioSemi ActiveTwo Mark 2 systems with 64 Ag/AgCl pin-type active electrodes attached to a cap according to the extended 10/20 system (BioSemi, Amsterdam, 2021). The signal was recorded reference-free with common mode sense (CMS) and driven right leg (DRL) electrodes as “ground” placed centrally close to POz. The signal was digitized with a sampling rate of 2048 Hz. Electrode offset was kept below 40 µV.

EEG paradigms

All paradigms were presented using Presentation® software version 23.0 (Neurobehavioral Systems, 2021). The paradigms, presented in this order for all participants, were:
Attended oddball (AO)
Auditory stimuli (N = 1500 in 5 blocks) were delivered monaurally in pseudorandom order: 10% target tones (1100 Hz, 50 ms duration), 6% distractor tones (a 50 ms bell sound) and 84% standard tones (1000 Hz, 50 ms). All stimuli had a sound intensity of 50 dB and a 10 ms rise/fall time. Participants were instructed to fixate on a white cross on a black background on the monitor and to press the left mouse button with their index finger when hearing the target tone while ignoring distractor stimuli. Participants started with a practice round of 30 stimuli.
Flanker
The flanker task was a modified version of the Eriksen flanker task (Eriksen & Eriksen, 1974) commonly used in research (e.g., Riesel et al., 2022; Seow et al., 2020). Five horizontal arrows were presented in white on a black background on the monitor. Trials (N = 480 in 10 blocks) could be either congruent (<<<<< or >>>>>) or incongruent (<<><< or >><>>) and were presented for 200 ms. Trials were 50% congruent and 50% incongruent, presented in random order. Participants were instructed to respond as quickly and accurately as possible by pressing either the left or right mouse button to indicate the direction of the central arrow. Participants had 1050 ms to respond. Feedback was delivered on the monitor at the end of each block: “Try to respond faster!” if >90% of responses were correct or <25% of trials were missed, “Try to respond more accurately” if accuracy was <75%, and “Good job!” otherwise. If participants made fewer than 17 errors in total, up to two extra blocks were administered in order to ensure internal consistency of the ERN (Clayson, 2020). Participants first completed a practice round of 12 trials to ensure that instructions were understood.
Unattended oddball (UO)
Participants watched a muted nature documentary while auditory stimuli (N = 1800 in 6 blocks) were delivered monaurally in pseudorandom order: 6% frequency deviant tones (1100 Hz, 50 ms duration), 6% duration deviant tones (1000 Hz, 100 ms duration), 6% combined frequency and duration deviant tones (1100 Hz, 100 ms duration) and 82% standard tones (1000 Hz, 50 ms). All stimuli had a sound intensity of 50 dB and a 10 ms rise/fall time. Participants were instructed to ignore all auditory stimuli and focus on the monitor.
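The stimulus proportions of the two oddball paradigms can be illustrated with a short Python sketch that builds a shuffled sequence with fixed per-type counts. This is only an assumption about what "pseudorandom" could look like: the study presented stimuli with Presentation software, and any ordering constraints (for example, a minimum number of standards between deviants) are not stated here and are not modelled. The function name, labels and seed are hypothetical.

```python
import random

def oddball_sequence(n_stimuli, proportions, seed=0):
    """Return a shuffled list of stimulus labels with fixed per-type counts."""
    counts = {label: round(n_stimuli * p) for label, p in proportions.items()}
    if sum(counts.values()) != n_stimuli:
        raise ValueError("proportions do not yield the requested number of stimuli")
    sequence = [label for label, n in counts.items() for _ in range(n)]
    random.Random(seed).shuffle(sequence)  # seeded for reproducibility
    return sequence

# Attended oddball: 10% targets, 6% distractors, 84% standards.
ao = oddball_sequence(1500, {"target": 0.10, "distractor": 0.06, "standard": 0.84})
# Unattended oddball: three deviant types at 6% each, 82% standards.
uo = oddball_sequence(
    1800,
    {"freq_dev": 0.06, "dur_dev": 0.06, "combined_dev": 0.06, "standard": 0.82},
)
print(ao.count("target"), ao.count("distractor"), ao.count("standard"))  # 150 90 1260
print(uo.count("freq_dev"), uo.count("standard"))  # 108 1476
```

The counts show how few deviant trials are available per paradigm before any epochs are rejected.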

EEG preprocessing

EEG data were processed offline in EEGLAB 2023.1 on MATLAB R2021b (Delorme & Makeig, 2004; Mathworks, 2022). Artifacts and noise were cleaned with the Reduction of Electroencephalographic Artifacts (RELAX) preprocessing pipeline, a novel pipeline based on an empirical assessment of established cleaning methods (Bailey et al., 2022, 2023). We applied the default RELAX pipeline, RELAX_MWF_wICA, which utilizes methods from the following published toolboxes: FieldTrip, the MWF toolbox, wICA (Castellanos & Makarov, 2006), ICLabel (Pion-Tonachini et al., 2019), PREP (Bigdely-Shamlo et al., 2015) and Zapline-plus (Klug & Kloosterman, 2021). Because single-trial analysis handles noisy data better, and in order to remove as little brain activity and retain as many trials as possible, especially in the Flanker paradigm, we applied RELAX with less stringent settings than default for our main analysis (specified below).
Prior to processing in RELAX, the raw BioSemi EEG data were imported into EEGLAB reference-free and down-sampled to 250 Hz. In initial preprocessing steps, RELAX removed line noise at 50 Hz with the Zapline-plus toolbox and re-referenced the data to the common average with the PREP toolbox after automatic removal of extremely noisy or flat channels. Data were high-pass filtered at 0.25 Hz and low-pass filtered at 80 Hz using the default RELAX Butterworth filter, which is suggested to perform better than EEGLAB’s pop_eegfiltnew (Bailey et al., 2023). Note that RELAX applies a 0.25 Hz high-pass filter by default instead of the commonly used 1 Hz, a trade-off which somewhat decreases the quality of the subsequent independent component analysis (ICA) decomposition but does not distort the ERP time course (Bailey et al., 2023; Luck, 2014; Tanner et al., 2016; Winkler et al., 2015).
Next, artifact reduction based on multi-channel Wiener filtering (MWF) with a delay period of 30 and wavelet-enhanced ICA (wICA) with the extended infomax ICA algorithm proceeded with cleaning parameters less strict than the RELAX defaults. Specifically, the muscle slope threshold was -0.31 (default -0.59), no channels were deleted due to muscle artifacts (default: channels with 5% or more muscle artifacts deleted) and only channels with 15% or more extreme artifacts were deleted (default 5%). Other settings remained at default, including removal of at most 20% of channels. Across all sessions, on average 59.7 channels remained for the AO, 59.8 for the Flanker and 60.2 for the UO paradigm. There was no difference between groups in number of removed channels at baseline.
After interpolating removed channels, the preprocessed data were epoched and baseline-corrected according to parameters predetermined for each of the three paradigms (see Table 1). Note that RELAX applies a regression-based baseline correction method instead of the traditional subtraction, which has been shown to distort the ERP waveform (Alday, 2019). For response-locked ERPs in the Flanker paradigm, baseline regression correction used one factor with two levels: correct and error response. For stimulus-locked ERPs from the Flanker paradigm and ERPs from the other two paradigms, regression used zero factors. Next, epochs were rejected if the absolute voltage amplitude exceeded 100 µV (default 60 µV) or if kurtosis or improbable-data values exceeded 3 standard deviations (SD)/median absolute deviations (MAD) overall or 5 SD/MAD at any channel.
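The amplitude criterion for epoch rejection can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the RELAX implementation: it covers only the absolute-voltage rule (not the kurtosis or improbable-data rules), and the nested-list data layout, function name and example values are hypothetical.

```python
def reject_by_amplitude(epochs, threshold_uv=100.0):
    """Keep epochs whose absolute voltage never exceeds the threshold.

    `epochs` is a list of epochs; each epoch is a list of channels; each
    channel is a list of samples in microvolts.
    """
    kept = []
    for epoch in epochs:
        peak = max(abs(sample) for channel in epoch for sample in channel)
        if peak <= threshold_uv:
            kept.append(epoch)
    return kept

# Three made-up epochs with two channels and three samples each.
epochs = [
    [[5.0, -12.0, 30.0], [8.0, 2.0, -40.0]],     # clean epoch
    [[5.0, 140.0, 30.0], [8.0, 2.0, -40.0]],     # exceeds +100 µV
    [[5.0, -12.0, 30.0], [8.0, -101.0, -40.0]],  # exceeds -100 µV
]
clean = reject_by_amplitude(epochs)
print(len(clean))  # 1
```

Raising the threshold from 60 µV to 100 µV, as done here relative to the default, retains more epochs at the cost of admitting larger deflections.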
Across all sessions, on average 1070 epochs of all trial types remained for the AO, 417 for the Flanker paradigm and 972 for the UO paradigm. At baseline, in the AO paradigm, there was a significant difference between groups in number of remaining epochs (HC: 1093, Patients: 1056; t(85) = 3.32, p = 0.001). This difference was driven by non-significant, trend-level differences in the number of remaining Standard stimuli epochs (t(85) = 1.90, p = 0.061) and Target stimuli epochs (t(85) = 1.77, p = 0.08). This small difference was deemed to be of no consequence for the main analysis. There were no further differences between groups in number of remaining epochs for any of the other paradigms. Finally, the preprocessed data were converted to BIDS format to facilitate the sharing of data with the community (C. R. Pernet et al., 2019).
Table 1 shows an overview of paradigm and ERP variables.
>> Table 1 here <<

ERP analysis and statistics

All demographic and behavioral statistics were conducted in R (R Core Team, 2023). ERP statistical models were designed, evaluated and visualized using LIMO EEG in EEGLAB and MATLAB functions (Delorme & Makeig, 2004; C. R. Pernet et al., 2011).
After preprocessing, ERP single-trial data were processed in LIMO EEG. For a given subject, session and channel, the first level of the GLM has the general form
$$Y = X\beta + \epsilon$$
where $Y$ denotes the single-trial ERP data in the form of a trials × time frames matrix, $X$ is a design matrix coding for the paradigm-specific stimulus types, $\beta$ are the first-level beta coefficients to be estimated and $\epsilon$ is the residual term representing what is left when the effects of the beta coefficients are accounted for.
The constant term $\beta_0$, in LIMO EEG referred to as the adjusted mean, warrants special attention, as effects on the other beta parameters are modulations around this constant term. For example, the response-locked Flanker model is
$$Y = \beta_0 + \beta_1 X_{\mathrm{correct}} + \beta_2 X_{\mathrm{error}} + \epsilon$$
where $\beta_1$ is the beta coefficient corresponding to correct responses and $\beta_2$ corresponds to error responses. Accordingly, $\mathrm{CRN} = \beta_0 + \beta_1$ and $\mathrm{ERN} = \beta_0 + \beta_2$. Given the near-identical triphasic waveform of the CRN and the ERN, if the ERN is more negative-going than the CRN, $\beta_0$ necessarily lies in between. Therefore, $\beta_1$ will be positive-going even though the CRN is a negative-going wave. As such, if a result indicates that $\beta_1$ is modulated negatively by a psychopathology measure, the interpretation is that a greater, or more negative, CRN correlates with higher scores.
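To make this sign logic concrete, the following minimal Python sketch uses hypothetical amplitudes (in µV) at a single channel and time frame. All numbers, including the value chosen for the adjusted mean, are assumptions for illustration only and are not estimates from this study or from LIMO EEG.

```python
# Hypothetical amplitudes (µV) at one channel and time frame; not study data.
adjusted_mean = -4.0   # constant term of the first-level model
crn = -2.0             # waveform value for correct responses
ern = -8.0             # waveform value for error responses

# Each condition coefficient is the deviation of its waveform from the constant.
beta_correct = crn - adjusted_mean
beta_error = ern - adjusted_mean
print(beta_correct, beta_error)  # 2.0 -4.0

# A negative modulation of the correct-response coefficient (here -1 µV)
# moves the CRN further in the negative direction.
shift = -1.0
crn_shifted = adjusted_mean + (beta_correct + shift)
print(crn_shifted)  # -3.0
```

The coefficient for correct responses is positive here even though the CRN itself is negative, and decreasing that coefficient makes the CRN more negative.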
First-level model parameters were estimated with weighted least squares (WLS), a robust extension to ordinary least squares (OLS) which uses principal component projection to down-weight outlier trials (C. Pernet et al., 2022). In all, ten ERP beta models were evaluated, each containing one or more classic ERP components (see Table 1 for stimulus types and associated ERP models).
At the second level, for each of these ERP beta models, we applied mass univariate robust linear regression as implemented in LIMO EEG. Age, gender, group, medication status, session number and the psychopathology measure corresponding to each session were explanatory variables. The general form of the model was:
$$\beta_{\mathrm{ERP}} = Z\theta + e$$
where $\theta$ are the second-level beta coefficients to be estimated, $Z$ is the explanatory variables data matrix and $\beta_{\mathrm{ERP}}$ is the first-level ERP beta model defined above, with $e$ the second-level residual term.
In this linear regression model, gender was coded as female = 1, male = -1. Group was coded as HC = 1, Patient = -1. Due to the large variety in dosage and type, medication status was coded with two dummy variables denoting no prescription (-1, -1), one prescription (1, -1) and more than one prescription (1, 1). Medication prescribed “as needed” was not considered since patients were instructed to avoid intake from the afternoon before the day of recording. Session was coded with three columns indicating with 1 or 0 whether the particular entry of the data matrix belonged to baseline, week 10 or week 14. Psychopathology measure was likewise coded in three columns indicating scores for the associated session. Accordingly, the explanatory variables data matrix $Z$ had nine columns and as many rows as there were data sets (N = 172). For example, a given row corresponding to data set 125 had the form $[25,\ -1,\ -1,\ 1,\ 1,\ 0,\ 1,\ 0,\ 0,\ 39,\ 0]$, denoting the subject’s age = 25, gender = -1 (male), group = -1 (Patient), medication = [1 1] (more than one prescription), Session = [… 0 1 0 …] (week 10) and psychopathology measure = [… 0 39 0] (week 10).
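The coding scheme described above can be sketched as a small Python function that turns one data set into a row of explanatory variables. The function name, argument names and session labels are illustrative assumptions; the study itself used LIMO EEG in MATLAB, not this code.

```python
SESSIONS = ("baseline", "week10", "week14")
# Two dummy variables: no prescription, one prescription, more than one.
MEDICATION = {0: (-1, -1), 1: (1, -1), 2: (1, 1)}

def design_row(age, gender, group, n_prescriptions, session, score):
    """Code one data set as a row of the second-level data matrix."""
    gender_code = {"female": 1, "male": -1}[gender]
    group_code = {"HC": 1, "Patient": -1}[group]
    med_code = MEDICATION[min(n_prescriptions, 2)]
    idx = SESSIONS.index(session)
    # One-hot session indicator, and the score placed in the matching column.
    session_code = [1 if i == idx else 0 for i in range(len(SESSIONS))]
    score_code = [score if i == idx else 0 for i in range(len(SESSIONS))]
    return [age, gender_code, group_code, *med_code, *session_code, *score_code]

# The example from the text: a 25-year-old male patient with more than one
# prescription, recorded at week 10 with a psychopathology score of 39.
row = design_row(25, "male", "Patient", 3, "week10", 39)
print(row)  # [25, -1, -1, 1, 1, 0, 1, 0, 0, 39, 0]
```

Placing each score in a session-specific column means that a score only contributes to the regressor of the session at which it was obtained.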
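Maximum likelihood estimates of the second-level beta coefficients $\theta$ were computed at each time frame, channel by channel, using iterative re-weighted least squares (IRLS). IRLS is a robust extension to OLS that down-weights outlier subjects, and has been shown to increase sensitivity in the analysis of neuroimaging data (Wager et al., 2005). The estimate had the form $\hat{\theta} = [\hat{\theta}_{\mathrm{age}},\ \hat{\theta}_{\mathrm{gender}},\ \hat{\theta}_{\mathrm{group}},\ \hat{\theta}_{\mathrm{med}_1},\ \hat{\theta}_{\mathrm{med}_2},\ \hat{\theta}_{\mathrm{ses}_1},\ \hat{\theta}_{\mathrm{ses}_2},\ \hat{\theta}_{\mathrm{ses}_3},\ \hat{\theta}_{\mathrm{psy}_1},\ \hat{\theta}_{\mathrm{psy}_2},\ \hat{\theta}_{\mathrm{psy}_3},\ \hat{\theta}_0]^T$, representing the effects of each of the explanatory variables (plus a constant term) on the ERP model at each data point.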
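The idea behind IRLS can be illustrated with a deliberately simplified Python sketch for a single predictor. It alternates between a weighted least-squares fit and a re-weighting of observations by Tukey's bisquare function on scaled residuals. The tuning constant, the scale estimate and the absence of any leverage adjustment are assumptions of this sketch; it does not reproduce the LIMO EEG implementation, and the data are made up.

```python
from statistics import median

def weighted_fit(x, y, w):
    """Weighted least-squares fit of y = a + b * x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b

def irls_fit(x, y, tuning=4.685, n_iter=50, tol=1e-8):
    """Robust line fit: re-weight observations by Tukey's bisquare function."""
    w = [1.0] * len(x)
    a, b = weighted_fit(x, y, w)  # start from the ordinary least-squares fit
    for _ in range(n_iter):
        resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        # Robust scale: median absolute residual, scaled to match the SD.
        scale = median(abs(r) for r in resid) / 0.6745
        if scale == 0:
            break
        u = [r / (tuning * scale) for r in resid]
        w = [(1 - ui ** 2) ** 2 if abs(ui) < 1 else 0.0 for ui in u]
        a_new, b_new = weighted_fit(x, y, w)
        converged = abs(a_new - a) < tol and abs(b_new - b) < tol
        a, b = a_new, b_new
        if converged:
            break
    return a, b, w

# Nine points close to y = 2 + 0.5 * x, plus one gross outlier at x = 9.
x = list(range(10))
noise = [0.1, -0.1, 0.05, -0.05, 0.1, -0.1, 0.05, -0.05, 0.1]
y = [2.0 + 0.5 * xi + ni for xi, ni in zip(x, noise)] + [30.0]

ols_a, ols_b = weighted_fit(x, y, [1.0] * len(x))
rob_a, rob_b, weights = irls_fit(x, y)
print(round(ols_b, 2), round(rob_b, 2), weights[-1])
```

In this toy example the ordinary fit is pulled strongly toward the outlier, whereas the re-weighted fit assigns the outlier a weight of zero and recovers a slope close to 0.5.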
Next, a linear combination of these second-level beta coefficients was used to test for significant effects of the psychopathology measures (Kiebel & Friston, 2004). Specifically, we defined a reduced model by applying the contrast vector $c$ for the psychopathology measure and tested, channel by channel, at each time frame the null hypothesis $H_0: c^T\theta = 0$, where $c^T$ is the transpose of the contrast vector. In other words, we tested the null hypothesis of no effect on $\beta_{\mathrm{ERP}}$ of the psychopathology measures while accounting for the other explanatory variables. Note that this contrast model did not assess whether psychotherapy treatment changes ERP features or modulates the association with psychopathology measures, or whether associations are present only at a given session, e.g. at baseline. On the upside, the model allowed us to state that detected associations were present across groups and sessions irrespective of effects of psychotherapy.
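The associated one-sided t-test was:
$$t = \frac{c^T\hat{\theta}}{\sqrt{\hat{\sigma}^2\, c^T (Z^T W Z)^{-1} c}}$$
where $\hat{\sigma}^2$ is the variance of the full model and $W$ the weights estimated with IRLS applied to $Z$.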
The result of these many one-sided t-tests was an uncorrected statistical parametric map (SPM) of t-values of size channels × time frames, with the number of time frames given by the paradigm-specific epoch length (see Table 1). Correction for multiple comparisons (MC) was conducted using threshold-free cluster enhancement (TFCE) as implemented in LIMO EEG using 1000 bootstrap iterations (Mensen & Khatami, 2013; C. R. Pernet, 2015; Smith & Nichols, 2009). TFCE builds on traditional bootstrap or permutation-based cluster MC correction methods commonly used in neuroimaging research, e.g. spatiotemporal clustering (Maris & Oostenveld, 2007; Sassenhagen & Draschkow, 2019). However, instead of pre-specifying a cluster-forming threshold and assigning to a cluster all connected data points whose corresponding t-value is above this threshold, the method considers clusters formed at all possible thresholds. The more clusters a given data point belongs to within the range of thresholds, the higher the assigned TFCE score. As a result, whereas in traditional spatiotemporal clustering methods the threshold influences what type of clusters are detected, in TFCE narrow clusters with high t-values are equalized with broad clusters with lower t-values (Smith & Nichols, 2009). At each data point, $p$, the TFCE score is given by:
$$\mathrm{TFCE}(p) = \int_{h_{\min}}^{h_{\max}} e_p(h)^{E}\, h^{H}\, dh$$
where $h_{\min}$ and $h_{\max}$ are the minimum and maximum t-values in the data, respectively, $e_p(h)$ is the cluster extent, $h$ is the cluster height and $E$ and $H$ are scaling constants, which in LIMO EEG are fixed to 0.5 and 2, respectively (C. R. Pernet, 2015). To arrive at the final corrected SPM of significant t-values, the method proceeds with estimating the empirical TFCE distribution through bootstrapping. Importantly, sampling is with replacement of all datasets belonging to a subject. Then, the maximum TFCE values from each bootstrap are sorted and the value at position $(1 - \alpha) \times N_b$ is the estimated TFCE threshold, where $\alpha$ is a pre-determined significance threshold and $N_b$ is the number of bootstrap iterations. Data points whose TFCE score exceeds this threshold are deemed significant at the $\alpha$ level and the corresponding t-values are included in the SPM. Note that a trade-off for the increased cluster-detection capabilities of TFCE is that one cannot state which of the included data points make a cluster significant (Smith & Nichols, 2009).
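A one-dimensional Python sketch may help to show how TFCE scores and the bootstrap threshold are obtained. It is a simplification under several assumptions: only positive t-values along a single time axis are handled, neighbouring samples define connectivity, the integral over thresholds is approximated by a sum with step `dh`, and the threshold is read off at a 1-based position in the sorted maxima. LIMO EEG operates on channels × time frames with a channel neighbourhood structure, and its implementation details may differ. Function names and example values are hypothetical.

```python
def tfce_1d(t_values, extent_power=0.5, height_power=2.0, dh=0.1):
    """Approximate TFCE scores for a 1-D series of non-negative t-values."""
    scores = [0.0] * len(t_values)
    n_steps = int(max(t_values) / dh)
    for step in range(1, n_steps + 1):
        h = step * dh
        start = None
        # Sweep once, closing a cluster whenever the series drops below h.
        for i, t in enumerate(list(t_values) + [float("-inf")]):
            if t >= h and start is None:
                start = i
            elif t < h and start is not None:
                extent = i - start
                for j in range(start, i):
                    scores[j] += (extent ** extent_power) * (h ** height_power) * dh
                start = None
    return scores

def tfce_threshold(max_scores, alpha=0.001):
    """Value at (1-based) position (1 - alpha) * n among sorted bootstrap maxima."""
    ordered = sorted(max_scores)
    position = round((1 - alpha) * len(ordered))
    return ordered[max(position - 1, 0)]

# A broad, moderate cluster (samples 1 to 5) and a narrow, high peak (sample 8).
t_map = [0.0, 1.0, 3.0, 3.0, 3.0, 1.0, 0.0, 0.0, 4.0, 0.0]
scores = tfce_1d(t_map)
print([round(s, 2) for s in scores])

# Made-up bootstrap maxima, only to show how the threshold is read off.
fake_maxima = list(range(1, 1001))
print(tfce_threshold(fake_maxima, alpha=0.05))  # 950
```

Because the score accumulates support over all thresholds, both the broad cluster and the narrow peak receive non-zero scores without a cluster-forming threshold having to be chosen in advance.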
For our main analysis, we tested the associations between the 10 ERP beta models defined in Table 1 and the four transdiagnostic psychopathology measures (K10, LPFS, MEDI and PID36) at an $\alpha$ level of 0.001, or 0.1%. In case of significant results for MEDI and PID36 total scores, we also show results for the respective subscales. These results are presented in full in Supplementary Materials and commented upon in Discussion.
Results are presented as heat maps indicating in red or blue with varying intensity positive or negative t -values, respectively. Clusters of these t -values denote spatiotemporal regions where the effects of psychopathology measures on the ERP model were significant. As such, interpretation of results is in terms of direction of effects at relevant regions of interest. To this end we also display shaded regions indicating traditional ERP analysis time windows.
ERP grand averages are displayed for each stimulus type and group (HC and Patient) as the 20% trimmed mean of subject-level weighted single-trial ERP data. The trimmed mean is a robust estimate of the central tendency of the raw single-trial data and corresponds to a traditional grand average ERP waveform (C. R. Pernet et al., 2011; Wilcox & Rousselet, 2018). Instead of traditional frequentist confidence intervals (CI), which only give the long-run probability of covering the true mean, LIMO EEG by default displays the 95% Bayesian Highest Density Interval (HDI), which is the interval that contains the 20% trimmed mean with 95% probability given the observed data (Morey et al., 2016; C. R. Pernet et al., 2011).
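For readers unfamiliar with trimmed means, the following minimal Python sketch computes a 20% trimmed mean, that is, the mean after discarding the lowest and highest 20% of values. The amplitudes are made-up values at a single time frame, and the function is an illustration rather than the LIMO EEG routine.

```python
def trimmed_mean(values, proportion=0.2):
    """Mean after dropping `proportion` of the values from each tail."""
    ordered = sorted(values)
    k = int(proportion * len(ordered))  # number of values trimmed per tail
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

# Ten hypothetical subject-level amplitudes (µV), one of them extreme.
amplitudes = [-6.0, -5.0, -4.5, -4.0, -3.5, -3.0, -2.5, -2.0, -1.0, 40.0]
print(trimmed_mean(amplitudes))           # -3.25
print(sum(amplitudes) / len(amplitudes))  # 0.85
```

The single extreme value moves the ordinary mean to a positive amplitude, while the trimmed mean stays within the range of the typical values.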
Finally, we also show results from statistical analyses of demographic and psychopathology measures. To test the internal consistency of MEDI and PID36 we estimated McDonald’s Omega using functions from the R package semTools applied to the baseline dataset (N = 87) (Jorgensen et al., 2022). McDonald’s Omega has been suggested to be a more reliable estimate than the commonly used Cronbach’s Alpha (Flora, 2020). To test for group differences in demographics and psychopathology measures we applied Welch’s two-sample t-tests for continuous variables and Fisher’s exact test for the categorical variable Gender. To test for change in psychopathology measures across sessions, from baseline to week 10 and 14, respectively, we applied mixed linear regression models using the R package lme4 (Bates et al., 2015). Confidence intervals and p-values were computed with a Wald t-distribution approximation (Luke, 2017).
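As an illustration of the group comparison for continuous variables, the sketch below computes Welch's t statistic and the Welch-Satterthwaite degrees of freedom in plain Python. The study's analyses were run in R; this function and the two example samples are hypothetical and only show the arithmetic.

```python
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of the mean, group a
    vb = variance(b) / len(b)  # squared standard error of the mean, group b
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Two made-up samples with unequal sizes and variances.
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
t_stat, dof = welch_t(group_a, group_b)
print(round(t_stat, 2), round(dof, 2))  # -2.38 6.97
```

Unlike the pooled-variance t-test, the degrees of freedom are not fixed at the total sample size minus two but shrink when group variances and sizes differ.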