Linden and Yarnold¹ recently proposed classification tree analysis (CTA), a machine-learning procedure, as an alternative to conventional methods for analyzing mediation effects in treatment-outcome research. They note that CTA may have a number of advantages in this regard. It requires no assumptions about the distribution of variables or the functional form of the best-fitting model, for example, thus affording greater potential flexibility in identifying complex forms of association among variables with varying scales of measurement (e.g., binary and continuous). The authors further argue that CTA, unlike conventional approaches to testing mediation, “will not generate a model if a treatment-mediator-outcome relationship does not exist” (p. 359) and, conversely, that CTA “will systematically identify a treatment-by-mediator interaction if it exists, as well as any other interaction between variables” (p. 359). Using data from the Job Search Intervention Study (JOBS II), they find that structural equation modeling, a conventional approach to testing mediation, failed to indicate support for job-search self-efficacy as a mediator of the effects of the intervention on whether the study participant was reemployed at follow-up. In contrast, the authors conclude that CTA applied to the same set of variables revealed that job-search self-efficacy was a mediator of intervention effects on employment among those in the treatment group.

There are, however, two problematic aspects of the authors’ approach. First, as they point out, causal inference of a mediational pathway depends on the assumption that the associations involved exist net of potential confounding variables. 
In a randomized controlled trial such as JOBS II, confounding of the association between the mediator and outcome is of particular concern because the mediator, unlike treatment, is not randomly assigned.² The authors seek to address this concern by conducting a CTA that includes a set of 9 potential confounders (e.g., income, age, initial level of depressive symptoms) as candidate predictors. This approach does not ensure statistical control for these variables, however, for at least two reasons. First, there is no guarantee that the variables involved will actually be included in the resulting classification tree; failing to meet the criterion for inclusion does not rule out the possibility that a given potential confounder, or combination of such confounders, is nonetheless associated with both the mediator and the outcome to an extent that would render the mediator-outcome association non-significant. This concern proves pertinent to the authors’ analysis, as only 3 of the 9 potential confounders earn entry into the resulting classification tree for reemployment status. Second, whatever covariates are included in the classification tree may enter in positions that are inadequate for the purpose of controlling for confounding of the mediator-outcome association. In the authors’ analysis, for example, gender is included in the control group branch of the tree, whereas job-search self-efficacy, the potential mediator, is included only in the treatment group branch. As a further example, education is included in the treatment group branch, but only after job-search self-efficacy’s role in the model has already been established through its inclusion on a higher tier of the tree. An alternative approach for addressing confounding in the context of CTA would be to residualize the candidate mediator on possible confounders and then fit a classification tree with this adjusted variable. 
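
The residualization step can be illustrated with a minimal sketch. It rests on several assumptions that are not part of the original analysis: the data are simulated rather than drawn from JOBS II, only two hypothetical confounders (`age`, `income`) are used, the residualization is an ordinary least-squares regression, and a single cut-point chosen to maximize balanced accuracy stands in for a full CTA.

```python
import random


def ols_residuals(y, covariates):
    """Residuals of y after least-squares regression on covariates plus an intercept."""
    rows = [[1.0] + list(r) for r in covariates]
    p = len(rows[0])
    # Augmented normal equations [X'X | X'y].
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
           + [sum(r[i] * yi for r, yi in zip(rows, y))]
           for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for k in range(p):
            if k != col:
                factor = aug[k][col] / aug[col][col]
                aug[k] = [a - factor * b for a, b in zip(aug[k], aug[col])]
    beta = [aug[i][p] / aug[i][i] for i in range(p)]
    return [yi - sum(b * x for b, x in zip(beta, r)) for r, yi in zip(rows, y)]


def best_cut(values, outcome):
    """Cut-point maximizing balanced accuracy when predicting outcome = 1 for values > cut.

    Assumes both outcome classes are present. This is a one-split stand-in, not CTA itself.
    """
    n_pos = sum(outcome)
    n_neg = len(outcome) - n_pos
    best = (None, -1.0)
    for cut in sorted(set(values))[:-1]:
        true_pos = sum(1 for v, o in zip(values, outcome) if v > cut and o == 1)
        true_neg = sum(1 for v, o in zip(values, outcome) if v <= cut and o == 0)
        score = 0.5 * (true_pos / n_pos + true_neg / n_neg)
        if score > best[1]:
            best = (cut, score)
    return best


# Simulated (hypothetical) data: confounders influence both mediator and outcome.
random.seed(0)
n = 400
age = [random.gauss(0, 1) for _ in range(n)]
income = [random.gauss(0, 1) for _ in range(n)]
treat = [random.randint(0, 1) for _ in range(n)]
efficacy = [3.5 + 0.4 * a + 0.3 * i + 0.3 * t + random.gauss(0, 0.5)
            for a, i, t in zip(age, income, treat)]
reemployed = [1 if 0.8 * (m - 3.65) + 0.5 * a + random.gauss(0, 1) > 0 else 0
              for m, a in zip(efficacy, age)]

# Step 1: residualize the candidate mediator on the measured confounders.
residual = ols_residuals(efficacy, list(zip(age, income)))

# Step 2: within the treatment arm, look for a discriminating cut on the residual.
treated = [k for k in range(n) if treat[k] == 1]
cut, balanced_accuracy = best_cut([residual[k] for k in treated],
                                  [reemployed[k] for k in treated])
print("cut on residualized mediator:", cut)
print("balanced accuracy in treatment arm:", balanced_accuracy)
```

Because least-squares residuals are uncorrelated with the covariates used to produce them, any remaining discrimination of the outcome cannot be attributed to linear effects of those measured confounders; nonlinear or unmeasured confounding is not addressed by this sketch.
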
If the residualized mediator continues to discriminate across levels of the outcome, conditional on treatment status, it could be concluded that the mediator-outcome portion of the potential mediational pathway of interest is evident independent of the measured confounders. Applying this approach, I find that job-search self-efficacy continues to emerge as a discriminating variable for reemployment in the treatment arm branch of the classification tree that I fit with the same JOBS II data. However, it cannot be assumed that this more rigorous approach to taking confounders into account will always yield the same result as one in which they are merely included as candidate predictors within a CTA.

A more fundamental concern with the CTA approach to identifying mediation employed by the authors is that the mediator is not assured of having a consistent definition across the predictor-mediator and mediator-outcome segments of the mediational pathway. This is a result of the optimal cut-point on the mediator being determined separately for the two segments, through an optimal discriminant analysis for the treatment-mediator segment and CTA for the mediator-outcome segment. Thus, in the case of the mediational pathway for reemployment, treatment status predicts a dichotomous measure of job-search self-efficacy determined by a cut-score of 3.92 (i.e., high job-search self-efficacy corresponds to values above 3.92 and low to values at or below 3.92), whereas the dichotomous measure of job-search self-efficacy predicting reemployment is determined by a different cut-score of 4.92. Yet, it is essential by definition that the mediator that is influenced by the initial variable in a mediational pathway (in this case, treatment) be the same variable that then influences the outcome (in this case, reemployment). 
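
The extent to which two cut-scores define different versions of a mediator can be checked directly. The following is a minimal sketch that uses the two cut-scores reported above; the list of scores is hypothetical and serves only to show the computation, and the function names are illustrative.

```python
LOWER_CUT = 3.92  # cut-score from the treatment-mediator segment
UPPER_CUT = 4.92  # cut-score from the mediator-outcome segment


def is_high(score, cut):
    """Classify a score as high (True) when it lies strictly above the cut-score."""
    return score > cut


def discordance(scores, cut_a, cut_b):
    """Count and proportion of cases whose high-low classification differs by cut-score."""
    differs = [is_high(s, cut_a) != is_high(s, cut_b) for s in scores]
    return sum(differs), sum(differs) / len(differs)


# Hypothetical job-search self-efficacy scores, not taken from JOBS II.
scores = [3.0, 3.92, 4.0, 4.5, 4.92, 5.0]
n_diff, share = discordance(scores, LOWER_CUT, UPPER_CUT)
print(n_diff, share)
```

Only cases scoring above the lower cut-score and at or below the upper cut-score change classification, so the discordant share is simply the proportion of the sample falling in that interval.
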
In this case, nearly one half of the sample (47.3%; n = 426) has a different high-low classification on job-search self-efficacy under the two cut-scores, suggesting considerable divergence between the versions of this variable that they define. One approach to addressing this concern would be to define a new mediator that reflects the overlapping portion of the differing definitions. In the present example, this could be a dichotomous measure of job-search self-efficacy defined by scores greater than 3.92, which would ensure that all those classified as relatively high on the measure continue to be classified as such; alternatively, a cut-score of 4.92 could be used if priority is given to ensuring that all those classified as low remain in this category. Still another option would be to use the mid-point between the two cut-scores, 4.42. One could then evaluate the mediational pathway of interest using one or more of these options. Applying this approach using PROC CAUSALMED in SAS,³ allowing for the suggested treatment-mediator interaction and including all covariates, I find evidence of an indirect mediated pathway when using the lower-bound cut-score of 3.92 for job-search self-efficacy (odds ratio for the natural indirect effect = .959; 95% CI: .893, .998), but not when using the other two cut-scores, although the differences in estimates are admittedly small in this case (complete results are available upon request).

To summarize, Linden and Yarnold (2018) make a significant contribution by introducing CTA as a promising strategy for identifying mediated effects in intervention research. Further refinements to their approach, however, are recommended to more fully incorporate fundamental assumptions that accompany all tests for mediation as well as to evaluate and confirm potential mediational pathways using conventional procedures.

References

1. Linden A, Yarnold PR. 
Identifying causal mechanisms in health care interventions using classification tree analysis. J Eval Clin Pract. 2018;24:353-361.
2. Imai K, Keele L, Tingley D. A general approach to causal mediation analysis. Psychol Methods. 2010;15:309-334.
3. Yung Y, Lamm M, Zhang W. Causal Mediation Analysis with the CAUSALMED Procedure. Paper SAS1991-2018. Cary, NC: SAS Institute Inc.; 2018.