• Does the proposed method offer better multi-centric reliability than other studies? The across-site reliability measures obtained with the proposed calibration are not put in perspective against the extensive literature on this topic (for example, but not limited to, Wolz et al., 2014; Roche et al., 2014; Jovicich et al., 2013). In particular, the last of these studies reports inter-site ICC measures for many of the same structures reported here, also obtained using FreeSurfer, with notably higher reliability than the calibrated results in this study:

    Structure           Between-site ICC after calibration,   Jovicich et al., 2013
                        this study (Fig. 2)                   (Suppl. Table 1)
    Lateral ventricle   0.96                                  0.998
    Thalamus            0.78                                  0.972
    Hippocampus         0.88                                  0.951
    Amygdala            0.82                                  0.939
    Caudate             0.92                                  0.942

    The authors should discuss potential reasons for these differences, for example in terms of acquisition variability, calibration methods, or segmentation methods (FreeSurfer longitudinal versus cross-sectional processing, or other methods).