1. Roche et al. 2014: this paper describes partial volume estimation using a Bayesian MAP formulation that extends the mixel model. Our segmentation results do not account for partial volume effects. In the text we added the passage below (an illustrative sketch of the two-tissue mixel idea follows it):

    “Another factor not accounted for in our segmentation results was the effect of partial voluming, which adds uncertainty to tissue volume estimates. In (Roche 2014), researchers developed a method to more accurately estimate partial volume effects using only T1-weighted images from the ADNI dataset, which resulted in higher accuracy in classifying Alzheimer’s disease (AD) and mild cognitive impairment (MCI) patients versus normal controls (NL).”
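    To make the mixel idea concrete for the reviewer, here is a minimal, hypothetical sketch of a two-tissue partial-volume estimate (voxel intensity modeled as a convex combination of two tissue means plus Gaussian noise, with a simple Gaussian prior on the fraction). It is only an illustration of the concept, not the MAP estimator of Roche et al. (2014); all names and values are placeholders.

    ```python
    import numpy as np

    def pv_fraction_two_class(y, mu_a, mu_b, sigma=5.0, prior_mean=0.5, prior_sd=0.25):
        """Toy MAP estimate of the partial-volume fraction of tissue A in one voxel.

        Mixel-style model: y = alpha*mu_a + (1 - alpha)*mu_b + Gaussian noise,
        with a Gaussian prior on alpha restricted to [0, 1]. Illustration only.
        """
        alphas = np.linspace(0.0, 1.0, 1001)
        log_lik = -0.5 * ((y - (alphas * mu_a + (1 - alphas) * mu_b)) / sigma) ** 2
        log_prior = -0.5 * ((alphas - prior_mean) / prior_sd) ** 2
        return alphas[np.argmax(log_lik + log_prior)]

    # A voxel with intensity midway between typical GM and WM means -> ~50% GM
    print(pv_fraction_two_class(y=95.0, mu_a=80.0, mu_b=110.0))
    ```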

  2. Wolz et al. 2014: applied the LEAP algorithm to hippocampal volumes and, comparing 1.5T and 3T ADNI data, found a small bias (1.17% mean signed difference) between field strengths. Jovicich et al. 2009 found that test-retest reproducibility does not change much across platforms and field strengths. In the discussion section I’ve added the passage below (a sketch of the t-test it mentions follows the quote):

    “Even though we did not standardize the protocols and scanners within this study, the consortium is unbalanced in that there are 16 3T scanners, of which 11 are Siemens. Among the Siemens 3T scanners there is little variability in TR, TE, and TI; however, there is more variance in the use of parallel imaging, the number of channels in the head coil (12, 20, or 32), and the field of view. We could not detect differences in scan-rescan reliability between field strengths, similar to the findings of (Jovicich 2009). Wolz and colleagues also could not detect differences in the scan-rescan reliability of hippocampal volumes estimated by the LEAP algorithm, but they did detect a small bias between field strengths, where hippocampal volumes from the 3T ADNI scanners were 1.17% larger than those from the 1.5T scanners (Wolz 2014). A two-sample t-test with unequal variances was run between the scaling factors of the 1.5T versus 3T scanners, and we could not detect differences in any ROI except for the left and right amygdala. We found that the scaling factors of the 1.5T scanners were lower than those of the 3T scanners (0.90 versus 1.02), meaning that the amygdala volume estimates from the 1.5T scanners were larger than those of the 3T scanners. However, this interpretation is limited due to the small sample size of 1.5T scanners in this consortium.”
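    For reference, here is a minimal sketch of the unequal-variance (Welch) two-sample t-test described above, assuming SciPy is available; the scaling-factor values shown are placeholders, not the study data.

    ```python
    import numpy as np
    from scipy import stats

    # Placeholder scaling factors per scanner (hypothetical values, not study data)
    scaling_15t = np.array([0.88, 0.91, 0.90])               # 1.5T scanners
    scaling_3t = np.array([1.01, 1.03, 0.99, 1.04, 1.02])    # 3T scanners

    # Welch's t-test: two-sample t-test without the equal-variance assumption
    t_stat, p_value = stats.ttest_ind(scaling_15t, scaling_3t, equal_var=False)
    print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
    ```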

  3. Wyman et al. 2013: I’m not sure if this is the correct paper you are referring to, but this paper emphasized the use of ADNI’s standardized data sets, which they say add (1) greater rigor in reporting, (2) the ability to compare various techniques side by side, (3) the ability to evaluate the robustness of a given technique, and (4) the ability to replicate methods. In order to enable other researchers to compare our methods, evaluate robustness, and replicate our study, we will provide, in the supplemental materials, the raw MRI volume data produced by FreeSurfer, along with the Python and R code to calculate the scaling factors, the leave-one-out calibration, and the between-/within-site ICC (a sketch of the ICC calculation is given below).
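    As a preview of the supplemental code, the sketch below shows one way the between-site ICC could be computed from a subjects-by-sites matrix of ROI volumes (ICC(2,1): two-way random effects, absolute agreement, single measurement). This is a simplified, self-contained illustration with made-up numbers; the actual supplemental Python and R code will also cover the scaling factors, the leave-one-out calibration, and the within-site ICC.

    ```python
    import numpy as np

    def icc_2_1(Y):
        """ICC(2,1): two-way random effects, absolute agreement, single measurement.

        Y is an (n_subjects, k_sites) array, e.g. hippocampal volumes for the
        same traveling subjects measured at k sites. Illustration only.
        """
        n, k = Y.shape
        grand = Y.mean()
        row_means = Y.mean(axis=1)   # per-subject means
        col_means = Y.mean(axis=0)   # per-site means
        MSR = k * np.sum((row_means - grand) ** 2) / (n - 1)   # subjects
        MSC = n * np.sum((col_means - grand) ** 2) / (k - 1)   # sites
        SSE = np.sum((Y - row_means[:, None] - col_means[None, :] + grand) ** 2)
        MSE = SSE / ((n - 1) * (k - 1))
        return (MSR - MSE) / (MSR + (k - 1) * MSE + k * (MSC - MSE) / n)

    # Toy example: 4 subjects scanned at 3 sites (made-up volumes, in mL)
    Y = np.array([[3.1, 3.0, 3.2],
                  [2.8, 2.7, 2.9],
                  [3.5, 3.4, 3.6],
                  [2.9, 2.9, 3.0]])
    print(icc_2_1(Y))
    ```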

  4. Whitwell et al. 2012: found different rates of hippocampal atrophy in the ADNI cohort than in the Mayo Clinic Study of Aging cohort, even though there were no differences in hippocampal volumes between the matched cohorts. This was attributed to sampling from different populations rather than to differences in acquisition parameters. In the text I added the passage below (a toy simulation of site-level variability follows it):

    “The other limitation of this study is that we assumed that subjects across all sites come from the same population, and that stratification arises solely from systematic errors within each site. In reality, sites may recruit from different populations, and the true disease effect will vary even more. For example, in a comparison between the matched ADNI cohort and a matched Mayo Clinic Study of Aging cohort, researchers found different rates of hippocampal atrophy even though no differences in hippocampal volumes were detected (Whitwell 2012). This could be attributed to sampling from two different populations. This added site-level variability requires a larger site-level sample size; for an example of modeling this, see (Han 2011).”
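    To illustrate the point for the reviewer, the toy simulation below (not the model of Han 2011; all numbers are hypothetical) shows why heterogeneity in the true effect across sites favors adding sites over adding subjects per site: with the same total number of subjects, the pooled estimate is noticeably less variable when spread over more sites.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def pooled_effect_sd(n_sites, subjects_per_site, between_site_sd,
                         within_site_sd=1.0, true_effect=0.5, n_sims=2000):
        """Monte Carlo SD of the pooled (mean-of-sites) effect estimate when each
        site's true effect is itself drawn from a distribution. Illustration only."""
        estimates = []
        for _ in range(n_sims):
            site_effects = true_effect + rng.normal(0.0, between_site_sd, n_sites)
            site_estimates = site_effects + rng.normal(
                0.0, within_site_sd / np.sqrt(subjects_per_site), n_sites)
            estimates.append(site_estimates.mean())
        return float(np.std(estimates))

    # Same total N (1000 subjects), same between-site heterogeneity (SD = 0.2):
    print(pooled_effect_sd(n_sites=5, subjects_per_site=200, between_site_sd=0.2))
    print(pooled_effect_sd(n_sites=20, subjects_per_site=50, between_site_sd=0.2))
    ```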