The overall goal of this project was not to claim that scanning 12 phantom subjects is cost-effective. Rather, the goal was to measure MRI-related biases when systems are not standardized, and then to show how these biases can be overcome with appropriate sample sizes instead of a costly calibration procedure or, for retrospective data, harmonization. Accounting for bias statistically also gives sites the freedom to upgrade hardware or software, or even to change sequences, during a study, which might be an incentive for sites to contribute data even if they are given little financial support. We have de-emphasized the phantom calibration aspect and emphasized our statistical model, which accounts for MRI-related biases. The bias measurements (which were estimated via calibration) remain an important part of this study because they validate the scaling assumption of the statistical model and provide researchers with values to plug into the power equation. Our framework provides an alternative to ADNI harmonization rather than a strict improvement on it. The human phantom calibration showed that the overall absolute agreement between sites improves to the same level as that achieved by ADNI-type harmonization. Our results are compared to other harmonization efforts in the manuscript and in the following response.