These are my thoughts about the discussion paper by \citet{Lord_2017}. I would very much like to see this work published. I feel that it is a good manuscript that describes a methodology/approach that has the potential to revolutionise the field of paleoclimate research. I feel that the applications the authors chose to highlight the potential are solid, but could be even stronger. I must confess that statistical background is not strong enough to adequately assess the description in section 2. But I know that the creation and application of emulators is fairly standard practice in some disciplines, so I suspect that this description will be correct. Nonetheless I do have some queries about the methodology, that I anticipate will be explained better in a revised manuscript. I also have a couple of suggestions about the climate results that I outline below.Broad comments and questionsl132. I am not sure how you define what points are "outside that range" of a sparsely sampled 4 dimensional parameter space. I appreciate the caveat, but wonder if you are being too strict here.l290. I did not fully understand why you wanted to discuss the GCM simulations as two different ensembles. Certainly we later find out that only half of the parameter settings in the highCO2 ensemble actually worked, and then that the emulator was built using all the simulations anyway. You devote a lot of time to explaining and differentiating them, and I wasn't sure why. Given the aim of the paper is to highlight a methodology, I think it may be more useful to auto-critique your approach and make a recommendations in retrospect. For example, would 60 simulations spread across the whole combined parameter space have been a more logical choice?l362 & figure captions. You constantly stress that 'all SAT anomalies are vs preind'. I understand this choice, which is perfectly sensible. However, it is often unnecessary to state. For starters, would not the use of PCA in emulator creation automatically remove the mean. You may actually be removing two different means with the PCA, I'm not sure.Sect 3.5.1. Clearly you need to account for the ice-sheet changes, and your approach seems fair. However, I was confused as to why you think of it as pattern-scaling and applied it so early on. If I understand correctly, you determined that the ice-sheets have a small and constant temperature pattern. You then remove this pattern - giving an ice-sheet correction. As it is a constant, it should not matter when you apply this correction. I am therefore confused why you build a whole lowice emulator, rather than just adjusting the SAT returned by the modice emulator. It is strange that you need a different number of PCA to explain this factor. If the ice-sheet correction is invariant to parameters (as you demonstrate), then you really could apply a pattern-scaling to allow for different ice-sheet sizes as a post-facto correction.Sect 3.6.1. I appreciate the need to adjust the simulations to get to a full equilibrium. I agree that using the Gregory et al (2004) method is a suitable one for CO2 forcing of global mean temperatures. I was surprised that there was no discussion whatsoever about its relevance for either orbital-forcing or local SAT changes.All of the orbital perturbations you make will cause little change in the global, annual mean TOA flux (there may some fast feedbacks, but only v. minor). Using the Gregory formulation, all three orbital dimensions cannot cause changes in the global mean climate. This would not an issue if HadCM3 is fully-equilibrated to all the orbital changes within the 500 year simulation. That may be true for the global mean SAT change...Your multiplication of the ratio of global-means definitely is a pattern-scaling. You explicitly state that the ratio is assumed to be equally applicable to all grid-boxes. If you change the orbital configuration, is that assumption valid? You provide no assessment of the methodology on anything but global mean values - yet the rest of the paper looks at regional patterns.Incidentally, you state that the temperature changes are with respect to preindustrial (and presumably the TOA fluxes are too). The Gregory method requires them to be w.r.t. the starting conditions though. I guess that all simulations have been initiated from a preindustrial spun-up state, but this is never mentioned.Has the Gregory method previously been applied to models with dynamic vegetation components? How can we be sure that it valid as stated in l480.I did not find Fig. 7 increased my confidence in the correction. In fact, I wasn't sure what I supposed to take away from it .Have you scaled the precipitation patterns by the temperature too? If so this needs to be explicitly mentioned and discussed/justified separately.Sect. 5.2 I did not find the discussion of the orbital frequencies particularly informative. Your method effectively assumes that at each gridbox the temperature is a function of the orbit and co2 - i.e. $E_{SAT}=f(\epsilon,e,\pomega,CO_2)$. So why are you trying to make a big deal of the fact that the input timeseries show up in the wavelet diagrams. I thought that the emulator allowed you do much more sophisticated analysis of these kind of input/output relationships. For example, could you not plot the expected ratio of the precessional to obliquity ratio as a spatial map for the late Pliocene. At a minimum, I feel that you need to show something related to the 3 input timeseries to show how the emulator is altering the expression of them. Similarly, I did not get much from Sect 6.2 and Fig. 16.Sect 5.3 I love the idea of using the emulator to back out the CO2 from the proxy records, yet have reservations about what you have done here. Firstly, I felt that the approach described in l701-704 seems very crude considering the fact you have an emulator characterising the full relationship between the input parameters. Secondly, I was surprised that you focus your effort on trying to get at the mean CO2 concentration. What stands out most obviously to me looking at Fig. 12 is the temporal variability in your reconstructed efforts vs the proxy estimates. All the proxy CO2 records cannot capture the amplitude of the glacial-interglacial CO2 variations (although Martinez Boti et al get closest). PlioMIP2 is simulating just an interglacial, but with CO2 estimates from the mean of the full period. Incidentally, I was surprised that the error bars were so small on your reconstructions in Fig. 12.