Over the whole reanalysis period (2007-2016), we adjust the initial  temperature, salinity, and SIC of the year 2007 (Cini) and daily atmospheric states on the model grid (Catm(t)), which include 2-m air temperature, 2-m specific humidity, precipitation rate, 10-m wind vectors, downward longwave radiation, and net shortwave radiation. Due to the computational limit and potential instability of the adjoint model, we separate the whole 10-year period (2007-2016) into three chunks (2007-2010, 2010-2013, 2013-2016) with a one-year overlap. A total number of ~109 elements are adjusted in each chunk to reduce the cost function. The initial state of the latter chunk is taken from the last iteration of the former chunk to avoid discontinuities when moving to the next period.
       On the right-hand side of equation (1), the first term computes the uncertainty-weighted model-data misfits, where y(t) and x(t) represent observations and model state at time t, and E(t) maps model states to the corresponding observations. T denotes the transpose of the matrix. The remaining three terms penalize the adjustments of the initial state Cini, time-mean atmospheric forcing , and time-varying atmospheric forcing  and are weighted by their prior uncertainties P(0)Qm, and Qa, respectively. Prior uncertainties of the initial state P(0) and of the time-varying atmospheric forcing Qa are computed as the standard deviation of the nonseasonal variability of the corresponding variables using the WOA18 ocean atlas and the NCEP-RA1 reanalysis. Uncertainties of the mean component of 2-m air temperature, 2-m specific humidity, precipitation rate, 10-m wind vectors, downward longwave radiation, and net shortwave radiation are set to 1°C, 0.001 kg kg-1, 1.5*10-8 mm s-1, 2 m s-1, 20 W m-2, and 20 W m-2.
       To stabilize the adjoint model for a four-year assimilation window, we made the following modifications to the adjoint of MITgcm:
1)      disable the K-profiles mixing parameterization scheme,
2)      use a free-drift sea ice dynamic model,
3)      increase the Laplacian diffusivity of heat and salinity to 500 m2 s-1, and lateral eddy viscosity to 10000 m2 s-1,  
4)      apply a filter (Stammer et al., 2018) to the thermodynamic sea ice sub-model to remove spurious strong sensitivities. The filter compares the sensitivity on one model grid with the mean value of the surrounding eight model gridpoints. We set the sensitivity to 0 if the sensitivity is ten times larger than the mean value.   
These modifications stabilize the adjoint model over a long period but, at the same time, degrades the accuracy of the adjoint sensitivity.
A quasi-Newton algorithm based on Gilbert and Lemaréchal (2006) is used to iteratively reduce the cost function by adjusting the control vectors employing the adjoint sensitivities. The optimization stops when the gradient descent algorithm cannot further reduce the cost function due to the degraded usefulness of the adjoint sensitivities. The analyses discussed below are based on the zeroth iteration (referred to as “INTAROS-ctrl”) and the last iteration (referred to as "INTAROS-opt") of the optimization.