
Appendix

[Note: do not mention stepwise regression. Describe the selection instead as expert judgment based on the main-effect and interaction plots of each variable, the adjusted and predicted \(R^2\), and the normality of the residuals.] [For example, we first conducted best-subset regressions to identify the best candidate predictors among those shown in Eq. (\ref{eq:P22-correctionMR}) and some of their non-linear forms, for instance \(e^{\tau_{sca}}\) and \(N^{-\frac{1}{2}}\). Then, using the candidate predictors and their interaction terms, we used stepwise regressions to obtain the regression equations for the coefficients. While stepwise regression helps simplify the model, it is prone to oversimplifying the final model by removing important parameters. Therefore, by carefully analyzing each model obtained during the stepwise regression process, we added or removed parameters where needed, based on our expectations and on the main-effect and interaction plots for each candidate variable. For example, if the stepwise regression removes the monomer-number terms from the final model when fitting the coefficients for forward scattering, we force the model to include them, because we know that the monomer number has a strong effect at forward angles.] Lastly, we tested each selected model for all the coefficients by substituting them into Eq. (\ref{eq:P22-correction}) and extrapolating \(P_{22}\) up to \(N = 10^3\) monomers and \(x_m = 1.0\), for various refractive index values. Because the empirical corrections for \(P_{22}\) calculated in this way are highly sensitive to the accuracy of the multiple regressions for the coefficients in Eq. (\ref{eq:P22-correction}), any over-parameterized or overly simple fit for the coefficients results in unreasonably shaped phase functions. In this way it was possible to select only the models that not only satisfy the statistical criteria discussed above but also produce reasonable extrapolations.
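As an illustration of the best-subset step described above, the following is a minimal sketch, not the procedure actually used for the fits. It assumes ordinary least squares with an intercept and ranks every predictor subset of a given size by adjusted \(R^2\). The data are synthetic stand-ins, and the names `adjusted_r2`, `best_subsets`, and the column labels are hypothetical.

```python
from itertools import combinations

import numpy as np


def adjusted_r2(X, y):
    """Fit y = b0 + X b by ordinary least squares and return adjusted R^2."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])          # add intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - (ss_res / (n - p - 1)) / (ss_tot / (n - 1))


def best_subsets(X, y, names, max_size):
    """For each subset size k, return (predictor names, adjusted R^2) of the best subset."""
    best = {}
    for k in range(1, max_size + 1):
        scored = [
            (tuple(names[i] for i in idx), adjusted_r2(X[:, list(idx)], y))
            for idx in combinations(range(X.shape[1]), k)
        ]
        best[k] = max(scored, key=lambda item: item[1])
    return best


# Synthetic stand-in data (assumption: NOT the T-matrix results).
rng = np.random.default_rng(0)
n = 60
N = 10.0 ** rng.uniform(0.3, 3.0, size=n)         # monomer number, 2 to 1000
tau = rng.uniform(0.1, 2.0, size=n)               # stand-in for tau_sca
x_m = rng.uniform(0.1, 1.0, size=n)               # monomer size parameter

names = ["N^-1/2", "exp(tau)", "x_m"]
X = np.column_stack([N ** -0.5, np.exp(tau), x_m])

# Hypothetical response that depends only on N^-1/2 and x_m.
y = 1.0 + 3.0 * X[:, 0] - 0.5 * X[:, 2] + rng.normal(0.0, 0.01, size=n)

result = best_subsets(X, y, names, max_size=3)
for k, (subset, score) in result.items():
    print(k, subset, round(score, 4))
```

The ranking only proposes candidates. Whether a term such as the monomer number is kept in the final model remains a judgment based on the main-effect and interaction plots and on the extrapolation test.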
A better way to test the model would be to divide our T-matrix data into two parts, with 60 to 70% used to estimate the fits for the coefficients and the remainder used to test their interpolation and extrapolation performance. Unfortunately, our T-matrix data set is not large enough to be divided in this way and still yield a reliable regression fit. This is partly because we have already divided the T-matrix data into four parts, corresponding to parameter regions in which the parameters have different effects, as discussed in .... [For small N, the interaction between the monomers has more effect on the total? Think about it and find references. *For small \(x_m\), monomers follow Rayleigh scattering; for \(x_m > 0.6\) they start behaving differently than Rayleigh theory predicts. Stop here! This is not the actual reason. More accurately, aggregate scattering starts deviating from the Rayleigh-Gans approximation when sp is getting larger because ] However, this remains a future goal for us. Once we have generated enough T-matrix runs, it will be possible to test the empirical corrections on data that were not used to fit the model. Although cross-validation by splitting the T-matrix data was not feasible [delete this part: due to the size of the data set and the strong requirement for accurate regression models, as discussed above], we use the prediction sum of squares (PRESS) method, which is another validation method for regression models. Predicted coefficient of determination, \(R_{predict}^2\), ... (define it here). As shown in Table A.1 [Table of statistics for each model, including the \(R^2\) values, the residual normality score, and either the test-specific scores, such as AD, or the p-values]. Furthermore, extrapolating to larger monomer numbers is easier, since even the largest monomer number does not require a significant correction as long as the size parameter is small, such as 0.336, as shown in Fig. XX.
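For reference, the prediction sum of squares can be sketched as follows. This uses the standard textbook form, which may differ in detail from the definition finally adopted here: \(\mathrm{PRESS} = \sum_i \left( e_i / (1 - h_{ii}) \right)^2\), where \(e_i\) are the ordinary residuals and \(h_{ii}\) the leverages, and \(R_{predict}^2 = 1 - \mathrm{PRESS} / SS_{tot}\). The function names and the synthetic data are hypothetical. The explicit leave-one-out loop is included only to check the leverage shortcut.

```python
import numpy as np


def press_statistics(X, y):
    """Return (PRESS, predicted R^2) for an OLS fit with intercept."""
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    # Leverages h_ii are the diagonal of the hat matrix; with A = QR
    # (reduced QR, full column rank) they equal the row sums of Q**2.
    Q, _ = np.linalg.qr(A)
    leverage = np.sum(Q ** 2, axis=1)
    press = float(np.sum((resid / (1.0 - leverage)) ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    return press, 1.0 - press / ss_tot


def press_leave_one_out(X, y):
    """Brute-force PRESS: refit n times, each time predicting the held-out point."""
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])
    total = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        beta, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        total += float((y[i] - A[i] @ beta) ** 2)
    return total


# Synthetic stand-in data (assumption: NOT the T-matrix results).
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 2))
y = 2.0 + X @ np.array([1.5, -0.7]) + rng.normal(0.0, 0.1, size=40)

press, r2_pred = press_statistics(X, y)
print("PRESS:", press, "predicted R^2:", r2_pred)
```

Because each point is predicted from a fit that excludes it, PRESS is never smaller than the ordinary residual sum of squares, so \(R_{predict}^2\) never exceeds \(R^2\). A large gap between the two is a common sign of over-parameterization.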
The need for empirical corrections becomes increasingly important as the size parameter increases, but does not necessarily change as the monomer size increases. [Give also physical reasoning with a reference]. Besides, the main purpose of our model is to study planetary aerosols, such as Titan's and Pluto's haze particles. Although these particles are likely to consist of up to a few thousand monomers, their monomer sizes must be less than a tenth of a micrometer, because the strong linear polarization measured in both Titan's and Pluto's atmospheres in the visible and infrared channels suggests small, Rayleigh-like monomers. Therefore, studying such aggregate aerosols at visible or infrared wavelengths does not require extrapolating to size parameter values beyond those for which the model has been tested.