Figure 1 in Appendix A presents a graphical summary of the regression of sorghum, maize, and rice yields onto population and precipitation.
Figure 1.a shows the observed and predicted sorghum yield on the training data. Figure 1.b displays one of the 5-fold cross-validation curves used to estimate the test error that results from predicting sorghum production from population and precipitation. The cross-validation error curve shows that polynomials of degree 1, 2, or 3 fit the data most closely while also retaining the most robust predictive performance. Because the differences in error among these degrees are negligible, the simplest of these models is preferable where possible. Beyond degree 3, overfitting occurs and the model loses predictive power. Figure 1.c presents the observed and simulated sorghum production on the test data. The correlation coefficient between observed and simulated sorghum production is 0.76.
Figure 1.d plots the observed and predicted maize yield on the training data, and Figure 1.e shows the 5-fold cross-validation error curve for predicting maize production from population and precipitation. The cross-validation curve shows that a degree 1 polynomial gives the highest predictive performance. The error increases at degree 2 and then decreases at degree 3. Given this error difference, a simpler model may be preferable. However, because the errors fluctuate, we recommend a more complex model to analyze the relationships between maize yield and the predictors. Overfitting occurs beyond degree 3, and the model loses predictive power. The observed and simulated maize production on the test data are shown in Figure 1.f. The correlation coefficient between observed and simulated maize production is 0.77.
Figure 1.g plots the observed and predicted rice yield from the regression of rice production onto population and precipitation on the training data, and Figure 1.h shows the corresponding 5-fold cross-validation error curve. Polynomials of degree 1, 2, 3, or 4 fit the data best while maintaining predictive performance. A simpler model is preferable where possible. Nevertheless, because the errors fluctuate, a more complex method is preferable for investigating the associations between rice and the predictors. Predictive power decreases beyond degree 4 because of overfitting. The observed and simulated rice production on the test data are shown in Figure 1.i. The correlation coefficient between observed and simulated rice production on the test data is about 0.49.
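The degree-selection procedure described above can be sketched in a few lines. This is a minimal illustration, not the study's code: the data are synthetic, and the names `poly_features` and `kfold_cv_mse` are hypothetical helpers introduced here. It assumes the polynomial model includes all cross terms of the two predictors up to the chosen degree, which the text does not specify.

```python
import numpy as np


def poly_features(X, degree):
    """Columns x1**i * x2**j for all i + j <= degree (intercept included)."""
    x1, x2 = X[:, 0], X[:, 1]
    cols = [x1**i * x2**j
            for i in range(degree + 1)
            for j in range(degree + 1 - i)]
    return np.column_stack(cols)


def kfold_cv_mse(X, y, degree, k=5, seed=0):
    """Mean squared prediction error averaged over k held-out folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    fold_mse = []
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        # Standardize with training-fold statistics only, to avoid leakage
        mu = X[train_mask].mean(axis=0)
        sd = X[train_mask].std(axis=0)
        A_train = poly_features((X[train_mask] - mu) / sd, degree)
        A_test = poly_features((X[test_idx] - mu) / sd, degree)
        coef, *_ = np.linalg.lstsq(A_train, y[train_mask], rcond=None)
        fold_mse.append(np.mean((y[test_idx] - A_test @ coef) ** 2))
    return float(np.mean(fold_mse))


# Synthetic stand-in data (assumption): 40 records with a linear signal
rng = np.random.default_rng(42)
population = rng.uniform(4e6, 12e6, size=40)          # persons
precipitation = rng.uniform(2000.0, 4000.0, size=40)  # mm per year
X = np.column_stack([population, precipitation])
signal = 50.0 + 1e-5 * population + 0.02 * precipitation
y = signal + rng.normal(0.0, 10.0, size=40)

# One cross-validation error per candidate degree; the minimum marks the
# degree with the best estimated test error
cv_errors = {d: kfold_cv_mse(X, y, d) for d in range(1, 7)}
best_degree = min(cv_errors, key=cv_errors.get)
print(cv_errors, best_degree)
```

Plotting `cv_errors` against degree gives a curve of the kind shown in Figures 1.b, 1.e, and 1.h. When several low degrees have nearly equal error, picking the lowest of them follows the preference for simpler models stated above.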
 
Figure 2 of Appendix A shows 3D scatterplots of the relationships among the crop yields, population, and precipitation. Figure 2.a shows the association among sorghum, population, and precipitation. Low population correlates with reduced sorghum production. In contrast, a high population combined with higher precipitation is associated with larger sorghum production. The optimal sorghum yield appears to occur at the highest population (≈12 million) and approximately 4,000 mm of precipitation per year. Figure 2.b shows the relationships among maize, population, and precipitation. Smaller populations correlate with low maize yields. However, an average or larger population combined with average precipitation corresponds to approximately average maize production. The largest population combined with precipitation of 3,500–4,000 mm per year correlates with the optimal maize production. Figure 2.c presents the interdependency among rice, population, and precipitation. As in the two previous cases, rice production is low when the population is smallest. However, rice production oscillates around 150,000 tons per year for populations between 7 million and 11 million and precipitation between 2,500 and 3,500 mm per year. The optimal rice production occurs with around 12 million people and 4,000 mm of precipitation per year.
Linear regression assumes a straight-line relationship between the predictors and the response, but here the true relationship appears to be far from linear. Residual plots are a useful graphical tool for detecting non-linearity (Gareth et al. 2014).
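As a rough numerical counterpart to a residual plot, the sketch below computes the residuals of a two-predictor linear fit and checks them against a term the model leaves out. The data are synthetic with a deliberately built-in interaction, and `linear_fit_residuals` is a hypothetical helper; none of this reproduces the study's data or software.

```python
import numpy as np


def linear_fit_residuals(X, y):
    """Residuals of an ordinary least-squares fit of y on X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return y - A @ coef


# Synthetic data (assumption) whose true response contains a product term
rng = np.random.default_rng(0)
pop = rng.uniform(4.0, 12.0, size=200)    # millions of people
precip = rng.uniform(2.0, 4.0, size=200)  # thousands of mm per year
y = (10.0 + 2.0 * pop + 5.0 * precip + 3.0 * pop * precip
     + rng.normal(0.0, 1.0, size=200))

resid = linear_fit_residuals(np.column_stack([pop, precip]), y)

# Least-squares residuals average zero and are uncorrelated with each fitted
# predictor by construction, so any structure shows up only against terms
# the model omits, here the centered product of the two predictors.
product = (pop - pop.mean()) * (precip - precip.mean())
corr_with_product = np.corrcoef(resid, product)[0, 1]
print(round(float(corr_with_product), 2))
```

A correlation near zero would be consistent with an adequate linear fit; a large value, as in this constructed case, is the numerical signature of the curved pattern that a residual plot makes visible.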
Figure 3 (a, b, and c) in Appendix A displays 3D scatterplots of the residuals from the regression of the respective crop yields onto population and precipitation. The plots show both the positive residuals (above the surface) and the negative residuals (below the surface). Sorghum’s prediction residuals (Figure 3.a) tend to lie closer to the 45-degree line, where population and precipitation are evenly split; the residuals are almost symmetrical. In contrast, the maize and rice residuals (Figure 3.b and 3.c) lie farther from the 45-degree line and are more asymmetrical. The patterns of the residuals show pronounced non-linear relationships between the crops and the independent variables. Linear regression cannot model such a pattern exactly. This suggests a synergy, or interaction effect, between population and precipitation: combining the independent variables gives a bigger boost to crop yields than either variable alone. Below are the interaction models to predict sorghum, maize, and rice.
The synergy, or interaction effect, between population and precipitation on sorghum production gives Equation 18 for the regression analysis. The model explains 75% of the variability in sorghum production, slightly more than the bilinear regression (73%). The model is statistically significant, with a p-value of 2.488 × 10⁻⁶. However, none of the individual coefficient estimates is statistically significant, because all of their p-values are greater than 5%.
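To illustrate how an interaction model of this kind compares with the two-predictor model that the text calls the bilinear regression, the sketch below fits both by least squares and reports their R² values. It assumes the interaction model adds a single population × precipitation product term to the two main effects. The data are synthetic, and `r_squared` is a hypothetical helper, so the numbers it prints are unrelated to the 75% and 73% reported above.

```python
import numpy as np


def r_squared(A, y):
    """Coefficient of determination for a least-squares fit of y on design A."""
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    ss_res = np.sum((y - A @ coef) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Synthetic data (assumption), sized like a short annual record
rng = np.random.default_rng(1)
n = 30
pop = rng.uniform(4.0, 12.0, size=n)    # millions of people
precip = rng.uniform(2.0, 4.0, size=n)  # thousands of mm per year
y = (5.0 + 1.0 * pop + 2.0 * precip + 0.5 * pop * precip
     + rng.normal(0.0, 3.0, size=n))

ones = np.ones(n)
additive = np.column_stack([ones, pop, precip])                   # main effects only
interaction = np.column_stack([ones, pop, precip, pop * precip])  # plus product term

r2_additive = r_squared(additive, y)
r2_interaction = r_squared(interaction, y)
print(round(r2_additive, 3), round(r2_interaction, 3))
```

Because the additive model is nested inside the interaction model, R² cannot decrease when the product term is added, so a small increase on its own is weak evidence for the interaction. The significance of the product term's coefficient is the more direct check.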