Appendix

Regression OLS

\label{RegressionOLSSubsection} This section discusses the use of Ordinary Least Squares (OLS) regression.

Since OLS regression assumes that the explanatory variables are not linearly dependent on one another, these variables should be selected with care; otherwise the model may suffer from severe multicollinearity.
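To make the effect of multicollinearity concrete, the standard OLS expressions can be recalled. The symbols $y$, $X$, $\beta$, $\varepsilon$ and $\sigma^{2}$ are notation introduced here for illustration and are not used elsewhere in this report; the variance formula assumes uncorrelated errors with constant variance $\sigma^{2}$.

\begin{equation}
y = X\beta + \varepsilon, \qquad
\hat{\beta} = (X^{\top}X)^{-1}X^{\top}y, \qquad
\mathrm{Var}(\hat{\beta}) = \sigma^{2}(X^{\top}X)^{-1}.
\end{equation}

When two columns of $X$ are nearly linearly dependent, $X^{\top}X$ is close to singular, so the entries of its inverse, and hence the variances of the estimated coefficients, become very large.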

Although the absence of correlation between variables does not necessarily mean that these variables are independent, a practical way of reducing the risk of multicollinearity is to select the independent variables on the basis of their pairwise correlations, excluding from the model the explanatory variables whose correlation coefficients are greater than 0.71 in absolute value.

The explanatory variables were selected on the basis of the correlation matrix presented in the previous report.
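The correlation-based screening described above can be sketched as follows. This is only a minimal illustration using the Python standard library, not the procedure actually used for the report: the function names, the toy data and the tie-breaking rule (within a highly correlated pair, keep the variable more correlated with the price) are all assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def filter_correlated(columns, target, threshold=0.71):
    """Keep variables whose |r| with every already kept variable is <= threshold.

    Candidates are visited in decreasing |correlation with the target|
    (an assumed tie-breaking rule, not taken from the text).
    """
    ranked = sorted(
        columns,
        key=lambda name: abs(pearson(columns[name], target)),
        reverse=True,
    )
    kept = []
    for name in ranked:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy data: the two area columns are almost perfectly correlated.
data = {
    "area_m2": [50, 60, 70, 80, 90, 100],
    "area_ft2": [538, 646, 753, 861, 969, 1076],
    "age": [30, 5, 22, 14, 40, 9],
}
price = [200, 260, 270, 320, 330, 400]

selected = filter_correlated(data, price)
print(selected)
```

On this toy data only one of the two area variables survives, together with the weakly correlated age variable.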

Furthermore, one should avoid keeping variables that are not statistically significant at a chosen threshold (e.g. requiring the p-values of the coefficients to be smaller than or equal to 0.05). Besides preventing the use of variables that are poor predictors, this produces sparser models, which is usually desirable in modeling.

One way of retaining only significant variables is to use stepwise regression. In essence, this method relies on an algorithm that can be summarized as follows \cite{IMM2012-06787}:

  • add the explanatory variable most correlated with the price ;

  • fit the model ;

  • test all the variables in the model using an F-test to verify that they are statistically significant ;

  • remove all variables that do not meet the threshold defined in the previous step ;

  • add the explanatory variable most correlated with the price that has not yet been included in the model, and repeat the previous steps ;

  • once every variable has been included in the model at least once and all non-significant variables have been removed from it, the algorithm stops.
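The steps listed above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the implementation from the cited reference: it uses only the standard library, solves the normal equations directly, and compares each partial F statistic with a fixed, hypothetical critical value `f_crit` instead of converting it to a p-value. All function names and the toy data are invented for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def rss(columns, names, y):
    """Residual sum of squares of an OLS fit of y on an intercept plus `names`."""
    rows = [[1.0] + [columns[name][i] for name in names] for i in range(len(y))]
    k = len(names) + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))


def partial_f(columns, names, drop, y):
    """Partial F statistic for removing `drop` from the model containing `names`."""
    full = rss(columns, names, y)
    reduced = rss(columns, [n for n in names if n != drop], y)
    dof = len(y) - len(names) - 1  # residual degrees of freedom of the full model
    return (reduced - full) / (full / dof)


def stepwise(columns, y, f_crit=4.0):
    """Add variables by decreasing |correlation with y|; after each addition,
    drop every variable whose partial F statistic is below f_crit (assumed value)."""
    ranked = sorted(columns, key=lambda n: abs(pearson(columns[n], y)), reverse=True)
    model = []
    for candidate in ranked:
        model.append(candidate)
        model = [n for n in model if partial_f(columns, model, n, y) >= f_crit]
    return model


# Hypothetical toy data: the price depends on "area" only.
toy = {
    "area": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "noise": [1.5, -1.5, 0.5, -0.5, -0.5, 0.5, -1.5, 1.5],
}
toy_price = [3.0, 3.0, 5.0, 9.0, 11.0, 11.0, 13.0, 17.0]

model = stepwise(toy, toy_price)
print(model)
```

On this toy data the second variable is added and then removed again, because its partial F statistic falls well below the assumed critical value. In practice the critical value depends on the degrees of freedom, so a statistics package would normally supply the p-values.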

However, since the dimensionality of the problem is not too high, a manual approach is used in the following subsection (\ref{FitModelSubsubSec}).

Fit of the model

\label{FitModelSubsubSec} Table \ref{1stOLSCoeff.tex} shows the resulting coefficients, together with their p-values, confidence intervals and so forth, for the first OLS regression, which uses all the explanatory variables. Many of these variables are clearly not statistically significant.

\input{1stOLSCoeff}

Table \ref{LastOLSCoeff.tex} shows the variables that remain after keeping only the explanatory variables that are not too highly correlated with one another and that remain significant in explaining price formation.

Figure \ref{Heteroscedasticity} shows that the final model does not exhibit strong heteroscedasticity (i.e. there is no specific pattern in the residuals). Therefore, no mathematical transformation is applied to the variables.
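The visual check of the residuals can be complemented by a numerical one. The sketch below is not part of the analysis in this report: it computes a simplified studentized Breusch-Pagan statistic, namely $n R^{2}$ from an auxiliary regression of the squared residuals on the fitted values, and compares it with 3.841, the 5\% critical value of a chi-square distribution with one degree of freedom. The function name and the toy numbers are assumptions made for illustration.

```python
def bp_statistic(fitted, residuals):
    """n * R^2 from regressing squared residuals on fitted values.

    With a single auxiliary regressor, R^2 equals the squared Pearson
    correlation between the fitted values and the squared residuals.
    """
    n = len(fitted)
    squared = [r * r for r in residuals]
    mean_f, mean_s = sum(fitted) / n, sum(squared) / n
    cov = sum((f - mean_f) * (s - mean_s) for f, s in zip(fitted, squared))
    var_f = sum((f - mean_f) ** 2 for f in fitted)
    var_s = sum((s - mean_s) ** 2 for s in squared)
    if var_s == 0:  # constant squared residuals: no pattern to detect
        return 0.0
    return n * cov * cov / (var_f * var_s)


CHI2_CRIT_1DF_5PCT = 3.841  # tabulated 95% quantile of chi-square, 1 d.o.f.

# Hypothetical fitted values and two hypothetical residual patterns.
fitted = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
growing = [0.1, -0.2, 0.3, -0.4, 0.5, -0.6, 0.7, -0.8]  # spread grows with fit
steady = [0.5, -0.4, 0.5, -0.4, 0.5, -0.4, 0.5, -0.4]   # roughly constant spread

lm_hetero = bp_statistic(fitted, growing)
lm_homo = bp_statistic(fitted, steady)
print(lm_hetero > CHI2_CRIT_1DF_5PCT, lm_homo > CHI2_CRIT_1DF_5PCT)
```

A statistic above the critical value suggests that the residual variance changes with the fitted values; a statistic below it gives no evidence against constant variance, which is the situation described for the final model.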

\input{LastOLSCoeff}


  1. In the literature, the rule of thumb for such coefficients varies from 0.7 to 0.85.