For this association mining exercise, the ’Apriori’ algorithm is used to generate the association rules. To keep the number of association rules manageable, we restricted ourselves to a minimum Support of \(45~\%\) and a minimum Confidence of \(80~\%\). The maximum Support in this dataset being \(50~\%\), the first threshold substantially reduces the number of itemsets; the maximum Confidence among associations with Support greater than or equal to \(45~\%\) is \(91~\%\).
Note that the cars dataset contains many non-binary variables. These variables were therefore binarized prior to running the ’Apriori’ algorithm, using the median of each variable as the pivot for assigning zeros and ones.
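As a sketch of this binarization step, in notation that is ours rather than the dataset's, let \(x_{ij}\) be the value of variable \(j\) for car \(i\) and \(m_j\) the median of variable \(j\). The binarized item \(b_{ij}\) can then be written as
\[
b_{ij} = \left\{ \begin{array}{ll} 1 & \mbox{if } x_{ij} > m_j, \\ 0 & \mbox{otherwise.} \end{array} \right.
\]
The strict inequality, and hence the assignment of values exactly equal to the median to zero, is an assumption made for illustration only.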
Table \ref{ItemsetsCarsDataTable} shows all the items and combinations of items with a support (i.e. relative frequency in this case) greater than or equal to \(45~\%\). This means, for instance, that the item ’WheelBase’ (i.e. cars whose Wheelbase is larger than the median of the Wheelbase variable) is present in at least \(45~\%\) of our observations.
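For reference, under the usual definition (the symbols below are our own notation), the support of an itemset \(X\) over \(n\) observations \(T_1, \dots, T_n\), each viewed as the set of items equal to one, is
\[
\mathrm{supp}(X) = \frac{\left| \{\, i : X \subseteq T_i \,\} \right|}{n},
\]
so that an item or combination of items is retained here when \(\mathrm{supp}(X) \geq 0.45\).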
It is interesting to note that all the items in Table \ref{ItemsetsCarsDataTable} come from the binarized variables. None of the initial dummy variables reaches a frequency of \(45~\%\), which is a particularly high threshold in practice. On the other hand, it makes perfect sense that most of the binarized variables are above the \(45~\%\) threshold: with the median as pivot, the resulting frequencies would all be around \(50~\%\) if there were no ties, and therefore no random classification as ones and zeros.
Table \ref{AssocRulesCarsDataTable} presents the association rules resulting from the application of the ’Apriori’ algorithm. Columns ’X1 name’ and ’X2 name’ contain items from Table \ref{ItemsetsCarsDataTable} (i.e. those with sufficient initial support), and column ’Y name’ contains the items which, when associated with the items in ’X1’ and ’X2’, have a Confidence of at least \(80~\%\) (i.e. Y is present at least \(80~\%\) of the time when ’X1’ and ’X2’ are present).
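In the usual notation (ours, not the table's), writing \(\mathrm{supp}(\cdot)\) for the share of observations containing all the listed items, the Confidence criterion for a rule with antecedent items \(X_1, X_2\) and consequent \(Y\) reads
\[
\mathrm{conf}(\{X_1, X_2\} \Rightarrow Y) = \frac{\mathrm{supp}(\{X_1, X_2, Y\})}{\mathrm{supp}(\{X_1, X_2\})} \geq 0.80 .
\]
As a rough consequence, and assuming the antecedent itself meets the \(45~\%\) Support threshold, the three items appear together in at least \(0.80 \times 0.45 = 0.36\), i.e. \(36~\%\), of the observations.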