Classification Model Analyses

Based on the scores from the classification models (fig.~\ref{672684}), all five models classify Marvel and DC movies well from audience reviews, with accuracy scores ranging from 0.8 to 0.95. These scores suggest a strong association between the content of audience reviews and the movie type (Marvel or DC). Because all movie titles, character names, and actor/actress names were removed from the data before it was fed into the models, the classification must rely on other terms that are strong predictors of the two movie types. Among the five models, Random Forest performs best, with an accuracy of 0.95, recall of 0.98, precision of 0.88, and F1 score of 0.93. The Random Forest feature importance ordered by Mean Decrease Gini (fig.~\ref{693806}) shows that most of the top features also appear among the top frequent terms, such as ``fun'', ``team'', and ``hilarious'', as well as ``villain'', ``adaptation'', and ``sin''.
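The four scores reported for each model are derived from the confusion matrix of a binary classifier. The following is a minimal Python sketch of how they relate, not the authors' evaluation code. It assumes that one movie type is treated as the positive class, but the text does not say which one. The choice of Marvel in the example, the sample labels, and the helper names `binary_metrics` and `f1_from` are all hypothetical. The sketch also checks that the reported F1 score is consistent with the reported precision and recall.

```python
from typing import Dict, Sequence


def f1_from(precision: float, recall: float) -> float:
    """F1 score: the harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def binary_metrics(y_true: Sequence[str], y_pred: Sequence[str],
                   positive: str) -> Dict[str, float]:
    """Accuracy, precision, recall and F1, treating `positive` as the positive class.

    Hypothetical helper for illustration; which class was positive in the
    original analysis is an assumption.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    correct = sum(t == p for t, p in pairs)                      # all correct predictions
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }


# Hypothetical labels for five reviews (not data from this study).
y_true = ["Marvel", "Marvel", "DC", "DC", "Marvel"]
y_pred = ["Marvel", "DC", "DC", "Marvel", "Marvel"]
print(binary_metrics(y_true, y_pred, positive="Marvel"))

# Consistency check on the reported Random Forest scores: P = 0.88, R = 0.98.
print(round(f1_from(0.88, 0.98), 2))  # 0.93
```

Because F1 is a harmonic mean, it is pulled toward the lower of the two scores. This is why the Random Forest F1 (0.93) sits closer to its precision (0.88) than to its recall (0.98).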