- Methods: common words and words shared between the movies removed; modeling (SVM, naive Bayes (NB), and random forest (RF) used to classify movie type from review text; RF performed best according to the accuracy graph). As expected, most of the important words/features were movie, people, or character names in the movie -> remove the major movie names and character/people names and run everything again
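A minimal sketch of the word-removal step described above, written in Python for illustration rather than in the R workflow used for the analysis. The word lists, the sample review, and the helper names `tokenize` and `clean_tokens` are all hypothetical placeholders, not the project's data or code.

```python
import re
from collections import Counter

# Hypothetical placeholder lists; the real analysis would use its own
# stop-word list and the names found in its reviews.
COMMON_WORDS = {"the", "a", "is", "and", "of", "this"}
NAME_WORDS = {"batman", "joker", "nolan"}  # movie/character/people names


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def clean_tokens(text, drop_names=False):
    """Drop common words, and optionally name words, from one review."""
    blocked = COMMON_WORDS | NAME_WORDS if drop_names else COMMON_WORDS
    return [t for t in tokenize(text) if t not in blocked]


# Made-up review text, used only to show the two passes.
review = "The Joker is the heart of this dark and thrilling Batman movie"
before = Counter(clean_tokens(review))                   # first pass
after = Counter(clean_tokens(review, drop_names=True))   # rerun without names
print(before.most_common(3))
print(after.most_common(3))
```

Running the same pipeline twice, once with `drop_names=False` and once with `drop_names=True`, mirrors the "run everything again" step, so the before/after word counts can be compared directly.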
Results
- Cosine similarity/distance before and after name removal
- Most frequent words for the two movies
- Results from RF modeling
- Accuracy using positive/negative reviews
- Feature importance for the two movies, before and after removal of the names
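For the cosine similarity/distance item above, a small Python sketch of how the measure can be computed from word counts, assuming each movie's reviews are represented as a bag of words. The function name `cosine_similarity` and the two toy word lists are illustrative assumptions, not the project's data.

```python
import math
from collections import Counter


def cosine_similarity(counts_a, counts_b):
    """Cosine similarity between two word-count vectors (Counters)."""
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for an empty document
    return dot / (norm_a * norm_b)


# Toy word lists standing in for the cleaned review text of two movies.
movie_a = Counter("dark hero city hero villain".split())
movie_b = Counter("funny hero family road trip".split())

sim = cosine_similarity(movie_a, movie_b)
dist = 1 - sim  # cosine distance
print(round(sim, 3), round(dist, 3))
```

Computing `sim` once on the original counts and once on the counts with names removed gives the before/after comparison listed in the results.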
Discussion
- Difference
- Limitations and bias
- data size
- R computation capacity