Limitations and bias
One limitation of this study is data size: Rotten Tomatoes provides only the latest 1,000 reviews for each movie. Because of the limited computational capacity of R, we also had to sub-sample our data again when running the classification models and topic models. Furthermore, we analyzed only English-language reviews, although many reviews are written in other languages. For these reasons, our sub-sampled reviews could be less representative of the true evaluations of the movies, or even biased relative to them.
Another source of bias is the use of review data to interpret the nature of movies (e.g., style and theme). Movie reviews are the best indicators of audiences' experience, but they do not necessarily represent the movies themselves. In addition, some topics may be more likely than others to be discussed in reviews. Therefore, using differences in reviews to infer differences between movies could produce biased conclusions.
Future work
To mitigate the bias in reviews, we could integrate movie abstracts, scripts, and dialogue into our analysis for a more comprehensive result. We could also explore platforms other than RStudio to make the computation more robust and less dependent on sub-sampling.
Acknowledgments
We sincerely thank Professor Arthur Spirling and Teaching Assistant Pedro Rodriquez for their support of our project. We give special thanks to Rotten Tomatoes for aggregating movie reviews and making the review data available on its website for us to use in our project.