Proposed Approach

I firstly try to build an algorithm to exact the features of the tweets, which is a typical NLP approach using different text cleaning method. Then, I utilize the tweets with top trends(most popular hashtags) to develop a model with supervised learning, to classify the related top trends. Finally, to evaluate the capability of the approach, I use the true top trends tweets to verify the learning performance. One difficulty involved in this procefure is dimentionality reduction due to the fact that each word is an feature: whether it’s present in the document or not (0/1), or how many times it appears (an integer >= 0), then I applied various techniques towards this problem.