Richard Guo added It_can_be_observed_that__.tex  over 8 years ago

Commit id: e46976357dee1938715603d0d00f3cf112e4ea15


It can be observed that there are large single spikes that are (1) highly localized in time and (2) shared by more than one hashtag. In the following, we use PCA to identify those singleton spikes. Let $X = [x_1, x_2, \cdots, x_N]$ be a $T \times N$ matrix in which each row corresponds to a day and each column to a hashtag. We performed standard PCA (centered, unscaled) on the top $N=200$ hashtags. By treating days as ``observations'' and hashtags as covariates, we obtain principal components $z_i \in \mathbb{R}^T$ for $i=1,\cdots, N$. The variance explained declined rapidly, with the top 4 principal components dominating the dataset.
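
The PCA step described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the code used for the analysis: the function name `pca_centered_unscaled`, the synthetic Poisson counts, and the injected spike on day 100 are all assumptions made for the example. It centers each hashtag column without rescaling, takes the SVD of the centered matrix, and returns the component scores $z_i \in \mathbb{R}^T$ together with the fraction of variance each component explains.

```python
import numpy as np


def pca_centered_unscaled(X):
    """Centered, unscaled PCA of a T x N matrix (rows = days, columns = hashtags).

    Returns:
        Z     : T x K matrix of scores; column i is the i-th component z_i in R^T
        W     : N x K matrix of loadings (weights on hashtags)
        ratio : length-K vector, fraction of total variance explained per component
    where K = min(T, N).
    """
    X = np.asarray(X, dtype=float)
    # Center each hashtag (column) over days; no division by the standard deviation.
    Xc = X - X.mean(axis=0, keepdims=True)
    # Thin SVD: Xc = U @ diag(s) @ Vt, singular values sorted in descending order.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = U * s                           # scores, one column per component
    var = s ** 2 / (X.shape[0] - 1)     # variance captured by each component
    ratio = var / var.sum()
    return Z, Vt.T, ratio


# Synthetic stand-in for the hashtag data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
T, N = 365, 200
X = rng.poisson(5.0, size=(T, N)).astype(float)
X[100, :50] += 500.0                    # one spike on day 100 shared by 50 hashtags

Z, W, ratio = pca_centered_unscaled(X)

# A spike localized in time shows up as one extreme entry in a component's scores.
spike_day = int(np.argmax(np.abs(Z[:, 0])))
top4_share = float(ratio[:4].sum())
```

On data of this kind, a spike that is localized in time and shared across hashtags concentrates in a single leading component, so `spike_day` recovers the injected day and `ratio` falls off quickly after the first entry. With real data, `top4_share` would be the quantity to compare against the observation that the top 4 components dominate.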