Alberto Pepe edited more stuff.tex  almost 10 years ago

Commit id: 2e19c68eba7ebf59db161336b37d5bdfff1e68c2


\[ \theta_{m_d}[i,k] = [ m_{i}, m_{i+1}, m_{i+2}, \cdots, m_{i+k}] \]
A different number of tweets is submitted on any given day. Each entry of $\theta_{m_d}[i,k]$ is therefore derived from a different sample of $N_d = ||T_d||$ tweets. The probability that the terms extracted from the tweets submitted on a given day match the given number of POMS adjectives $N_p$ thus varies considerably, following the binomial probability mass function:
\[ P(K=n) = \left(\begin{array}{c}N_p\\||W(T_d)||\end{array}\right)p^{||W(T_d)||}(1-p)^{N_p-||W(T_d)||} \]
where $P(K=n)$ represents the probability of obtaining $n$ POMS term matches, $||W(T_d)||$ represents the total number of terms extracted from the tweets submitted on day $d$, and $N_p$ represents the total number of POMS mood adjectives.
Since the number of tweets per day has increased consistently from Twitter's inception in 2006 to the present, this leads to systematic changes in the variance of $\theta_{m_d}[i,k]$ over time. In particular, the variance is larger in the early days of Twitter, when tweets were relatively scarce. As the number of tweets per day increases, the variance of the time series decreases. This effect makes it problematic to compare changes in the mood vectors of $\theta_{m_d}[i,k]$ over time \cite{Meier_2012}.
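
The dependence of the variance on sample size can be illustrated with a short worked example. This is a sketch under assumptions that are not stated in the text: each of the $||W(T_d)||$ terms extracted on day $d$ is treated as an independent trial that matches a POMS adjective with a fixed probability $p$, and $K_d$ (a symbol introduced here for illustration) denotes the resulting number of matches. Under this standard binomial parameterization, the fraction of matching terms has variance
\[ \mathrm{Var}\left(\frac{K_d}{||W(T_d)||}\right) = \frac{||W(T_d)||\,p(1-p)}{||W(T_d)||^{2}} = \frac{p(1-p)}{||W(T_d)||} \]
which shrinks in inverse proportion to the number of extracted terms. Taking a purely hypothetical value $p = 0.05$, a day with $||W(T_d)|| = 100$ terms gives a variance of $4.75 \times 10^{-4}$ (standard deviation of about $0.022$), whereas a day with $||W(T_d)|| = 10{,}000$ terms gives $4.75 \times 10^{-6}$ (standard deviation of about $0.0022$). A hundredfold increase in the number of terms thus reduces the standard deviation tenfold, consistent with the larger fluctuations expected in the early, sparsely sampled part of the time series.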