Recommending people to follow

Two studies were the first to explore recommending people to follow. Hannon et al. \cite{Hannon_2010} used CB and CF to recommend “followees” on Twitter. They examined several ways to build user profiles: from the user’s own tweets, the user’s followers, the user’s followees, the followers’ tweets, and the followees’ tweets. The open-source search engine Lucene was used to index users by their profiles, after applying TF-IDF to boost distinctive terms or users within each profile. Their offline evaluation used a dataset of 20,000 Twitter users, of which 19,000 served as the training set and the remaining 1,000 as test users. The different methods were compared by their ability to predict the user’s followees. A slight advantage was observed for profiles based on followers and followers’ tweets, and hybrid profiles further improved precision. A small-scale live trial was also conducted, in which users indicated whom they were likely to follow. On average, the hybrid approaches produced about 7 accurate recommendations out of 30.
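
As a rough illustration of the TF-IDF weighting mentioned above, a common textbook form is given below. The symbols are assumptions introduced here for illustration, and the exact variant used by Hannon et al. (or by Lucene's scoring) may differ. Let \(N\) be the number of indexed user profiles, \(\mathrm{tf}(t,u)\) the number of times term \(t\) (a word, or a user identifier in follower- and followee-based profiles) occurs in the profile of user \(u\), and \(\mathrm{df}(t)\) the number of profiles that contain \(t\):
\[
  w(t,u) = \mathrm{tf}(t,u) \cdot \log \frac{N}{\mathrm{df}(t)} .
\]
Under this form, a term that appears in nearly every profile receives a weight close to zero, whereas a term concentrated in a few profiles is boosted.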

A second study was performed by Brzozowski and Romero \cite{brzozowski2011should}, who experimented with the WaterCooler enterprise SNS. During a 24-day live trial, they analyzed the patterns of 110 users who followed 774 new individuals. The strongest pattern found was of the form \(A \leftarrow X \rightarrow B\), meaning that sharing an audience (a follower) with another person is a strong reason to follow that person. Most-replied was found to be a strong global signal, whereas similarity and most-read were weaker signals for followee recommendation.
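
One way to make the shared-audience pattern concrete is to count common followers. The notation and the score below are assumptions introduced here for illustration, not the model used in the study. Let \(F(u)\) denote the set of followers of user \(u\), and let \(X \rightarrow A\) mean that \(X\) follows \(A\). A simple score for recommending \(B\) to \(A\) is then
\[
  s(A,B) = |F(A) \cap F(B)| = |\{\, X : X \rightarrow A \mbox{ and } X \rightarrow B \,\}| ,
\]
so that candidates sharing more followers with \(A\) are ranked higher.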

In a more recent study, Gupta et al. \cite{Gupta:2013:WFS:2488388.2488433} revealed some details about the followee recommender systems in use at Twitter. From an architectural perspective, they noted the decision to process the entire Twitter follower-followee graph in memory on a single server, which contributed to the performance of the feature. They developed an open-source in-memory graph processing engine to traverse the Twitter graph and generate recommendations. The algorithm used was a combination of a random walk and SALSA \cite{Lempel:2001:SSA:382979.383041}, combining two approaches: the first gives each user the same influence regardless of the number of users they follow or are followed by, and the second gives equal influence to each follower-followee edge.
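
For orientation, the standard SALSA iteration \cite{Lempel:2001:SSA:382979.383041} can be sketched as follows. This is the textbook form under notation assumed here, and the variant deployed at Twitter may differ in its details. Let \(E\) be the set of follow edges, with \((u,v) \in E\) meaning that \(u\) follows \(v\), and let \(d^{+}(u)\) and \(d^{-}(v)\) be the number of accounts that \(u\) follows and the number of followers of \(v\), respectively. Hub scores \(h\) and authority scores \(a\) are updated alternately:
\[
  a(v) \leftarrow \sum_{u \,:\, (u,v) \in E} \frac{h(u)}{d^{+}(u)},
  \qquad
  h(u) \leftarrow \sum_{v \,:\, (u,v) \in E} \frac{a(v)}{d^{-}(v)} .
\]
Each update corresponds to one step of a random walk that alternates between the hub and authority sides of a bipartite graph, with every node dividing its score evenly among its edges. Accounts with high authority scores are the candidates for recommendation.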