Proposed Solution

\label{sec:solution} In this section, ...

Intuition

The goal of this work is to recommend friends for users based on their mobility information. The main hypothesis behind is that friends tend to visit similar places, it is in fact an example of the social homophily theory which states that friends tend to share similar behaviors.

The most straightforward implementation of our hypothesis is to recommend those who have visited many same places as a user as friends. For example, if a user Alice often visits the same coffee shop and restaurant as Bob, then there is a good chance that they will become friends.

Besides, properties of locations that a user has visited such as location category (e.g., a restaurant or a bar) and visiting hours (10pm or 8am) can reflect his behavior as well. Users visiting same kinds of locations at roughly the same time are probably sharing the same hobbies or life patterns. Following the above mentioned social homophily theory, these users are more likely to become friends than others. For example, Bob and Charlie may often visit different model shops on weekends, then it is meaningful to recommend them to be friends.

Taking the user side, a user’s demographic information, such as age and gender, also contributes to the friends he has. For example, suppose that Alice, Bob and Charlie often visit one same place, Alice and Bob have the similar age while Charlie are 20 years older than them, then it is more reasonable to recommend Bob not Charlie as Alice’s friend.

Following the above discussion, our intuition can be summarized as follows:

  • a user is more likely to become friends with those who visit same places as him;

  • a user is more likely to become friends with those who visit similar types of places at the similar time as him;

  • a user is more likely to become friends with those who share the same profile information as him.

Data model

To capture the above intuition, we first model our information into three types of networks including user-location network (\({G_{{U},{L}}}\)), user-profile network (\({G_{{U},{P}}}\)) and location-category network (\({G_{{L},{C}}}\)).

User-location network. A user-location network is a directed weighted bipartite graph denoted by \({G_{{U}, {L}}} = ({U}, {L}, {E_{{U}, {L}}})\). \({U}\) and \({L}\) represent users and locations, respectively, while \({E_{{U}, {L}}} \subseteq {U}\times {L}\) contains all the edges. Each edge \({e_{i, j}^{{U}, {L}}}\) is affiliated with a weight \({w_{i, j}^{{U}, {L}}}\) which is simply defined as the number of times \({u}_i\) visits \({\ell}_j\).

We exploit matrix \({X}\) to represent \({G_{{U}, {L}}}\), \({X}_{i,j}={w_{i, j}^{{U}, {L}}}\) if there exists a link from \({u}_i\) to \({\ell}_j\), otherwise, \({X}_{i, j} = 0\). The transpose of \({X}\) is defined by \({\cal{X}}\). We further use \({\bar{X}}\) and \({\bar{\cal{X}}}\) to denote the column stochastic matrices of \({X}\) and \({\cal{X}}\) where \({\bar{X}}(i, j) = \frac{{w_{i, j}^{{U}, {L}}}}{\sum_{k} {w_{k, j}^{{U}, {L}}}}\) and \({\bar{\cal{X}}}(i, j) = \frac{{w_{j, i}^{{U}, {L}}}}{\sum_{k} {w_{j, k}^{{U}, {L}}}}\).

User-profile network. A user-profile network is a bipartite graph denoted by \({G_{{U}, {P}}} = ({U}, {P}, {E_{{U}, {P}}})\). \({P}\) represents profile nodes and \({E_{{U}, {P}}}\subseteq {U}\times {P}\) contains all the edges in \({G_{{U}, {P}}}\).

Matrix \({Y}\) represents \({G_{{U}, {L}}}\) where \({Y}_{i, j} = 1\) if \({u}_i\) has profile \({p}_j\). We use \({\cal{Y}}\) to represent \({Y}\)’s transpose, \({\bar{Y}}\) and \({\bar{\cal{Y}}}\) are the column stochastic matrices of \({Y}\) and \({\cal{Y}}\), respectively.

Location-category network. A location-category network is a bipartite graph denoted by \({G_{{L}, {C}}} = ({L}, {C}, {E_{{L}, {C}}})\). \({C}\) contains all the location types and \({E_{{L}, {C}}}\subseteq {L}\times {C}\). As presented in our second intuition, a category here is in fact a combination of category and time, e.g., food places during evening. We will talk about it in detail in our experiment.

Matrix \({Z}\) represents \({G_{{L}, {C}}}\) where \({Z}_{i, j}\) indicates that \({\ell}_i\) falls into category \({c}_j\). Similar to the definitions for \({G_{{U}, {L}}}\) and \({G_{{U}, {P}}}\), we use \({\cal{Z}}\) to denote \({Z}\)’s transpose and \({\bar{Z}}\) (\({\bar{\cal{Z}}}\)) to denote the stochastic matrix of \({Z}\) (\({\cal{Z}}\)).

Random walk with restart

Random walk with restarts (RWR) proposed in  can be considered as a variation of Pagerank where a seed node is chosen and random walks on the graph are biased towards restarting from the seed node. This is different from Pagerank which allows random walks to restart from any node in the network uniformly. The steady state distribution of RWR can be interpreted as the probability from the seed node to reach other nodes. RWR has been demonstrated effective in multiple recommendation systems (e.g., see ), the hypothesis behind is that nodes who are more likely to be reached from the seed node (captured by the output probability) are more similar to the seed node which should be recommended to the seed node. RWR can be applied on both homogeneous (e.g., social network) and heterogeneous networks (e.g., author-publication bipartite graph). Recall our intuition introduced earlier in this section, multiple entities are considered for friends recommendation including user, location, user profile and location category. RWR on heterogeneous network naturally fits this goal, thus we adopt it to recommend friends for users.

We first combine the three above mentioned bipartite graphs together as a heterogeneous graph \({G_{}}\) and use a matrix \(M\) to represent it as the following. \[M = \begin{bmatrix} 0 & {\alpha_{1}}\cdot{\bar{X}}& (1-{\alpha_{1}})\cdot{\bar{Y}}& 0 \\ {\alpha_{2}}\cdot{\bar{\cal{X}}}& 0 & 0 & (1-{\alpha_{2}})\cdot{\bar{Z}}\\ {\bar{\cal{Y}}}& 0 & 0 & 0\\ 0 & {\bar{\cal{Z}}}& 0 & 0\\ \end{bmatrix}\]

For a random walk in \({G_{}}\), each user (location) has two types of choice to travel next, i.e., either location or profile (user or category), we introduce a parameter \({\alpha_{1}}\) (\({\alpha_{2}}\)) to adjust the user’s (location’s) choice. Since we focus on friendship recommendation based on mobility information, we set \({\alpha_{1}}=0.7\) to reflect the importance of mobility information compared to profile information. For \({\alpha_{2}}\), we set it to 0.5. In our experiment presented later, we further study the sensitivity of \({\alpha_{1}}\) and \({\alpha_{2}}\).

For a user \({u}_i\), we define two \((\vert {U}\vert + \vert {L}\vert + \vert {P}\vert + \vert {C}\vert)\times\)1 vectors \({v_{i}}\) and \({\textbf{1}_{i}}\), \({v_{i}}\) represents the steady state distribution of RWR while \({\textbf{1}_{i}}\)’s \(i\)-th element is set to be 1 and others to be 0. The function of \({\textbf{1}_{i}}\) is to regulate a walk to restart at \({u}_i\) with a certain probability, in this way, the steady state probability we obtained in the end can be interpreted as the probability from \({u}_i\) to reach each node in \(G\). In the end, users with highest probabilities are recommended to \({u}_i\) as friends.

We exploit the power iteration method to find the steady state distribution \({v_{i}}\), the computation is formalized by the following equation. \[{v_{i}} = (1-\epsilon)\cdot M\cdot {v_{i}} + \epsilon\cdot{\textbf{1}_{i}}\] The factor \(\epsilon\) specifies the probability of restarting random walks at \({u}_i\), and it is set to be 0.15 following the settings of . The first \(\vert U\vert\) elements in \({v_{i}}\) represents \({u}_i\)’s probabilities to reach each user in \({U}\). After obtaining the final result, we recommend users with highest probabilities as \({u}_i\)’s friends.