Self-Interested MAPF

MAPF algorithms have largely focused on a cooperative setting in which solutions aim at minizing the total cost for all agents. In such settings, agents are willing to incur a higher cost (i.e., take a longer path) to decrease the total cost of all agents. However, there are many settings in which agent may be self-interested, seeking to minimize their own costs without considering other agents’ welfare. For example, if rovers of different research organizations are operating on mars, they might want to traverse a path that is optimal for them and not consider other rovers. Self-interested behavior by agents can lead to suboptimal outcomes and reduced social welfare \cite{bnaya2013multi}.

An important property for mechanisms that allocate resources to self-interested agents is strategyproofness. When using a strategyproof mechanism, agents cannot manipulate the outcome to benefit themselves by not revealing their true preferences. To illustrate, we consider the graph shown in Figure \ref{fig:strategy}. Agent \(a_1\) is located at \(s_1\) and its goal is located at \(g_1\). Similarly, \(a_2\) is located t \(s_2\) and its goal is located at \(g_2\). Each edge is labeled with its cost.

Assume we have a centralized algorithm (e.g., \(A^*\)) that asks agents for their costs and then computes an allocation of paths that minimizes the total cost for the agents. If the agents report their paths truthfully, the algorithm will find the optimal allocation, according to which \(a_1\) will go to its goal through \(X\) (cost of 2.5) and \(a_2\) will go through \(Y\) (cost 2). This will result in a total cost of 4.5. Note, however, that \(a_1\) would have preferred the path that goes through \(Y\) as it only costs 2. If \(a_1\) is self-interested, it might try to manipulate the outcome by reporting its costs untruthfully. Specifically, it could report that the cost of the edge from \(s_1\) to \(X\) costs 4 instead of 1.5. Then, the centralized algorithm will allocate \(a_1\) the path that goes through \(X\) and will allocate \(a_2\) the path that goes through \(Z\), thinking that these paths, with a total cost of 4.7 are the optimal allocation. Thus, this mechanism is not strategyproof.

In contrast with the MAPF literature, the auction literature have typically considered self-interested settings and thus developed mechanisms that are robust to various types of manipulations. Mapping MAPF problems to CA thus suggests an approach for solving self-interested MAPF problems. Specifically, the auction mechanism elicits agents’ preferences over paths through their bids and determines an allocation of paths. The mechanism then provides each agent with a permit to the path that was sold to it. We further assume an enforcement mechanism (e.g., fines) that prevents agents from traversing paths that were not allocated to them.

An example for an auction mechanism that is robust to the type of manipulation exhibited above is the Vickrey–Clarke–Groves (VCG) auction \cite{clarke1971multipart,vickrey1961counterspeculation,groves1973incentives}. In this mechanism, agents submit bids on multiple items based on their valuation for the items. However, they are then asked to pay based on the “harm” they caused to other bidders. ***add formal notations and definition***

For example, assume that both \(a_1\) and \(a_2\) get a value of 10 for acheiving their goal and incur a cost for traversing a path based on the edges’ costs. Then, the value for \(a_1\) from getting the path \(s_{1} \rightarrow \)X\( \rightarrow g_1\) is \(10-2.5=7.5\), while its value for the path \(s_{1} \rightarrow \)Y\( \rightarrow g_1\) is \(10-2=8\). Similarly the values for \(a_2\) are 8 for the path through \(X\) and 7.3 for the path through \(Z\). If the agents bid truthfully, the auctioneer will allocate \(a_1\) the path \(s_{1} \rightarrow \)Y\( \rightarrow g_1\) and will allocate \(a_2\) the path \(s_{2} \rightarrow \)X\( \rightarrow g_2\), as this allocation maximizes the total value for the agents (\(8+7.5=15.5\)). The payment for the agents is calculated according to the harm they caused to the other agents. In this case, \(a_1\) will pay \(0\), as it did not cause any harm to \(a_2\) (\(a_2\) got its highest valued path); \(a_2\) will pay \(0.5\) as it prevented \(a_1\) from getting a path that was valued higher by 0.5 from its allocated path (valuation of 7.5 compared to a valuation of 8). The utilities for the agents \(a_1\) and \(a_2\) are thus \(u_1=10-0=10\) and \(u_2=10-0.5=9.5\) correspondingly.

Importantly, it has been proven that when using this mechanism agents do best by revealing their true preferences and values and cannot manipulate the outcome in a useful way by not being truthful***cite Vickrey***. To illustrate, assume that \(a_1\) would have bid only 5 on the path \(s_{1} \rightarrow \)X\( \rightarrow g_1\), and 8 for the path \(s_{1} \rightarrow \)X\( \rightarrow g_1\), and that \(a_2\) bids truthfully as before. In this scenario the auctioneer will allocate the path \(s_{1} \rightarrow \)Y\( \rightarrow g_1\) to \(a_1\) and \(s_{2} \rightarrow \)Z\( \rightarrow g_2\) to \(a_2\). So \(a_1\) was indeed able to manipulate the outcome. However, such a manipulation does not benefit \(a_1\) due to the Vickrey payment scheme: \(a_1\) will now need to pay for the harm it caused \(a_2\), which is \(8-7.3=0.7\). So its final utility will be \(10-0.7=9.3\). Importantly this utility is lower than its utility when reporting truthfully, which was \(10\).

To the best of our knowledge, the only mechanism proposed for non-cooperative pathfinding is the Iterative Taxation Framework (ITF) by Bnaya et al. \cite{bnaya2013multi}. This mechanism assigns “taxes” to pairs of location and time, driving agents to follow paths with higher-social welfare. In contrast, the auction mechanism sells paths to agents. This mechanism has several drawbacks compared to optimal, strategyproof auction mechanisms: (1) it does not necessarily maximize the social welfare; (2) it requires that agents have equal costs over edges and that the mechanism knows these costs; (3) even if the costs on edges are equal, the mechanism is not strategyproof: ITF requires that agents report their start and goal locations and agents could potentially manipulate the outcome by reporting these locations untruthfully in a way that would result in lower taxes on their paths.