Exploring the behavioural spectrum with efficiency vs

Zafeiris Kokkinogenis; Margarida Silva; Jeremy Pitt; Rosaldo Rossetti

doi:10.36227/techrxiv.19388147.v2

Abstract

The concept of fairness has been studied in philosophy and economics for thousands of years, so human actors in social systems have had plenty of time to “learn” what does, and does not, work. However, it is a relatively new question how software agents in a multi-agent system can use reinforcement learning models to develop an architecture that promotes equality or equity in the distribution of rewards to the agents within the system. Recent significant contributions have focused on optimising for efficiency, based on the assumption that efficiency and fairness are opposites to be traded off against each other; in fact, the result of mixing fair and efficient policies is unknown in multi-agent reinforcement learning settings. In this work, we experiment with fair and efficient behaviours jointly, based on an extension of SOTO, the state-of-the-art fairness model, that intertwines efficient and equitable recommendations. We analyse the fair versus efficient behavioural spectrum in the Matthew Effect and Traffic Light Control problems, finding some solutions that outperform the baseline SOTO and others that outperform a selfish baseline with comparable architectural design. We conclude that it is possible to optimise for both fairness and efficiency, and that this is important when the computation of the reward distribution has to be paid for from the rewards themselves.