Abstract
The concept of fairness has been studied in philosophy and economics for
thousands of years, so human actors in social systems have had plenty of
time to “learn” what does, and does not, work. By contrast, how software
agents in a multi-agent system can use reinforcement learning models to
develop an architecture that promotes equality or equity in the
distribution of rewards among the agents within the system is a
relatively new question. Recent significant
contributions have focused on optimising for efficiency, on the
assumption that efficiency and fairness are opposites to be traded off
against each other, but the result of mixing fair and efficient policies
in multi-agent reinforcement learning settings remains unknown. In this
work, we experiment with fair and efficient behaviours
jointly, based on an extension of SOTO, the state-of-the-art fairness
model, that intertwines efficient and equitable recommendations. We
analyse the fair versus efficient behavioural spectrum in the Matthew
Effect and Traffic Light Control problems, finding some solutions that
outperform the baseline SOTO and others that outperform a selfish
baseline with comparable architectural design. We conclude that it is
possible to optimise for fairness and efficiency jointly, which is
important when the computation of the reward distribution has to be paid
for from the rewards themselves.