Explore-Exploit Dilemma and Reinforcement Learning

Sarah AlBastaki, Samuel Feng

Abstract

Every decision-making agent faces a dilemma between exploiting the information it already has and exploring its other options, whether in simple day-to-day tasks or in complex, life-changing decisions. Reinforcement learning algorithms can be used to obtain the maximum possible reward from a set of actions. This area of study concerns an adaptive agent that learns by evaluating the outcomes of its previous actions. A reinforcement learning agent learns to balance exploration and exploitation according to a given algorithm in order to improve its behavior. The performance of an agent following a given set of policies is compared with human behavior by modelling the decisions each makes in a common, experimentally controlled environment. It will be shown that, in lengthy games, a certain degree of exploration is needed when choosing an action, as opposed to always exploiting the supposedly 'greedy' option.
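
The contrast between a purely greedy agent and one that explores occasionally can be illustrated with a small multi-armed bandit. The sketch below is not the environment or the algorithm used in this work. It assumes an epsilon-greedy policy with sample-average value estimates, and the function name run_bandit, its parameters (epsilon, arm_means, steps, noise, seed), and the arm values are all hypothetical choices made for illustration.

```python
import random


def run_bandit(epsilon, arm_means, steps, noise=1.0, seed=0):
    """Play a Gaussian bandit with an epsilon-greedy agent and return the total reward.

    All names and defaults here are illustrative assumptions, not taken from the paper.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    estimates = [0.0] * n_arms  # running estimate of each arm's mean reward
    counts = [0] * n_arms       # number of times each arm has been pulled
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest current estimate
            # (ties resolve to the lowest index).
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = rng.gauss(arm_means[arm], noise)
        counts[arm] += 1
        # Incremental sample-average update of the chosen arm's estimate.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return total


if __name__ == "__main__":
    arms = [0.2, 1.0]  # hypothetical arms: the second one pays more
    greedy = run_bandit(0.0, arms, steps=1000, noise=0.0)
    exploring = run_bandit(0.1, arms, steps=1000, noise=0.0)
    print("greedy total reward:   ", greedy)
    print("exploring total reward:", exploring)
```

In this noise-free setting the greedy agent (epsilon = 0) tries the first arm, sees a positive reward, and never samples the better arm, while the agent with epsilon = 0.1 eventually tries it and then mostly exploits it. The gap grows with the number of steps, which is one way to see why longer games favor some exploration.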