Efficient Distributional Reinforcement Learning with Kullback-Leibler
Divergence Regularization
Abstract
In this article, we address the issues of stability and data efficiency
in reinforcement learning (RL). We propose a novel RL approach,
Kullback-Leibler divergence-regularized distributional RL (KLC51), which
combines the stability of distributional RL with the data efficiency of
Kullback-Leibler (KL) divergence-regularized RL in a single framework.
KLC51 derives the KL-regularized Bellman equation and TD errors from a
distributional perspective and explores approximate strategies for
properly mapping the resulting Boltzmann softmax term onto return
distributions. Evaluated on several benchmark tasks of varying
complexity, the proposed method clearly illustrates the benefits that KL
divergence regularization brings to distributional RL, including
distinctive exploration behavior and smooth value-function updates, and
demonstrates significant advantages in both learning stability and data
efficiency over the related baseline approaches.