Proposal cs221 project: Sheepherding


Sheephearding is one of the oldest trades, where a herder (with help of one or multiple dogs) tries to bring sheep together at some target location. In this project, we define a simplified model of sheepherding where dogs try to move sheep to a target. We then use simulations and apply reinforcement learning to train an AI for the dogs that is capable of performing several tasks related to sheepherding. In particular, we are interested in cooperative behavior: can dogs cooperate efficiently without communication, or does a central authority that commands to the dogs much better at herding?


First, we give a brief overview of some related literature. In (Vaughan 1998), a method is proposed to gather ducklings using a robot in a simple circular area, but their approach is not based on any learning. (Cowling 2010) does discuss methods for herging sheep from an AI perspective, where they use a hierarchical and stack-based finite state machine. (Schultz 1996) proposes to use genetic algorithms to fing decision rules to model robots behavior. In a case study, one robot tries to guide another robot to a target location. An interesting perspective is given in (Potter 2001), where so-called specialists are discussed. Sometimes it is more useful for agents to specify in specific tasks (such as in playing soccer), while for other tasks this is not useful. Finally, (Kok 2005) discusses an approach where agents start by acting individually, but learn how and when to cooperate if needed. This, they argue, leads to a small state space because only necessaray interaction is taken into account.

Eugene Verboeckhoven, A Shepherdess with her Flock.

A physical model

Underlying all simulations is a physical model that describes the world, and the movement of sheep in particular.

The world

The world is modeled as a box, \([0, x_{m}]\times[0,y_{m}]\). We then represent the sheep, dogs and target(s) as points in this space. The majority of the literature uses a grid to model the world, but we choose not to discretize the space, since that allows for more prolific movement, a more realistic environment, and a more interesting modeling perspective. Sometimes, people add other agents or obstacles to the world, but we propose to leave those out to start with.

Physics of sheep

Here, we get our inspiration from Boids (Reynolds 1987), and define the following axioms for movement of sheep:

Axiom 1

Sheep move away from dogs when sufficiently close.

Axion 2

When two sheep are very close to each other, they want more space and move away.

Axiom 3

When two sheep are neither too close, nor too far, they flock together.

We translate these axioms into forces that are acting on the sheep. These forces then drive the sheep in a certain direction and define the speed at which the sheep moves. We note that others use similar approaches.

All forces are calculated using the distance between the sheep and the other object, \(z\), denoted by \(d_z\), and are of the following form: \[F_{a,b}(d_z) = b\left(1-\sqrt{\frac{d_z}{a}}\right)_+\] and where the direction of the force is given by the angle of \(z\) relative to the sheep. Note that the force function is paramaterized by \(a\) and \(b\), which have a simple interpetation: \(a\) is the maximum distance of the force, and \(b\) is the magnitude of the force when the distance is \(0\).1

  1. Most of the physical simulation, including an animation, is currently implemented, code can be found on


Given the setup as outlined above, there are several interesting problems one can investigate. In our case, we are interested in comparing two approaches to learning to herd:

  1. Dogs act independently, while they can observe each other, they cannot explicitly coordinate their movements.

  2. There is a shepherd AI that coordinates the movements of the dogs. This is a two-tiered approach, where dogs are trained to perform specific smaller tasks (move to a target location, circle the flock clockwise, etc.) and a central AI that gives the instructions to the dogs.

To investigate these two approaches, we propose to start with some simple tasks, and expand upon them. Some examples of simple tasks include assigning a target for the dog to move too, and one dog having to move one sheep to a target location. More complicated tasks can be based upon these basic tasks.