Proposal cs221 project: Sheepherding

Sheephearding is one of the oldest trades, where a herder (with help of one or multiple dogs) tries to bring sheep together at some target location. In this project, we define a simplified model of sheepherding where dogs try to move sheep to a target. We then use simulations and apply reinforcement learning to train an AI for the dogs that is capable of performing several tasks related to sheepherding. In particular, we are interested in cooperative behavior: can dogs cooperate efficiently without communication, or does a central authority that commands to the dogs much better at herding?

First, we give a brief overview of some related literature. In (Vaughan 1998), a method is proposed to gather ducklings using a robot in a simple circular area, but their approach is not based on any learning. (Cowling 2010) does discuss methods for herging sheep from an AI perspective, where they use a hierarchical and stack-based finite state machine. (Schultz 1996) proposes to use genetic algorithms to fing decision rules to model robots behavior. In a case study, one robot tries to guide another robot to a target location. An interesting perspective is given in (Potter 2001), where so-called specialists are discussed. Sometimes it is more useful for agents to specify in specific tasks (such as in playing soccer), while for other tasks this is not useful. Finally, (Kok 2005) discusses an approach where agents start by acting individually, but learn how and when to cooperate if needed. This, they argue, leads to a small state space because only necessaray interaction is taken into account.

Underlying all simulations is a physical model that describes the world, and the movement of sheep in particular.

The world is modeled as a box, \([0, x_{m}]\times[0,y_{m}]\). We then represent the sheep, dogs and target(s) as points in this space. The majority of the literature uses a grid to model the world, but we choose not to discretize the space, since that allows for more prolific movement, a more realistic environment, and a more interesting modeling perspective. Sometimes, people add other agents or obstacles to the world, but we propose to leave those out to start with.

Here, we get our inspiration from Boids (Reynolds 1987), and define the following axioms for movement of sheep:

- Axiom 1
Sheep move away from dogs when sufficiently close.

- Axion 2
When two sheep are very close to each other, they want more space and move away.

- Axiom 3
When two sheep are neither too close, nor too far, they flock together.

We translate these axioms into forces that are acting on the sheep. These forces then drive the sheep in a certain direction and define the speed at which the sheep moves. We note that others use similar approaches.

All forces are calculated using the distance between the sheep and the other object, \(z\), denoted by \(d_z\), and are of the following form: \[F_{a,b}(d_z) = b\left(1-\sqrt{\frac{d_z}{a}}\right)_+\] and where the direction of the force is given by the angle of \(z\) relative to the sheep. Note that the force function is paramaterized by \(a\) and \(b\), which have a simple interpetation: \(a\) is the maximum distance of the force, and \(b\) is the magnitude of the force when the distance is \(0\).^{1}

Most of the physical simulation, including an animation, is currently implemented, code can be found on github.com/schmit/sheepherding.↩

Given the setup as outlined above, there are several interesting problems one can investigate. In our case, we are interested in comparing two approaches to learning to herd:

Dogs act independently, while they can observe each other, they cannot explicitly coordinate their movements.

There is a shepherd AI that coordinates the movements of the dogs. This is a two-tiered approach, where dogs are trained to perform specific smaller tasks (move to a target location, circle the flock clockwise, etc.) and a central AI that gives the instructions to the dogs.

To investigate these two approaches, we propose to start with some simple tasks, and expand upon them. Some examples of simple tasks include assigning a target for the dog to move too, and one dog having to move one sheep to a target location. More complicated tasks can be based upon these basic tasks.

Although the above setup leads to a fully deterministic simulation, we want to attack the problems using reinforcement learning algorithms. The state space, as is, is infinite, and the physics are complicated. Especially with multiple sheep, modeling the above as an MDP is problematic, as the new state is difficult to compute given an action.

Instead, we prefer to featurize the state space and work with these approximations. Note the difference between this approach and methods where features are used to generalize effects of transitions. In some way, all we have is the featurized state.

Rather than comparing a proposed method to a baseline and an oracle, we propose to investigate two models as explained above, and compare their performance. This has two reasons. First, we think comparing these two methods is interesting: are agents able to cooperate without communication simply because rewards are aligned, or does a central ‘commander’ improve efficiently dramatically? Second, it is non-trivial to come up with baseline and oracle methods for these tasks.

There are two specific metrics we can use to determine performance

Time till task completion,

Fraction of success in \(T\) seconds.

Besides using these quantitative measures, the animations produced can give us a very clear picture of what is going on and whether the actions of dogs make sense. Using both a quantitative and a qualitative approach to evaluation, we think it is possible to accurately judge the success of the learning algorithms.

Peter I Cowling, Christian Gmeinwieser. AI for Herding Sheep.. In

*AIIDE*. (2010).Jelle R. Kok, Pieter Jan, Hoen Bram, Bakker Nikos Vlassis. Utile coordination: Learning interdependencies among cooperative agents. 29–36 In

*In Proceedings of the IEEE Symposium on Computational Intelligence and Games (CIG’05*. (2005).Mitchell A Potter, Lisa A Meeden, Alan C Schultz. Heterogeneity in the coevolved behaviors of mobile robots: The emergence of specialists.

**17**, 1337–1343 In*International joint conference on artificial intelligence*. (2001).Craig W. Reynolds. Flocks, Herds and Schools: A Distributed Behavioral Model.

*SIGGRAPH Comput. Graph.***21**, 25–34 ACM, 1987. LinkAlan Schultz, John J. Grefenstette, William Adams. Robo-Shepherd: Learning Complex Robotic Behaviors. 763–768 In

*In Robotics and Manufacturing: Recent Trends in Research and Applications, Volume 6*. ASME Press, 1996.Richard Vaughan, Neil Sumpter, Andy Frost, Stephen Cameron. Robot Sheepdog Project achieves automatic flock control. 489–493 In

*In From Animals to Animats 5: Proceedings of the Fifth International Conference on the Simulation of Adaptive Behaviour*. MIT Press, 1998.