Sven Schmit edited Models.tex  over 9 years ago

Commit id: 35e4a046a88808f27d4c7304f15618bb23126190

Although the above setup leads to a fully deterministic simulation, we want to attack the problems using reinforcement learning algorithms.  The state space, as is, is infinite, and the physics are complicated.  Especially with multiple sheep, modeling the world above as an MDP is problematic, as the new state is difficult to compute given an action.  Instead, we prefer to featurize the state space and work with these approximations.  Note the difference between this approach and methods where features are used to generalize effects of transitions.  In some way, all we have is the featurized state.
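As a rough illustration of what featurizing the state space could look like, the sketch below maps a raw world state (dog, sheep, and target positions in the plane) to a small feature vector. The particular features chosen here (distance to target, mean sheep-to-target distance, flock spread) are illustrative assumptions, not a definitive feature set.

```python
import math

def featurize(dog, sheep, target):
    """Map a raw state to a short feature vector.

    dog, target: (x, y) tuples; sheep: list of (x, y) tuples.
    The features below are hypothetical examples, not a fixed choice.
    """
    feats = []
    # Feature 1: distance from the dog to the target.
    feats.append(math.dist(dog, target))
    # Feature 2: mean distance of the sheep to the target
    # (how close the flock is to being herded home).
    feats.append(sum(math.dist(s, target) for s in sheep) / len(sheep))
    # Feature 3: flock spread, measured as mean distance of the
    # sheep to their own centroid.
    cx = sum(s[0] for s in sheep) / len(sheep)
    cy = sum(s[1] for s in sheep) / len(sheep)
    feats.append(sum(math.dist(s, (cx, cy)) for s in sheep) / len(sheep))
    return feats
```

A learning algorithm would then operate on this fixed-length vector rather than on the full simulation state, which sidesteps the difficulty of computing exact transitions.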