1. Describing and analysing animal behaviour over long periods of time is one of the most important challenges in ecology. However, most such studies are limited by the time and cost that human observers require. Collecting data via video recordings allows observation periods to be extended, but evaluating the recordings manually is very time-consuming. Automated evaluation using suitable deep learning methods appears to be a forward-looking approach for analysing even large amounts of video data within an adequate time frame.
2. In this study we present a multi-step convolutional neural network system that detects animal behaviour states with high accuracy. An important aspect of our approach is the introduction of model averaging and post-processing rules, which make the system robust to outliers.
3. Our trained system achieves an in-domain classification accuracy of >0.92, which a post-processing step improves to >0.96. In addition, the whole system also performs well in an out-of-domain classification task with two unknown types, achieving an average accuracy of 0.93. We provide our system at https://github.com/Klimroth/Video-Action-Classifier-for-African-Ungulates-in-Zoos/tree/main/mrcnn_based so that interested users can train their own models to classify images and conduct behavioural studies of wildlife.
4. The use of a multi-step convolutional neural network for fast and accurate classification of wildlife behaviour facilitates the evaluation of large amounts of image data in ecological studies and greatly reduces the effort of manual image analysis. Our system also shows that post-processing rules are a suitable way to make species-specific adjustments and to substantially increase the accuracy with which single behavioural phases (number, duration) are described. The results of the out-of-domain classification strongly suggest that our system is robust and achieves a high degree of accuracy even for new species, so that other settings (e.g. field studies) can be considered.
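As a rough illustration of how model averaging and a post-processing rule can act together, the sketch below averages per-frame class probabilities over several models and then absorbs implausibly short behaviour phases into the preceding phase. It is not taken from the published system: the function names, the input layout and the minimum-length rule are assumptions chosen only to make the idea concrete.

```python
from itertools import groupby


def average_predictions(per_model_probs):
    """Average class probabilities over models and pick one label per frame.

    per_model_probs[m][t][c] is the probability (hypothetical layout) that
    model m assigns to class c at frame t. Returns one class index per frame.
    """
    n_models = len(per_model_probs)
    labels = []
    # zip(*...) regroups the data frame by frame instead of model by model.
    for frame_probs in zip(*per_model_probs):
        mean = [sum(p) / n_models for p in zip(*frame_probs)]
        labels.append(max(range(len(mean)), key=mean.__getitem__))
    return labels


def remove_short_phases(labels, min_len):
    """Assumed post-processing rule: a phase shorter than min_len frames is
    relabelled with the label of the phase before it. A short phase at the
    very start is kept, because there is no preceding phase to merge into.
    """
    smoothed = []
    for label, run in groupby(labels):
        length = len(list(run))
        if smoothed and length < min_len:
            label = smoothed[-1]
        smoothed.extend([label] * length)
    return smoothed


# Two hypothetical models, two frames, two behaviour classes.
model_a = [[0.6, 0.4], [0.2, 0.8]]
model_b = [[0.3, 0.7], [0.1, 0.9]]
averaged = average_predictions([model_a, model_b])

# A single-frame outlier (class 1) inside a longer phase of class 0.
smoothed = remove_short_phases([0, 0, 0, 1, 0, 0, 0, 2, 2, 2], min_len=2)
```

In this sketch the averaged label for the first frame follows the ensemble mean rather than model A alone, and the single-frame outlier disappears, which also changes the counted number and duration of phases. A species-specific adjustment would amount to choosing a different `min_len` or a different rule per behaviour.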