Even though the MAP values appear more realistic, the overall performance is rather poor. One possible cause is the augmentation, since only simple methods such as brightness changes and blurring were used. It is therefore likely that the augmented samples contribute little to the overall performance. To verify this suspicion, a future project could compare the augmented model against a non-augmented one. Furthermore, the result for the 50% subsample is inexplicably high, which suggests some kind of configuration error.
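To make concrete what simple augmentations of this kind do to an image, the following is a minimal NumPy sketch of a brightness shift and a mean-filter blur. It is not the pipeline used in this project: the function names `adjust_brightness` and `box_blur`, the offset `delta`, and the kernel size `k` are assumptions chosen for illustration, and images are assumed to be `uint8` arrays of shape height x width x channels.

```python
import numpy as np


def adjust_brightness(image, delta):
    """Shift all pixel intensities by `delta` and clip to the uint8 range.

    `delta` is a hypothetical parameter; the values used in the project
    are not known.
    """
    shifted = image.astype(np.int16) + int(delta)
    return np.clip(shifted, 0, 255).astype(np.uint8)


def box_blur(image, k=3):
    """Blur an H x W x C uint8 image with a k x k mean filter (k must be odd)."""
    pad = k // 2
    # Replicate border pixels so the output keeps the input size.
    padded = np.pad(
        image.astype(np.float32),
        ((pad, pad), (pad, pad), (0, 0)),
        mode="edge",
    )
    h, w = image.shape[:2]
    out = np.zeros(image.shape, dtype=np.float32)
    # Sum the k*k shifted copies of the image, then divide by the window size.
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return np.round(out / (k * k)).astype(np.uint8)
```

Both operations leave object positions unchanged, so the bounding-box annotations can be reused as they are. That also means the augmented samples show the same scenes with only minor photometric changes, which is consistent with the suspicion that they add little new information.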
Since the impact of the augmented data cannot be assessed and the overall performance was rather poor, another configuration was considered. In this setup, the pre-trained classifier was exchanged rather than the dataset. Instead of ResNet pre-trained on COCO, ResNet-50 pre-trained on ImageNet was used as the initial checkpoint. The model was re-trained on the KITTI dataset for 200,000 steps on the two classes 'car' and 'pedestrian', using the random horizontal flip option of the object detection API for data augmentation. The models were tested on 1495 samples withheld from the original training data.
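Unlike brightness or blur changes, a horizontal flip also has to transform the bounding boxes. The following is a minimal NumPy sketch of that idea, not the object detection API's own implementation. It assumes boxes are given as normalized `[ymin, xmin, ymax, xmax]` rows with values in [0, 1]; the function names, the `rng` argument, and the flip probability of 0.5 are assumptions made for illustration.

```python
import numpy as np


def horizontal_flip(image, boxes):
    """Mirror an H x W x C image left-to-right and adjust its boxes.

    `boxes` is assumed to be an N x 4 array of normalized
    [ymin, xmin, ymax, xmax] coordinates.
    """
    flipped_image = image[:, ::-1, :]
    ymin, xmin, ymax, xmax = boxes.T
    # The old right edge becomes the new left edge and vice versa.
    flipped_boxes = np.stack([ymin, 1.0 - xmax, ymax, 1.0 - xmin], axis=1)
    return flipped_image, flipped_boxes


def random_horizontal_flip(image, boxes, rng, probability=0.5):
    """Apply the flip with the given probability (0.5 is an assumed default)."""
    if rng.random() < probability:
        return horizontal_flip(image, boxes)
    return image, boxes
```

A call such as `random_horizontal_flip(image, boxes, np.random.default_rng(0))` would return either the original pair or the mirrored pair. The vertical coordinates stay untouched, so class labels such as 'car' and 'pedestrian' remain valid for the flipped boxes.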