Convolutional Non-local Spatial-Temporal Learning for Multi-Modality Action Recognition