Loss (top) and global norm of the gradients (bottom) during training. The learning rate is lowered every 1000 epochs. The spikes in the two plots are noticeably correlated.
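
A minimal sketch of how the two plotted quantities could be produced, assuming a PyTorch setup; the model, optimizer, batch data, and the decay factor of 0.1 are illustrative placeholders, not taken from the original:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)                      # placeholder model (assumption)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
# Lower the learning rate every 1000 epochs; the factor 0.1 is an assumption.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1000, gamma=0.1)

for epoch in range(5000):
    x = torch.randn(32, 10)                   # dummy batch (assumption)
    y = torch.randn(32, 1)

    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()

    # Global norm of the gradients: the L2 norm over all parameter gradients,
    # i.e. the quantity shown in the bottom plot.
    grad_norm = torch.norm(
        torch.stack([p.grad.detach().norm(2)
                     for p in model.parameters() if p.grad is not None])
    )

    optimizer.step()
    scheduler.step()

    # loss.item() corresponds to the top plot, grad_norm.item() to the bottom.
```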