Although BERT has achieved excellent results on various natural language
processing tasks, it does not exhibit the same high performance on
cross-lingual tasks, especially machine translation. We propose a
BERT-enhanced neural machine translation (BE-NMT) model that makes
better use of the information contained in BERT for NMT. The model
consists of three aspects: (1) a MASKING strategy is applied to
alleviate the knowledge forgetting caused by fine-tuning BERT on the
NMT task; (2) serial and parallel processing are combined in the
multi-attention modules when incorporating BERT into the NMT model;
and (3) the outputs of multiple hidden layers of BERT are fused to
supplement the linguistic information missing from its final hidden
layer output.
Experiments demonstrate that our method yields improvements over the
baseline model on various NMT tasks.
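The third aspect, fusing multiple hidden layer outputs of BERT, can be illustrated as a weighted sum over per-layer hidden states, where the weights are normalized by a softmax and would be learned jointly with the NMT model. This is a minimal NumPy sketch under assumed shapes (BERT-base: 12 layers, hidden size 768); the paper's exact fusion scheme may differ:

```python
import numpy as np

def fuse_layers(hidden_states, layer_weights):
    """Fuse per-layer hidden states with softmax-normalized scalar weights.

    hidden_states: list of L arrays, each [batch, seq_len, dim]
    layer_weights: array of L raw (unnormalized) scalar weights
    returns: fused representation of shape [batch, seq_len, dim]
    """
    # softmax over the L layer weights (numerically stable form)
    w = np.exp(layer_weights - layer_weights.max())
    w /= w.sum()
    stacked = np.stack(hidden_states)       # [L, batch, seq_len, dim]
    # contract the layer axis: weighted sum of the L hidden states
    return np.tensordot(w, stacked, axes=1)  # [batch, seq_len, dim]

# stand-in for the 12 layer outputs of BERT-base (random placeholders)
layers = [np.random.randn(2, 5, 768) for _ in range(12)]
weights = np.zeros(12)  # zero init -> uniform softmax -> plain layer average
fused = fuse_layers(layers, weights)
print(fused.shape)  # (2, 5, 768)
```

With zero-initialized weights the fusion starts as a plain average of all layers; during training the softmax weights would shift toward the layers whose linguistic information is most useful to the translation task.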