
Enhancing use of BERT information in neural machine translation
  • Xi Chen, Kunming University of Science and Technology
  • Yuanhao Zhang, Beijing Tiandi Hexing Technology Co Ltd

Corresponding author: [email protected]


Although BERT has achieved excellent results on various natural language processing tasks, it does not perform as well on cross-lingual tasks, especially machine translation. We propose a BERT-enhanced neural machine translation (BE-NMT) model to improve how NMT exploits the information contained in BERT. The model comprises three components: (1) a MASKING strategy is applied to alleviate the knowledge forgetting caused by fine-tuning BERT on the NMT task; (2) serial and parallel processing are combined in the attention modules that incorporate BERT into the NMT model; and (3) multiple hidden-layer outputs of BERT are fused to supplement the linguistic information missing from its final hidden-layer output alone. Experiments demonstrate that our method achieves consistent improvements over the baseline model on various NMT tasks.
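Component (3), fusing multiple BERT hidden-layer outputs, can be sketched as a softmax-weighted sum of the per-layer representations. The sketch below is a minimal illustration with toy shapes and NumPy in place of a real BERT encoder; the function name, the scalar-weight parameterization, and the layer count are assumptions for illustration, not the paper's exact formulation:

```python
import numpy as np

def fuse_hidden_layers(hidden_states, layer_logits):
    """Fuse per-layer encoder outputs into one representation.

    hidden_states: list of L arrays, each of shape (seq_len, d_model),
                   standing in for BERT's L hidden-layer outputs.
    layer_logits:  (L,) unnormalized fusion weights; in a real model
                   these would be learned, here they are fixed (assumption).
    """
    # Softmax over layers so the fusion weights sum to 1.
    w = np.exp(layer_logits - layer_logits.max())
    w = w / w.sum()
    stacked = np.stack(hidden_states)        # (L, seq_len, d_model)
    # Contract the layer axis: weighted sum over the L layers.
    return np.tensordot(w, stacked, axes=1)  # (seq_len, d_model)

# Toy example: 4 "layers", sequence length 3, model dimension 2.
rng = np.random.default_rng(0)
layers = [rng.standard_normal((3, 2)) for _ in range(4)]
# Zero logits give uniform weights, i.e. a plain layer average.
fused = fuse_hidden_layers(layers, np.zeros(4))
print(fused.shape)  # (3, 2)
```

With zero logits the fusion reduces to a simple mean over layers; learned logits would instead let the model emphasize the layers carrying the linguistic information most useful for translation.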