
On the Dependability of Bidirectional Encoder Representations from Transformers (BERT) to Soft Errors
  • Zhen Gao,
  • Rui Su,
  • Jingyan Wang,
  • Jie Deng,
  • Qiang Liu,
  • Pedro Reviriego,
  • Shanshan Liu,
  • Fabrizio Lombardi
Abstract

Transformers are widely used in natural language processing and computer vision, and Bidirectional Encoder Representations from Transformers (BERT) is one of the most popular pre-trained transformer models for many applications. This paper studies the dependability of BERT under soft errors when it is implemented with different floating-point formats, using two case studies: sentence emotion classification and question answering. Error-injection simulations are conducted to assess the impact of errors on different parts of the BERT model and on different bits of the parameters. The analysis of the results leads to the following findings: 1) in both single and half precision, there is a Critical Bit (CB) on which errors significantly affect the performance of the model; 2) in single precision, errors on the CB may cause overflow in many cases, which leads to a fixed output regardless of the input; 3) in half precision, errors do not cause overflow but may still introduce a large accuracy loss. In general, the impact of errors is significantly larger on single-precision parameters than on half-precision ones. Error propagation analysis is also performed to further study the effects of errors on different types of parameters and to reveal the mitigation effects of the activation function and the intrinsic redundancy of BERT.
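To illustrate the error model described above, the sketch below flips a single bit in the IEEE-754 encoding of a floating-point parameter, the standard way such soft-error injection is simulated. The flip_bit helper, the NumPy-based setup, and the example weight value are assumptions for illustration, not the paper's actual injection harness. Flipping the most significant exponent bit (the Critical Bit, bit 30 in float32 and bit 14 in float16) drives a small single-precision weight to a huge magnitude from which later computation can overflow, while the same flip in half precision stays within the representable range, consistent with findings 2) and 3).

```python
import numpy as np

def flip_bit(value, bit, dtype=np.float32):
    # Reinterpret the float as an unsigned integer of the same width,
    # XOR the target bit, and reinterpret the result back as a float.
    uint = {np.float32: np.uint32, np.float16: np.uint16}[dtype]
    raw = dtype(value).view(uint)
    return uint(raw ^ (uint(1) << uint(bit))).view(dtype)

w = 0.01  # a typical small network weight (hypothetical value)
# float32: bit 30 is the most significant exponent bit (the Critical Bit).
print(flip_bit(w, 30, np.float32))  # ~3.4e+36; downstream arithmetic can overflow
# float16: bit 14 plays the same role, but the result stays bounded.
print(flip_bit(w, 14, np.float16))  # ~655, large but far below the ~65504 float16 max
```

The asymmetry follows from the exponent widths: flipping the top exponent bit adds 128 to a float32 exponent but only 16 to a float16 exponent, so half-precision parameters cannot be pushed anywhere near overflow by a single bit flip, though the perturbed value can still degrade accuracy.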
27 Jan 2024: Submitted to TechRxiv
29 Jan 2024: Published in TechRxiv