Explainable Attention Pruning: A Meta-learning-based Approach
Abstract
Pruning, as a technique for reducing the complexity and size of Transformer-based models, has gained significant attention in recent years. While various models have been successfully pruned, pruning BERT poses unique challenges due to its fine-grained structure and overparameterization. However, by carefully considering these factors, it is possible to prune BERT without significantly degrading its pre-trained loss. In this paper, we propose a meta-learning-based pruning approach that adaptively identifies and eliminates insignificant attention weights. The performance of the proposed model is compared with several baseline models, as well as with the default fine-tuned BERT model. The baseline pruning strategies employed low-level pruning techniques that target the removal of only 20% of the connections. The experimental results show that the proposed model outperforms the baseline models in terms of lower inference latency, higher MCC (Matthews correlation coefficient), and lower loss. However, no significant improvement is observed in average FLOPs (floating-point operations). Furthermore, we conduct a comparative evaluation of the baseline models and our proposed model using two explainable AI (XAI) approaches. While the other models allocate considerable attention to less significant words for sentiment classification, our model assigns higher probabilities to the most significant sentiment-bearing words.

Impact Statement
Efficient handling of inference time in pre-trained language models (PLMs) and the preservation of performance while reducing their size are important research considerations. Model compression techniques, such as pruning, are recognized as effective approaches for achieving memory-efficient, energy-efficient, computation-efficient, and storage-efficient PLMs. Pruning addresses the need to create compact models without compromising their overall effectiveness.
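The baseline strategies are described as low-level pruning that removes 20% of the connections, but the exact criterion is not stated. As a minimal sketch, assuming global unstructured magnitude pruning of a single weight matrix (an assumption, not necessarily the baselines used in the paper), the hypothetical function `magnitude_prune` below zeroes the 20% of entries with the smallest absolute value and returns the keep-mask.

```python
import numpy as np


def magnitude_prune(weights: np.ndarray, sparsity: float = 0.2) -> tuple[np.ndarray, np.ndarray]:
    """Zero out the `sparsity` fraction of entries with the smallest |w|.

    Returns (pruned_weights, keep_mask). Illustrative magnitude pruning only.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    flat = np.abs(weights).ravel()
    k = int(round(sparsity * flat.size))  # number of connections to remove
    mask = np.ones(flat.size, dtype=bool)
    if k > 0:
        # Indices of the k smallest-magnitude entries (ties keep argsort order).
        prune_idx = np.argsort(flat, kind="stable")[:k]
        mask[prune_idx] = False
    mask = mask.reshape(weights.shape)
    return weights * mask, mask


# Usage on a hypothetical 768 x 768 projection matrix (BERT-base hidden size).
rng = np.random.default_rng(0)
w = rng.normal(size=(768, 768))
pruned, mask = magnitude_prune(w, sparsity=0.2)
print(f"fraction removed: {1 - mask.mean():.3f}")
```

Zeroing entries inside a dense matrix does not by itself reduce the floating-point operations of a dense matrix multiply; a sparse kernel or structural removal of rows, columns, or heads is needed for that.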
Existing pruning methods often rely on task- and domain-specific approaches; it is therefore important to explore domain-independent pruning. We propose a new pruning strategy called Meta-Controller-based Attention Pruning (MCAP) for the BERT model, targeting single-sentence prediction tasks. The MCAP optimization strategy eliminates insignificant attention weights in BERT by calculating their importance scores. The self-supervised pruner in MCAP uses a meta-learning approach to identify and eliminate these insignificant attention weights before fine-tuning. Our study compares MCAP with baseline models (using both structured and unstructured pruning) in terms of inference latency, MCC, and loss. The results show that MCAP outperforms the baseline models on all three measures. Explainable AI (XAI) techniques are used to interpret the model's decisions and predictions. MCAP focuses on significant words in sentiment classification, ensuring that important model parameters are retained without a significant impact on the output.
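MCAP's importance scores come from a self-supervised, meta-learned pruner whose details are not given here, so the sketch below does not reproduce it. Assuming importance scores are already available per attention head (a head-level granularity and the scores themselves are assumptions), it shows how such scores could be turned into a pruning mask and applied inside self-attention. The functions `head_mask_from_scores` and `masked_multihead_attention`, the "keep at least one head per layer" rule, and the example scores are all hypothetical. Multiplying the attention probabilities by a per-head mask is similar in spirit to how Hugging Face's BERT implementation applies its `head_mask` argument.

```python
import numpy as np


def head_mask_from_scores(scores: np.ndarray, prune_ratio: float) -> np.ndarray:
    """Return a boolean (num_layers, num_heads) mask that drops the globally
    lowest-scoring heads, never removing the last remaining head of a layer."""
    num_heads = scores.shape[1]
    n_prune = int(prune_ratio * scores.size)  # heads to remove in total
    mask = np.ones(scores.shape, dtype=bool)
    # Visit heads from least to most important (flattened, row-major order).
    for flat_idx in np.argsort(scores, axis=None, kind="stable"):
        if n_prune == 0:
            break
        layer, head = divmod(int(flat_idx), num_heads)
        if mask[layer].sum() > 1:  # keep at least one head per layer
            mask[layer, head] = False
            n_prune -= 1
    return mask


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def masked_multihead_attention(x, wq, wk, wv, head_mask):
    """One self-attention layer in which masked (pruned) heads output zeros.

    x: (seq_len, d_model); wq, wk, wv: (d_model, d_model); head_mask: (num_heads,) bool
    """
    seq_len, d_model = x.shape
    num_heads = head_mask.shape[0]
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads

    def split(t):  # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ wq), split(x @ wk), split(x @ wv)
    probs = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d_head))  # (heads, seq, seq)
    probs = probs * head_mask[:, None, None]  # zero the attention of pruned heads
    context = probs @ v  # (heads, seq, d_head)
    return context.transpose(1, 0, 2).reshape(seq_len, d_model)


# Hypothetical importance scores for 2 layers x 3 heads (not from the paper).
scores = np.array([[0.9, 0.1, 0.5], [0.05, 0.02, 0.8]])
mask = head_mask_from_scores(scores, prune_ratio=0.5)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 6))  # 4 tokens, d_model = 6, so d_head = 2
wq, wk, wv = (rng.normal(size=(6, 6)) for _ in range(3))
out_layer0 = masked_multihead_attention(x, wq, wk, wv, mask[0])
```

Masking only zeroes a head's contribution; to actually save computation, the masked heads would have to be removed from the weight matrices, for example with the `prune_heads` method that Hugging Face models provide.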