
Zero-Shot Transfer Learning using Affix and Correlated Cross-Lingual Embeddings.
  • Abiodun Modupe,
  • Thapelo Sindane,
  • Vukosi Marivate
Abiodun Modupe
University of Pretoria, Faculty of Engineering, Built Environment and IT
Thapelo Sindane
University of Pretoria, Faculty of Engineering, Built Environment and IT

Corresponding Author: [email protected]

Vukosi Marivate
University of Pretoria, Faculty of Engineering, Built Environment and IT

Abstract

Learning morphologically supplemented embedding spaces using cross-lingual models has become an active area of research and has facilitated breakthroughs in applications such as machine translation, named entity recognition, document classification, and natural language inference. However, these methods have not yet become commonplace for Southern African low-resourced languages. In this paper, we present, evaluate, and benchmark a cohort of cross-lingual embeddings for English and Southern African languages on two classification tasks: News Headline Classification (NHC) and Named Entity Recognition (NER). Our methodology considers four agglutinative languages from the eleven official South African languages: isiXhosa, Sepedi, Sesotho, and Setswana. Canonical correlation analysis (CCA) and VecMap are the two cross-lingual alignment strategies adopted in this study. The monolingual embeddings used in this work are GloVe (source) and fastText (source and target). Our results indicate that, given enough comparable corpora, we can develop well-aligned joint representations between English and the considered Southern African languages. More specifically, the best zero-shot transfer results on the available Setswana NHC dataset were achieved using canonically correlated embeddings with a multi-layer perceptron as the training model (54.5% accuracy). Furthermore, our best NER performance was achieved using canonically correlated cross-lingual embeddings with Conditional Random Fields as the training model (96.4% F1 score). Collectively, this study's results are competitive with the benchmarks of the explored NHC and NER datasets on both zero-shot NHC and NER tasks, with the advantage of using very minimal resources.
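As a rough illustration of the CCA-based alignment and zero-shot setup described above, the minimal sketch below uses synthetic stand-in vectors; the dictionary size, embedding dimensions, and scikit-learn CCA usage are assumptions for demonstration, not the authors' exact pipeline.

    import numpy as np
    from sklearn.cross_decomposition import CCA

    # Assumption: rows of en_vecs and tn_vecs are paired via a bilingual seed
    # dictionary (e.g. English-Setswana word pairs); real vectors would come
    # from pre-trained GloVe/fastText embeddings rather than random noise.
    rng = np.random.default_rng(0)
    n_pairs, dim, shared_dim = 1000, 100, 50
    en_vecs = rng.standard_normal((n_pairs, dim))   # source (English)
    tn_vecs = rng.standard_normal((n_pairs, dim))   # target (Setswana)

    # Fit CCA on the dictionary pairs and project both monolingual spaces
    # into one shared, maximally correlated subspace.
    cca = CCA(n_components=shared_dim, max_iter=500)
    en_shared, tn_shared = cca.fit_transform(en_vecs, tn_vecs)

    # Zero-shot transfer idea: train a classifier (e.g. an MLP for news
    # headline classification, or a CRF tagger for NER) on en_shared
    # features, then apply it unchanged to Setswana inputs mapped through
    # the same fitted CCA, since both languages now share one space.

VecMap, the other alignment strategy named in the abstract, would instead learn a linear (typically orthogonal) mapping between the two monolingual spaces, but the downstream zero-shot transfer step stays the same.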