
Mac-Morpho Revisited: Towards Robust Part-of-Speech Tagging
  • Erick

Corresponding Author: [email protected]


Abstract

We present a revision of Mac-Morpho, the largest corpus of Portuguese text with manually annotated part-of-speech (POS) tags. We corrected many annotation errors, yielding a much more reliable resource. We also trained a neural-network-based classifier for POS tagging, following an architecture that achieves state-of-the-art results for English. Our tagger maps each word to a real-valued vector and uses these vectors as input, so it operates on abstract features. The vectors are induced with distributional semantics techniques and supply information that helps the tagger reach 96.48% accuracy.
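The abstract describes the tagger only at a high level: each word becomes a real-valued vector that is fed to a neural classifier. The sketch below illustrates that idea under stated assumptions, since the abstract does not name the exact architecture. It assumes a window-based feed-forward network: the vectors of the words around a target position are concatenated, passed through one hidden layer with a hard tanh activation, and scored against each tag. The window size, dimensions, `<PAD>`/`<UNK>` tokens, the tag subset, and the names `WindowTagger` and `window_features` are all hypothetical. The random vectors merely stand in for distributionally induced embeddings, and the network is untrained, so its predicted tags are arbitrary.

```python
import random

# Hypothetical special tokens: padding beyond sentence edges, and unknown words.
PAD = "<PAD>"
UNK = "<UNK>"


def build_lookup(vocab, dim, seed=0):
    """Map each word to a real-valued vector.

    Assumption: small random vectors stand in for embeddings that would
    normally be induced with distributional semantics techniques.
    """
    rng = random.Random(seed)
    words = list(vocab) + [PAD, UNK]
    return {w: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for w in words}


def window_features(sentence, i, lookup, window=5):
    """Concatenate the vectors of the words in a window centred on position i."""
    half = window // 2
    feats = []
    for j in range(i - half, i + half + 1):
        word = sentence[j] if 0 <= j < len(sentence) else PAD
        feats.extend(lookup.get(word, lookup[UNK]))  # unknown words share one vector
    return feats


def hardtanh(x):
    """Clip activations to [-1, 1]."""
    return max(-1.0, min(1.0, x))


def init_layer(n_in, n_out, rng):
    """Random weight matrix (n_out rows of n_in) and zero biases."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def affine(weights, biases, x):
    """Compute W x + b for a weight matrix given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


class WindowTagger:
    """Untrained window-based POS tagger (illustrative sketch only)."""

    def __init__(self, vocab, tags, dim=10, window=5, hidden=20, seed=0):
        rng = random.Random(seed)
        self.tags = list(tags)
        self.window = window
        self.lookup = build_lookup(vocab, dim, seed)
        self.hidden_layer = init_layer(window * dim, hidden, rng)
        self.output_layer = init_layer(hidden, len(self.tags), rng)

    def scores(self, sentence, i):
        """Return one score per tag for the word at position i."""
        x = window_features(sentence, i, self.lookup, self.window)
        h = [hardtanh(v) for v in affine(*self.hidden_layer, x)]
        return affine(*self.output_layer, h)

    def tag(self, sentence):
        """Pick the highest-scoring tag independently for each word."""
        result = []
        for i in range(len(sentence)):
            s = self.scores(sentence, i)
            result.append(self.tags[max(range(len(s)), key=s.__getitem__)])
        return result


if __name__ == "__main__":
    # Illustrative tag subset; the untrained network outputs arbitrary tags.
    tagger = WindowTagger(["o", "gato", "dorme"], ["ART", "N", "V"])
    print(tagger.tag(["o", "gato", "dorme"]))
```

In a trained tagger, both the layer weights and the word vectors would be adjusted by backpropagation, which lets the vectors start from distributional estimates and then specialize for tagging; that training step is omitted here.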