Susanne Mocken | Scripta & e-Scripta

University of Freiburg, Germany

New Developments in Tagging Pre-modern Orthodox Slavic Texts

Scripta & e-Scripta vol. 18, 2018

Achim Rabus, Susanne Mocken, Yves Scherrer

Pre-modern Orthodox Slavic texts pose certain difficulties when it comes to part-of-speech and full morphological tagging. Orthographic and morphological heterogeneity makes it hard to apply resources that rely on normalized data, which is why previous attempts to train part-of-speech (POS) taggers for pre-modern Slavic often apply normalization routines. In the current paper, we further explore the normalization path; at the same time, we use the statistical CRF-tagger MarMoT and a newly developed neural network tagger that cope better with variation than previously applied rule-based or statistical taggers. Furthermore, we conduct transfer experiments to apply Modern Russian resources to pre-modern data. Our experiments show that while transfer experiments could not improve tagging performance significantly, state-of-the-art taggers reach between 90% and more than 95% tagging accuracy and thus approach the tagging accuracy of modern standard languages with rich morphology. Remarkably, these results are achieved without the need for normalization, which makes our research of practical relevance to the Paleoslavistic community.

Subject: Church Slavonic; Natural language processing; Part of speech tagging; Old Russian; Neural networks; Language studies; Language and Literature Studies; Theoretical Linguistics; Studies of Literature; Eastern Slavic Languages; Philology; Theory of Literature
