New Developments in Tagging Pre-modern Orthodox Slavic Texts
- Author(s): Achim Rabus, Susanne Mocken, Yves Scherrer
- Subject(s): Church Slavonic // Natural language processing // Part of speech tagging // Old Russian // Neural networks // Language studies // Language and Literature Studies // Theoretical Linguistics // Studies of Literature // Eastern Slavic Languages // Philology // Theory of Literature
- Published by: Institute for Literature BAS
- Print ISSN: 1312-238X
- Summary/Abstract:
Pre-modern Orthodox Slavic texts pose certain difficulties when it comes to part-of-speech and full morphological tagging. Orthographic and morphological heterogeneity makes it hard to apply resources that rely on normalized data, which is why previous attempts to train part-of-speech (POS) taggers for pre-modern Slavic often apply normalization routines. In the current paper, we further explore the normalization path; at the same time, we use the statistical CRF-tagger MarMoT and a newly developed neural network tagger that cope better with variation than previously applied rule-based or statistical taggers. Furthermore, we conduct transfer experiments to apply Modern Russian resources to pre-modern data. Our experiments show that while transfer experiments could not improve tagging performance significantly, state-of-the-art taggers reach between 90% and more than 95% tagging accuracy and thus approach the tagging accuracy of modern standard languages with rich morphology. Remarkably, these results are achieved without the need for normalization, which makes our research of practical relevance to the Paleoslavistic community.
Journal: Scripta & e-Scripta vol. 18, 2018
Page Range: 9-33
No. of Pages: 25
Language: English
Link (CEEOL): https://www.ceeol.com/search/article-detail?id=689984
Achim Rabus, Department of Slavic Linguistics, University of Freiburg, Germany
Prof. Dr. Achim Rabus is the current Head of the Department of Slavonic Studies at the University of Freiburg, Germany. Rabus defended his PhD thesis on the language of East Slavic spiritual songs in 2008 and his Habilitationsschrift on Slavic language contact in 2014. Since 2009, Rabus has been a member of the Special Commission on the Computer-Supported Processing of Mediæval Slavonic Manuscripts and Early Printed Books to the International Committee of Slavists, and since 2018 he has been the President of the Commission. His current research focuses on Slavic social dialectology, Handwritten Text Recognition, and corpus and (digital) historical linguistics.
Susanne Mocken, University of Freiburg, Germany
Yves Scherrer, Department of Digital Humanities, University of Helsinki, Finland