Beatrice Bindi
Evaluating Stanza and UDPipe for Morphosyntactic Annotation of Old Russian: A Case Study on Maximus the Greek
Оценка на строфи и UDPipe за морфосинтактична анотация на староруски език: казусът Максим Грек
-
Summary/Abstract
The automation of morphosyntactic annotation of Old Russian texts represents a key challenge in contemporary Slavistics, underscoring the need for computational tools capable of processing historical linguistic data with high accuracy. This study qualitatively evaluates the performance of two statistical taggers, Stanza and UDPipe, in annotating a text by Maximus the Greek, using the TOROT and RNC treebanks as reference corpora. The analysis assesses the accuracy of morphosyntactic annotation—specifically, part-of-speech tagging, morphological feature assignment, and lemmatisation—identifying recurring errors and structural limitations in applying these tools to historical Slavic texts. While both taggers facilitate annotation, they do not yet ensure a level of automation sufficient for fully reliable linguistic analysis. Key challenges include the misinterpretation of morphosyntactic relationships and inaccuracies in grammatical feature assignment. The comparison with their respective reference corpora highlights these issues, demonstrating the need for further refinement in automated annotation methods. This study critically examines the applicability of current NLP technologies to historical texts, emphasizing the necessity of adapting existing models.
Subject: e-Scripta