natural language processing

Wege zur verbesserten automatischen Annotation des mittelbulgarischen Kirchenslawischen (Ways to Improve the Automatic Annotation of Middle Bulgarian Church Slavonic)

Фабио Майо. Начини за подобряване на автоматичните анотации на среднобългарски църковнославянски текстове

  • Summary/Abstract

    The last decade has brought an upswing in research on natural language processing, yet historical language stages remain largely underrepresented. Middle Bulgarian Church Slavonic, a language variety with a significant literary output, is a prime example. This paper shows how annotated texts of related language varieties can be used to annotate texts written in Middle Bulgarian Church Slavonic, such as the 14th-century translation of the Dioptra. In particular, I present a way of adapting the available training data and of reducing the differences between training and test data, thereby improving the results of the automatic morphological annotation. Moreover, I demonstrate that a comparison with the original work, written in Byzantine Greek, can further improve the annotation by carefully disambiguating homonymous word forms. The results can benefit research on Middle Bulgarian Church Slavonic, as they show how texts in this variety can be annotated without authentic training data. The method may also be of use beyond Slavonic Studies: combining training data from genetically related language varieties with translations may help annotate other underrepresented language varieties as well.


From Annotation to Modeling: Computational Horizons for Medieval Slavic Studies.

От анотиране към моделиране: компютърни хоризонти за славистичната медиевистика

  • Summary/Abstract

    This paper is a write-up of a keynote from El’Manuscript 2021, reflecting on the ways in which the field of computationally supported medieval Slavic studies has and has not changed since the mid-2000s. Looking towards developments in the broader fields of digital humanities and natural language processing, it explores how recent improvements in the tools for mass digitization of manuscripts and text analysis at scale open up possibilities for working with manuscripts that have received very little attention. For these advancements to be feasible, however, scholars will need to prepare and share their digitized texts and annotations in ways that are not currently the norm, though a number of projects provide exemplary models of how these new conventions could be put into practice.

