e-Scripta

The 48th Austrian Linguistics Conference (48. Österreichische Linguistiktagung), panel ‘Digitale Slawistik’ (‘Digital Slavic Studies’)

48-ма Австрийска лингвистична конференция (48. Österreichische Linguistiktagung), панел „Digitale Slawistik“ („Цифрови славистични изследвания“)

  • Summary/Abstract

    The integration of digital technologies has become increasingly important across academic disciplines in the 21st century, rapidly transforming research possibilities and methodologies in the humanities and social sciences. Since the emergence of transformer neural networks like BERT (Bidirectional Encoder Representations from Transformers) and the subsequent rise of Large Language Models, linguistic research has experienced yet another wave of profound changes, especially in the field of Natural Language Processing (NLP). Despite these developments, implementing digital technologies remains a desideratum in Slavic linguistics, particularly when working with low-resource historical varieties. The panel “Digitale Slawistik” (‘Digital Slavic Studies’), held as a part of the 48th Austrian Linguistics Conference (48. Österreichische Linguistiktagung) at the University of Innsbruck on December 18–19, 2024, aimed to address this gap by bringing together linguists who study Slavic languages and incorporate digital methods into their research practice. Over the two days, scholars from Germany, Austria, Italy, and the Czech Republic presented their research. The event was organised by Elias Bounatirou (University of Vienna), Anna Jouravel (University of Freiburg), Maximilian Grübsch (University of Vienna), and Ilia Afanasev (University of Vienna).

    Subject: e-Scripta

Computer-assisted Study of Historical Lemkian (Transcarpathian) Lects: Basic Vocabulary Approach
Scripta & e-Scripta vol. 25, 2025
Компютърно подпомагано изследване на исторически лемкийски (закарпатски) диалекти: подход към основния речник

This research presents the first step in digitising texts of historical Lemkian (Transcarpathian) dialects, recorded in the 1930s, and transforming them into an open-access dataset. The dataset includes morphological tagging, lemmatisation, and data on named entities and basic vocabulary items. This allows pre-existing models for the automatic tagging of basic vocabulary in Slavic to be evaluated on the new material quantitatively (checking their efficiency), qualitatively (going example by example), and formally (by analysing the research design of previous studies). The present pilot study shows that existing models cannot efficiently detect enough Automatic Similarity Judgement Program (ASJP) basic vocabulary list items in the Lemkian texts (F1-score less than 0.5); they find only the words that formally coincide completely with their cognates in other Slavic languages (personal pronouns). The bar-chart visualisation shows that the previously hypothesized formalisation of basic vocabulary items as similar in distribution to named entities is incorrect, and that a new formalisation is required. The main contribution of the work is an open-access dataset of historical Lemkian dialects.
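The F1-score threshold mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not describe how the evaluation was implemented, so the function name and the token positions below are hypothetical and serve only to show how precision, recall, and F1 relate when a tagger finds few of the annotated basic vocabulary items.

```python
def precision_recall_f1(gold, predicted):
    """Return (precision, recall, F1) for predicted items against gold items."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical data: positions of tokens annotated as basic vocabulary (gold)
# and positions flagged by an automatic tagger (predicted).
gold_positions = {0, 3, 5, 8, 11, 14}
predicted_positions = {0, 3, 9}

precision, recall, f1 = precision_recall_f1(gold_positions, predicted_positions)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

In this made-up case the tagger is fairly precise but misses most gold items, so low recall pulls the F1-score below 0.5.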

Subject: e-Scripta
Keywords: Lemkian, Transcarpathian, dialectology, computer-assisted study, basic vocabulary
BelarusianGLUE: Analyzing Performance of Open-weight Models
Scripta & e-Scripta vol. 25, 2025
BelarusianGLUE: анализ на продуктивността на модели от отворен тип

We use BelarusianGLUE, a recently introduced benchmark, to analyze the performance of open-weight large language models (LLMs) on Belarusian language understanding tasks. The impact of prompting language, few-shot prompts, orthography (modern/classical/Latin), chat templates, and evaluation mode (discriminative/generative) is investigated. Our findings suggest that more recent models generally perform better, but improvements are gradual. Fine-tuning on related Slavic languages does not always improve Belarusian understanding. Classical orthography has limited impact, while latinization degrades performance. Analysis of specific tasks (sentiment analysis, Winograd schema challenge) reveals biases in the models, difficulties with understanding linguistic structure, and gaps in world knowledge and cultural context.

Subject: e-Scripta
Keywords: natural language processing, Belarusian language, large language models, language understanding, evaluation
Evaluating Stanza and UDPipe for Morphosyntactic Annotation of Old Russian: A Case Study on Maximus the Greek
Scripta & e-Scripta vol. 25, 2025
Оценка на строфи и UDPipe за морфосинтактична анотация на староруски език: казусът Максим Грек

The automation of morphosyntactic annotation of Old Russian texts represents a key challenge in contemporary Slavistics, underscoring the need for computational tools capable of processing historical linguistic data with high accuracy. This study qualitatively evaluates the performance of two statistical taggers, Stanza and UDPipe, in annotating a text by Maximus the Greek, using the TOROT and RNC treebanks as reference corpora. The analysis assesses the accuracy of morphosyntactic annotation—specifically, part-of-speech tagging, morphological feature assignment, and lemmatisation—identifying recurring errors and structural limitations in applying these tools to historical Slavic texts. While both taggers facilitate annotation, they do not yet ensure a level of automation sufficient for fully reliable linguistic analysis. Key challenges include the misinterpretation of morphosyntactic relationships and inaccuracies in grammatical feature assignment. The comparison with their respective reference corpora highlights these issues, demonstrating the need for further refinement in automated annotation methods. This study critically examines the applicability of current NLP technologies to historical texts, emphasizing the necessity of adapting existing models.

Subject: e-Scripta
Keywords: Stanza, UDPipe, natural language processing, morphosyntactic analysis, annotation, Old Russian, Maximus the Greek

The Concept ‘New Ethics’ in Russian Media Discourse: A Corpus-Based Analysis

Концепцията „Нова етика“ в руския медиен дискурс: корпусен анализ

  • Summary/Abstract

    This paper analyzes the concept of novaja ėtika (‘new ethics’) as represented in Russian media discourse, focusing on a comparison between two prominent Russian media sources: Lenta.ru and Meduza.io. The study is based on a corpus of 86 texts published between 2019 and 2024. Using corpus-based methods – including frequency analysis, topic modeling, and named entity recognition (NER) – the study identifies distinct differences in how the term novaja ėtika is conceptualized, discussed, and positioned in public debates. Lenta.ru is one of Russia’s largest mainstream news sites, known for its pro-government neutrality and wide readership across the country. Meduza.io, on the other hand, is an exile-founded outlet that is often critical of Russian state policy and targets a younger, urban, liberal audience. Lenta.ru frames novaja ėtika predominantly negatively, associating it with external pressures, cultural conflicts, and moral censorship. In contrast, Meduza.io approaches the concept analytically, emphasizing its philosophical foundations, discursive development, and socio-cultural implications. The analysis also highlights differences in the representation of actors, revealing that Lenta.ru focuses on geopolitical actors and institutional structures, while Meduza.io also prioritizes individual commentators and cultural influencers. The results illustrate broader discursive strategies and cultural cleavages in contemporary Russian media that reflect competing visions of social norms, public morality, and identity politics. As an exploratory study, it is subject to methodological limitations, including the size and scope of the corpus.

    Subject: e-Scripta

Effektiver Einsatz von NLP-Methoden am Beispiel des Codex Suprasliensis
Scripta & e-Scripta vol. 25, 2025
Effective Use of NLP Methods: The Example of the Codex Suprasliensis

The integration of computational methods in historical philology is becoming increasingly essential, yet challenges persist in harmonizing linguistic and technical aspects of text analysis. This study presents a comprehensive and methodologically transparent use case that documents the entire computational philological workflow – from data acquisition and modeling to analysis and visualization – in a structured and reproducible manner. Using the Codex Suprasliensis, one of the most significant Old Slavic manuscripts, as a case study, we demonstrate how modern Natural Language Processing (NLP) techniques, particularly the Stanza library for morphosyntactic annotation and DataFrame-based corpus structuring, can facilitate the exploration of historical textual corpora. A special emphasis is placed on benchmarking Stanza’s performance in processing Old Church Slavonic, evaluating its segmentation, tagging, and parsing accuracy against existing Gold Standard datasets. Additionally, we discuss the role of DataFrame-based modeling in ensuring an efficient and transparent structuring of linguistic data, allowing for flexible transformations and reproducible analyses. To support further research and methodological validation, all functional and extensively annotated scripts – including the complete NLP pipeline – are permanently provided via the GitHub platform of the Berlin State Library. The findings highlight the importance of structured corpus processing in computational philology and contribute to the ongoing refinement of NLP methodologies for historical languages.
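The idea of structuring annotated tokens as table rows can be sketched as follows. This is not the authors' pipeline: it assumes only that the annotation is available in the ten-column CoNLL-U format, and the two sample tokens are illustrative placeholders rather than a quotation from the Codex Suprasliensis. The function name `conllu_to_rows` is hypothetical.

```python
from collections import Counter

# The ten standard CoNLL-U columns, in order.
CONLLU_FIELDS = ["id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc"]


def conllu_to_rows(text):
    """Turn CoNLL-U text into one dict per token, with a running sentence number."""
    rows = []
    sent_id = 0
    in_sentence = False
    for line in text.splitlines():
        line = line.strip()
        if not line:                 # a blank line closes the current sentence
            in_sentence = False
            continue
        if line.startswith("#"):     # sentence-level comments are skipped
            continue
        if not in_sentence:          # first token line of a new sentence
            sent_id += 1
            in_sentence = True
        values = line.split("\t")
        if len(values) != len(CONLLU_FIELDS):
            continue                 # ignore malformed lines
        row = dict(zip(CONLLU_FIELDS, values))
        row["sent_id"] = sent_id
        rows.append(row)
    return rows


# Illustrative sample (placeholder tokens and analyses, assumed for the sketch).
sample = "\n".join([
    "# text = и рече",
    "\t".join(["1", "и", "и", "CCONJ", "_", "_", "2", "cc", "_", "_"]),
    "\t".join(["2", "рече", "рещи", "VERB", "_", "_", "0", "root", "_", "_"]),
    "",
])

rows = conllu_to_rows(sample)
pos_counts = Counter(row["upos"] for row in rows)
print(rows[1]["form"], rows[1]["lemma"], dict(pos_counts))
```

A list of dictionaries like `rows` can be passed directly to `pandas.DataFrame(rows)` where pandas is installed, which gives one row per token and one column per annotation layer.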

Subject: e-Scripta
Keywords: Computational Philology, natural language processing, Old Church Slavonic, Stanza and Corpus Annotation, DataFrame-Based Text Structuring

Die thematisch-fokussierende komplexe Präposition v lice. Eine diachrone Analyse mithilfe von BERT

The Thematically Focusing Complex Preposition v lice: A Diachronic Analysis Using BERT

  • Summary/Abstract

    This article examines complex prepositions in Russian using the construction v lice as a case study. This denominal complex preposition, consisting of the primary preposition v and a noun, exemplifies the dynamic transitional processes between word classes. The central focus of the analysis is the question of which semantic properties are associated with the degree of establishment of such constructions. The article adopts both a synchronic and diachronic perspective, with particular attention to developments since the 19th century, during which complex prepositions increasingly entered scientific, technical, and journalistic writing styles. Using corpus-based methods and embedding-based techniques (BERT), the study reconstructs semantic shifts and identifies functional-semantic changes. In doing so, it contributes to the description of the internal dynamics of complex prepositions in Russian.

    Subject: e-Scripta

Digital Edition of Pop Punčov Sbornik: Project Note
Scripta & e-Scripta vol. 24, 2024

The resource described is an online tool designed for studying texts and diachronic language variation. The core of its corpus is the Pop Punčov Sbornik, a West Bulgarian manuscript from 1796, released together with smaller samples of 14th–19th century Balkan Slavic varieties. Besides the data, which offer a unique view of historical dialects, the resource has a user-friendly interface and a modular structure, which make it easy to add new content and features and to use the tool for training students and lay people interested in historical literature. The resource also contains extensive documentation on both the grammar and the philological background of the sources.

Subject: e-Scripta
Keywords: Balkan Slavic, diachronic corpus, damaskini, Church Slavonic

Cтатистическая значимость компонентов лексических синонимических рядов в древнеболгарских письменных памятниках: поиск метода

Statistical significance of the components of lexical synonymous series in ancient Bulgarian written manuscripts: search for a method

  • Summary/Abstract

    The article presents the results of statistical experiments aimed at characterising words that are traditionally considered the Ohrid-Moravian and Preslav components of synonymous series: иерѣи ‘priest’ – жьрьць ‘priest, cleric’ – свѧщеньникъ ‘priest, clergyman’; колѣно ‘knee’, ‘kindred’ – племѧ ‘tribe, genus’; коньчина ‘demise, end’ – коньць ‘end’; кънигы ‘books’ – писаниѥ ‘scripture’; любодѣица ‘adulteress, fornicator’ – блѫдьница ‘harlot’. Information about the relative number of words in a subcorpus and about significant deviations from the average values, together with the calculation of statistical characteristics of lexemes in each subcorpus, made it possible, in particular, to detect opposed and non-opposed components of synonymous series. The methods used to identify the statistical characteristics of words have shown that the degree of opposition of synonyms can differ: it may be statistically significant or statistically insignificant. On this basis, it is concluded that it is necessary to move away from the unconditional attribution of the components of the synonymic series to the Ohrid-Moravian and Preslav vocabulary: the relations between the components of each synonymic series are individual and can range from statistically opposed in the texts of different schools to

    Subject: e-Scripta

Serbian Early Printed Books from Venice: Creating Models for Automatic Text Recognition Using Transkribus
Scripta & e-Scripta vol. 22, 2022
Vladimir R. Polomac. Serbian Early Printed Books from Venice: Creating Models for Automatic Text Recognition Using Transkribus

The paper describes the process of creating a model for the automatic recognition of Serbian Church Slavonic printed books from Venice (from Božidar and Vincenzo Vuković’s printery) using the Transkribus software platform, which is based on the principles of artificial intelligence and machine learning. Using the example of the Prayer Book (Euchologion) (1538–1540) from Božidar Vuković’s printery, it shows that a successful model for the automatic recognition of individual books (with around 5% of unrecognized characters) can be trained on material of approximately 4000 words, and that a larger amount of training material (in our case around 38000 words) improves the model and reduces the error rate (to between 1% and 2% of unrecognized characters). The most notable result of the paper is the creation of a generic model for the automatic text recognition of Serbian Church Slavonic books from Božidar and Vincenzo Vuković’s printery. The initial version of the generic model (called Dionisio 1.0 after Božidar Vuković’s Italian pseudonym, Dionisio della Vecchia) is the first resource for the automatic recognition of the Serbian medieval Cyrillic script that is publicly available to all users of the Transkribus software platform (see https://readcoop.eu/model/dionisio-1-0/).
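The abstract does not state how the share of unrecognized characters was calculated. A common measure for text recognition models is the character error rate (CER), that is, the edit distance between the recognized text and a manually checked transcription, divided by the length of the transcription. The sketch below assumes that definition; the example strings are invented and not taken from the Prayer Book.

```python
def levenshtein(a, b):
    """Edit distance: minimum number of insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def character_error_rate(reference, hypothesis):
    """CER = edit distance / number of characters in the reference transcription."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Invented example: one wrong character in a seven-character word.
reference = "молитва"
hypothesis = "молнтва"
cer = character_error_rate(reference, hypothesis)
print(f"CER = {cer:.1%}")
```

Under this definition, an error rate of around 5% corresponds to roughly one wrong character in every twenty characters of the reference transcription.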

Subject: e-Scripta, Digital humanities
Keywords: Transkribus, Automatic Text Recognition, Serbian Early Printed Books, Artificial Intelligence, Machine Learning, Venice
The Bamberg Cyrillic Alphabet – a Colour Facsimile
Scripta & e-Scripta vol. 22, 2022
Sebastian Kempgen. The Bamberg Cyrillic Alphabet – a Colour Facsimile

The so-called Bamberg Cyrillic Alphabet (ca. 13th c.) is one of the oldest and most reliable xenographic Slavic alphabets, i.e. a Cyrillic alphabet added to a Latin manuscript of unrelated content. It has been published and edited before in black-and-white, and it is presented here for the first time in a high-quality colour photograph, accompanied by a slightly revised tabular representation.

Subject: e-Scripta, Digital humanities
Keywords: Cyrillic Script, Historic Alphabets, Facsimile

Parkosz’s Treatise from a Typographic Point of View

Janusz S. Bień. Parkosz’s Treatise from a Typographic Point of View

  • Summary/Abstract

    The 15th-century Latin manuscript containing a treatise by Parkosz was the very first proposal for Polish spelling. To account for all the phonemes of Polish, some new letters were proposed that are not available in present-day fonts. This makes it difficult to quote the proposal when discussing the history of Polish spelling. The paper describes the transliteration proposed by the author, which uses the characters available in the Unicode standard. The ultimate solution is, of course, to create a specialized font, and the paper mentions some aspects of this task.


Conference “Book and Script. Tradition and Modernity”, April 8–9, 2022, Sofia
Scripta & e-Scripta vol. 22, 2022
Stefan Peev. Conference “Book and Script. Tradition and Modernity”, April 8–9, 2022, Sofia

The article reviews the authors and papers presented at the scientific conference “Book and Script. Tradition and Modernity”. The conference, held on the 8th and 9th of April, 2022, was the first attempt of its kind at an interdisciplinary examination of the development of books and scripts from historical and theoretical perspectives. Twenty-four papers were presented by participants from the following institutions: Sofia University “St. Kliment Ohridski”, National Academy of Arts, Plovdiv University “Paisii Hilendarski”, Southwestern University “Neofit Rilski”, UniBit (University of Library Science and Information Technology), New Bulgarian University, Institute of Bulgarian Language at the Bulgarian Academy of Sciences, National Library “St. Cyril and Methodius”, Regional Library “P. Pavlovich” (Silistra), University of Zurich, Typeflow (Rijeka). The main topics of the conference were the origin of script among the Bulgarians; early Slavic printed books; scripts and manuscripts; the Revival book and its characteristics; books and fonts in modern times; and libraries, books and modern approaches to describing them. There was a general consensus that the interdisciplinary approach opens up new fields and horizons for research on books and scripts.

Subject: e-Scripta, Digital humanities
Keywords: book, printed book, Revival book, manuscript, manuscript book, Cyrillic Script, Glagolitic script, script, font, print, typography, paper, illustrations