Applied Linguistics

Ефективност на генерични модели HTR за историческа кирилица и глаголица: Сравнение на средства
Performance of Generic HTR Models on Historical Cyrillic and Glagolitic: Comparison of Engines
Scripta & e-Scripta vol. 23, 2023; posted by floyd, Sun, 12/03/2023 - 16:14

The present study offers a comparative evaluation of the performance of different AI-based digital tools for handwritten text recognition (HTR) on historical manuscripts and prints. The focus is on generic models capable of transcribing a range of texts in a similar script. The training dataset for these comprises Old Cyrillic ustav and poluustav manuscripts, on the one hand, and early Glagolitic printed books, on the other. We give an overview of the performance statistics for the HTR platforms Transkribus and eScriptorium as well as for the command-line tool Calamari. In each case, we additionally offer a close, qualitative analysis of select examples in order to convey a sense of the models’ real-world performance. In this way, our study supplies comparative data on the respective capabilities of these technologies that ought to be of interest to scholars working with them in digital humanities projects.

Subject: Language studies; Language and Literature Studies; Theoretical Linguistics; Applied Linguistics; Historical Linguistics; Computational linguistics; South Slavic Languages; Philology; Translation Studies
Keywords: handwritten text recognition; Transkribus; machine learning; Cyrillic palaeography; Glagolitic printings

Способы демонстрации данных славянского исторического полнотекстового корпуса “Манускрипт”

Data Demonstration Techniques in Slavonic Historical Text Corpus “Manuscript”

  • Summary/Abstract

    The article discusses theoretical and practical issues of creating tools for displaying a medieval Slavonic text corpus on the “Manuscript” website (http://manuscripts.ru/). The specific features of the historical corpus and its sources are the limited number of manuscripts, the variability of medieval graphics and orthography, and the complex structure and composition of the original documents. They require special instruments and techniques for data preparation (information about a text and its physical media, analytical tagging of fragments, variability, and so on) and for visualizing data samples, including texts. The article focuses on ways of solving two opposite tasks: displaying the texts in a form as close as possible to the original and displaying them in a simplified form, and, consequently, the possibilities of transforming them. The first task is solved by preparing a transcription in a specialized editing tool, which interacts with the full-text database and provides a complete set of required characters, text formatting, and a layout that fits the original page. To solve the second task, analytical tagging (chapters and verses, authors of texts, structure of the manuscript, main text and marginalia, and so forth) and linguistic tagging (including lemmatization) are performed so that data can be searched and transformed when displayed. The latter allows users to see a text in modern Cyrillic or Latin script, syllables, the meaning of analytical fragments, links between the main text and its marginalia, and so forth. The ability to search data based on deep tagging is illustrated with the digital edition (LIM, MS 37, 13th c., 291 f.) which has been included in the “Manuscript” historical corpus (http://manuscripts.ru/mns/main?P_TEXT=94065041&p_lang=EN).


Текущ българо-австрийски проект за проучване на ръкопис Зограф 105 и неговия фрагмент № 3070
The current Bulgarian-Austrian project on the investigations of the manuscript Zograf No. 105 and its fragment No. 3070
Scripta & e-Scripta vol. 23, 2023; posted by floyd, Sun, 12/03/2023 - 13:56


Subject: Language studies; Language and Literature Studies; Theoretical Linguistics; Applied Linguistics; Historical Linguistics; Philology; Translation Studies
Keywords: Translation Studies

Recognizing Handwritten Text in Slavic Manuscripts: a Neural-Network Approach Using Transkribus


Создание и использование исторических корпусов славянских письменных памятников

Creation and Using of Historical Corpora of Slavonic Manuscripts

  • Summary/Abstract
    The requirements for historical corpora of medieval texts 1) are determined by the properties of the data and by the historical-linguistic, textological and linguo-textological tasks to be solved, and 2) should be realized with the help of special tagging, processing procedures, query parameters and ways of displaying retrieved results. The corpus should a) have metadata concerning both texts and manuscripts, involving both linguistic and analytical tagging; b) support the rendering of documents (facsimile and transcription), concordances, lists, and comparison of subcorpora data; c) simplify graphic-orthographic variation during data search and visualization; d) provide tools both for processing and searching linguistic material and for its further analysis according to traditional methods; and e) support problem description and resolution by applying corpus methods that engage with the quantity, distribution, co-occurrence, and variation of linguistic units in big data arrays. The realization of these requirements is demonstrated on a subcorpus of three copies of chronicles (Laurentian, Hypatian, Radzivilovsky) from the historical corpus project “Manuscript” (manuscripts.ru).

Electronic Edition and Linguistic Annotation of Slavic Fragments


Терминология в палеославистике и создание сети между существующими цифровыми корпусами

Terminology in Palaeoslavistics and Set up Networking between Existing Digital Corpora


Text or Paratext? The Synopsis Apostolorum of Dorotheus of Tyre

  • Summary/Abstract

    The article discusses the Synopsis Apostolorum attributed to Dorotheus of Tyre, which purports to be a list of the Seventy Apostles. It gives a brief overview of the history of the text in Greek and Slavonic. In contrast to the Greek tradition, where it may be found in miscellanies of various types and also in manuscripts of the Apostolos which are provided with relatively extensive apparatus, in Slavonic it is found exclusively in Apostolos manuscripts. The redaction of the Synopsis, moreover, corresponds to the redaction of the Apostolos; there are discernible differences between the texts in each of the three Slavonic redactions in which it is represented. This indicates that it was translated as part of the accompanying text each time that the Apostolos itself was translated. This means that the Slavonic version (unlike the Greek) exists exclusively as paratext, but that this paratextual status, being dependent on the version, is not intrinsic to the work but a function of its history. This in turn points to the necessity of taking the paratext into account in any study of the text of the Bible.


Translating the Subtleties. The Philosophical Categories in the Symeon Collection (Symeon’s Miscellany)

  • Summary/Abstract

    When we study translations from classical or Byzantine Greek into Old Bulgarian, we usually encounter two aspects of the question how. The first is the how of the linguistic rendering, that is, of the translation techniques used for one term or another. The second is that of evaluating the how: shall we praise the translator’s work or, on the contrary, express regret about it? Besides these two inherent aspects of the question how, a third has arisen in Bulgaria in the last three decades. We have a long tradition of translating Old Greek and Byzantine texts into Old Bulgarian, but with respect to the philosophical and theological terminology used nowadays, are we obliged to follow the patterns of the past, the forms of the language suggested by this millennium-long tradition? With respect to the formation of the Bulgarian philosophical and theological language, Symeon’s Miscellany is an extremely important source, because from f. 222 to f. 237 a range of issues is discussed as answers to questions 29 and 30. This section of the work includes clarification of terms, categories and concepts from classical Greek and/or Christian philosophy and demands profound interdisciplinary research.


Указатели названий и инципитов разделов и глав славянских списков Откровения Иоанна Богослова как поисковый аппарат и основа для изучения редакций текста 

Indexes of Names and Incipita of Sections and Chapters of the Slavonic Witnesses of the Revelation of John the Theologian as a Search Tool and a Basis for Studying Versions of the Text


Иерархическая модель гимнографической терминологии: дигитальное приложение
A Hierarchical Model of the Hymnographic Terminology: Digital Application
Scripta & e-Scripta vol. 18, 2018; posted by floyd, Fri, 12/28/2018 - 07:33

The article provides metadata on the descriptions of hymnographic terms exposed on the Scripta Bulgarica electronic portal (http://www.scripta-bulgarica.eu). The purpose of the analysis is to take the first step towards building a detailed digital ontology of all hymnographic terminology for the needs of specialized web products. All the hymnographic concepts included in the portal are arranged in a hierarchical system which comprises as well some of the most closely related concepts of other kinds (generally liturgical and literary). The selection of terms involves several of the basic musical and musicological hymnographic concepts too, but focuses on the verbal side of the chants. The proposed hierarchical model is based on two types of relationships: category/subcategory and whole/part, the second of which ranks the hymnographic terminology in five levels from the highest (Books) to the lowest (Genre components).

Subject: Hymnography; Terminology; Classification; Hierarchy; Category-subcategory; Whole-part; Scripta Bulgarica; Language and Literature Studies; Theoretical Linguistics; Applied Linguistics; Studies of Literature; Philology; Translation Studies

Название и самоназвание в номинативных комплексах рукописных книг XIV–XIX веков (на материале Отдела рукописей Российской государственной библиотеки)

Title and Self-title in Nominative Complexes of Manuscripts 14th–19th Centuries (Based on the Material of the Manuscript Department of the Russian State Library)


Проложные жития в средневековой южнославянской книжности
The Prologue Vitae in the Medieval South-Slavonic Literature
Scripta & e-Scripta vol. 18, 2018; posted by floyd, Thu, 12/27/2018 - 07:21

The article reviews the prologue vitae (sinaxaria), short hagiographic (narrative) texts for saints and festivals that are contained in two calendar liturgical books, the Prologue and the Menaia. They are found in two types of hagiographic sinaxaria, known in Slavic literature as the Simple and the Verse Prologue. The latter is distinguished by the verses (two or three lines) written before the sinaxaria vitae. A comparison is made between the prologue vitae from the Simple and the Verse Prologue of three popular Christian saints: George, Demetrius and Nicholas. To these are added the vitae from the Simple and the Verse Prologue for St Petka (Parasceve of Epibatae), St Ivan Rilski (John of Rila) and Sts Simeon and Sava (of Serbia), emblematic South Slavic saints.

Subject: Language studies; Language and Literature Studies; Theoretical Linguistics; Applied Linguistics; Studies of Literature; Philology; Translation Studies; Prologue Vitae; Simple Prologue; Prologue in Verse; Menaion; Commemoration of saints; Commemoration of South-Slavonic saints; Liturgical books

Textological Notes on De Christo et Antichristo by Hippolytus of Rome in the Greek and Slavonic Manuscript Tradition


Daily Life and Religion: The Vienna Euchologia Project

  • Summary/Abstract

    The study of the Euchologia (singular: Euchologion), the prayer books to be used by the clergy, has long been neglected by medieval historians. This is beginning to change, as more and more scholars discover the potential of the Euchologia as a source for social history. Indeed, Euchologia contain besides Eucharistic and sacramental liturgies also prayers for various occasions of the daily life of women, men and children from various strata of society and every geographical region of the Byzantine world. Thus, the Euchologia offer a different perspective than most Byzantine written sources, which concentrate on urban, male, often ordained elites. In 2015 a new research project dedicated to the study of Euchologia was initiated at the Division of Byzantine Research of the Austrian Academy of Sciences. The project seeks to study Greek-language Euchologia copied up to the year 1650 and the prayers they contain.


Comparison of Four Medieval Bulgarian Inscriptions by Letter Frequency

  • Summary/Abstract

    This article compares the letter frequencies of four Old Bulgarian inscriptions: the Samuel inscription of 993, the Samuel inscription of Voden, the Bitola inscription of Ivan Vladislav and the Tărnovo inscription of Ivan Asen II. We establish the proximity of the letter frequencies of these inscriptions and thus obtain an argument in favor of the view that the Voden inscription is a product of the same scribal tradition and orthography, and of the same epoch, as the Samuel inscription of 993 and the Bitola inscription. The striking ‘frequency proximity’ of the Voden and Bitola inscriptions is used to offer the hypothesis that the royal administration of the Samuel dynasty had a consistent tradition of writing and orthography.


Аріевъ ледъ

Ares Ice

  • Summary/Abstract

    The paper discusses the translation of Ἄρειος πάγος into Slavonic, which until the late seventeenth century is almost invariably Аріевъ ледъ. It is suggested that although this does not correctly render the original meaning of the Greek, translators (and others, including their Greek contemporaries) did not necessarily perceive place names as literally meaningful. Аріевъ ледъ was thus simply the established Slavonic name for the Areopagus, and known as such to Slavonic writers. The principal place where it occurs is in the Acts of the Apostles, where it is used consistently, and there is a varied body of commentary in Slavonic attached to this passage, which is discussed in detail. The use of the toponym in a number of non-Biblical texts is also traced.


Writing Old Cyrillic and Glagolitic in GNU/Linux with the Bulgarian Phonetic Traditional Keyboard Layout

