Applied Linguistics | Scripta & e-Scripta

Scripta & e-Scripta vol. 23, 2023

Achim Rabus Walker R. Thompson Ефективност на генерични модели HTR за историческа кирилица и глаголица: Сравнение на средства

Performance of Generic HTR Models on Historical Cyrillic and Glagolitic: Comparison of Engines

The present study offers a comparative evaluation of the performance of different AI-based digital tools for handwritten text recognition (HTR) on historical manuscripts and prints. The focus is on generic models capable of transcribing a range of texts in a similar script. The training dataset for these comprises Old Cyrillic ustav and poluustav manuscripts, on the one hand, and early Glagolitic printed books, on the other. We give an overview of the performance statistics for the HTR platforms Transkribus and eScriptorium as well as for the command-line tool Calamari. In each case, we additionally offer a close, qualitative analysis of select examples in order to convey a sense of the models’ real-world performance. In this way, our study supplies comparative data on the respective capabilities of these technologies that ought to be of interest to scholars working with them in digital humanities projects.

Subject: Language studies Language and Literature Studies Theoretical Linguistics Applied Linguistics Historical Linguistics Computational linguistics South Slavic Languages Philology Translation Studies

Keywords: handwritten text recognition TRANSKRIBUS MACHINE LEARNING Cyrillic palaeography Glagolitic printings

Scripta & e-Scripta vol. 23, 2023

Victor Baranov Roman M. Gnutikov Maria Novak Способы демонстрации данных славянского исторического полнотекстового корпуса “Манускрипт”

Data Demonstration Techniques in Slavonic Historical Text Corpus “Manuscript”

Summary/Abstract

The article discusses theoretical and practical issues in creating tools for presenting a medieval Slavonic text corpus on the “Manuscript” website (http://manuscripts.ru/). The historical corpus and its sources have specific features: a limited number of manuscripts, the variability of medieval graphics and orthography, and the complex structure and composition of the original documents. These features require special instruments and techniques for data preparation (information about a text and its physical medium, analytical tagging of fragments, variability, and so on) and for the visualization of data samples, including texts. The article focuses on ways of solving two opposite tasks, presenting the texts in a form as close as possible to the original and presenting them in a simplified form, and, consequently, on the possibilities for transforming them. The first task should be solved by preparing a transcription in a specialized editing tool that interacts with the full-text database and provides a complete set of the required characters, text formatting, and a layout that fits the original page. To solve the second task, analytical tagging (chapters and verses, authors of texts, structure of the manuscript, main text and marginalia, and so forth) and linguistic tagging (including lemmatization) are performed so that data can be searched and transformed when displayed. The latter allows users to see a text in modern Cyrillic or Latin script, its syllables, the meaning of analytical fragments, links between the main text and its marginalia, and so forth. The ability to search data on the basis of deep tagging is illustrated by the digital edition (LIM, MS 37, 13th c., 291 f.) that has been included in the “Manuscript” historical corpus (http://manuscripts.ru/mns/main?P_TEXT=94065041&p_lang=EN).

Subject: Language studies Language and Literature Studies Theoretical Linguistics Applied Linguistics Historical Linguistics Computational linguistics Philology Translation Studies

Keywords: Medieval Slavonic manuscripts digital edition transcription analytical and linguistic tagging Apostolus Christinopolitanus
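The display transformation described in the abstract above (showing a medieval text in modern Cyrillic) can be illustrated with a minimal sketch. The mapping table and the function name `simplify` are assumptions made for illustration only; they are not the rules or code used by the “Manuscript” corpus, and a real system would also have to handle titla, superscripts, and abbreviations.

```python
# Illustrative mapping from some Old Cyrillic letters to modern Cyrillic.
# Assumption: a deliberately small, simplified table, not the corpus's own rules.
SIMPLIFY = {
    "ѣ": "е",   # yat
    "ѡ": "о",   # omega
    "ѧ": "я",   # little yus
    "ꙗ": "я",   # iotified a
    "ѥ": "е",   # iotified e
    "ѳ": "ф",   # fita
    "ѵ": "и",   # izhitsa
    "і": "и",   # decimal i
    "ѕ": "з",   # dze
    "ѯ": "кс",  # ksi
    "ѱ": "пс",  # psi
}

# str.maketrans accepts a dict whose keys are single characters.
_TABLE = str.maketrans(SIMPLIFY)


def simplify(text: str) -> str:
    """Lowercase the text and replace the mapped letters with modern ones."""
    return text.lower().translate(_TABLE)


if __name__ == "__main__":
    print(simplify("мѣсто"))
```

A lookup table applied at display time leaves the stored transcription untouched, which matches the abstract's distinction between the faithful transcription and its simplified rendering.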

Scripta & e-Scripta vol. 23, 2023

Anissava Miltenova Текущ българо-австрийски проект за проучване на ръкопис Зограф 105 и неговия фрагмент № 3070

The current Bulgarian-Austrian project on the investigations of the manuscript Zograf No. 105 and its fragment No. 3070

Subject: Language studies Language and Literature Studies Theoretical Linguistics Applied Linguistics Historical Linguistics Philology Translation Studies

Keywords: Translation Studies

Scripta & e-Scripta vol. 19, 2019

Achim Rabus Recognizing Handwritten Text in Slavic Manuscripts: a Neural-Network Approach Using Transkribus

Summary/Abstract

The paper discusses the automatic text recognition capabilities of neural network models specifically trained to recognize different styles of Church Slavonic handwriting within the software platform Transkribus. Computed character error rates of the models are in the range of 3 to 5 percent; real-life performance shows that specifically trained models, by and large, recognize simple (non-superscript) characters correctly most of the time. The error rate is higher with superscript letters, abbreviations, and word separation. Combined models consisting of training data from different sources are capable of transcribing different styles of Slavic handwriting with low error rates. Automatic text recognition using Transkribus and the models presented in this paper can help improve the efficiency of the process of digitizing Church Slavonic manuscripts and thus boost the number of digitized sources available in the future.

Subject: Language and Literature Studies Language studies Studies of Literature Philology Theory of Literature Foreign languages learning Applied Linguistics Computational linguistics Translation Studies

Keywords: CHURCH SLAVONIC TRANSKRIBUS AUTOMATIC TRANSCRIPTION MACHINE LEARNING NEURAL NETWORKS ARTIFICIAL INTELLIGENCE
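The character error rate (CER) quoted in the abstract above is commonly defined as the edit distance between the automatic transcription and a reference transcription, divided by the length of the reference. The sketch below follows that common definition; it is an assumption that this matches the figure reported by Transkribus, and the function names are illustrative.

```python
import unicodedata


def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by reference length, after NFC normalization."""
    ref = unicodedata.normalize("NFC", reference)
    hyp = unicodedata.normalize("NFC", hypothesis)
    if not ref:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Two substituted letters in a five-letter word.
    print(character_error_rate("слово", "слава"))
```

Because the comparison works on Unicode code points, superscript letters and titla encoded as combining marks count as separate characters, which is one reason they can weigh on the error rate.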

Scripta & e-Scripta vol. 19, 2019

Victor Baranov Создание и использование исторических корпусов славянских письменных памятников

Creation and Using of Historical Corpora of Slavonic Manuscripts

Summary/Abstract

The requirements for historical corpora of medieval texts 1) are determined by properties of the data and the historical-linguistic, textological and linguo-textological tasks to be solved; 2) and should be realized with the help of special tagging, processing procedures, query parameters and retrieval demonstrations. The corpus should a) have metadata concerning both texts and manuscripts, and involving both linguistic and analytical tagging; b) support the rendering of documents (facsimile and transcription), concordances, lists, and comparison of subcorpora data; c) simplify graphic-orthographic variation during data search and visualization; d) provide tools both for processing and searching linguistic material and its further analysis according to traditional methods; and e) support problem description and resolution by applying corpus methods that engage with the quantity, distribution, co-occurrence, and variation of linguistic units in big data arrays. The realization of these requirements is demonstrated on a subcorpus of three copies of chronicles (Laurentian, Hypatian, Radzivilovsky) from the historical corpus project “Manuscript” (manuscripts.ru).

Subject: Language and Literature Studies Language studies Studies of Literature Philology Theory of Literature Theoretical Linguistics Applied Linguistics

Keywords: HISTORICAL SLAVONIC CORPUS RUSSIAN CHRONICLES LINGUISTIC STATISTICS

Scripta & e-Scripta vol. 18, 2018

Tsvetana Dimitrova Andrej Boyadzhiev Electronic Edition and Linguistic Annotation of Slavic Fragments

Summary/Abstract

The paper introduces a project on the edition and linguistic annotation of Medieval and Early Modern South Slavic manuscript fragments. Its main topic is the implementation of various approaches to integrating electronic edition, manuscript description, and linguistic annotation. The corpus will include fragments from parchment manuscripts kept in Bulgarian repositories. We will illustrate the approach with several pieces of text from various fragments. The representation will be supplied with textual, part-of-speech, and basic syntactic annotation. On this basis, an attempt will be made at experimental annotation of anaphora and related morpho-syntactic features. The work will offer a discussion of the features that will be useful for such annotation. The project relies on the eXist database (http://exist-db.org) and on the initiatives Repertorium (http://repertorium.obdurodon.org/), PROIEL (http://www.hf.uio.no/ifikk/english/research/projects/proiel/), and TOROT (http://site.uit.no/slavhistcorp/files/2015/04/Eckhoff.pdf).

Subject: Language studies Language and Literature Studies Theoretical Linguistics Applied Linguistics Studies of Literature Computational linguistics South Slavic Languages Philology South Slavic manuscripts Fragments Linguistic annotation Linguistic corpora Electronic text edition Electronic description XML technologies

Scripta & e-Scripta vol. 18, 2018

Anissava Miltenova Терминология в палеославистике и создание сети между существующими цифровыми корпусами

Terminology in Palaeoslavistics and Set up Networking between Existing Digital Corpora

Summary/Abstract

The paper discusses problems and points of view related to setting up networking between the Scripta Bulgarica project (http://www.scripta-bulgarica.eu/bg), the Repertorium of Old Bulgarian literature and letters (http://repertorium.obdurodon.org/), and other corpora (e.g. the Codex Suprasliensis from the 10th century: http://suprasliensis.obdurodon.org/) in order to further improve the linking between databases. The proposed networking will connect transcribed texts with terminology in palaeoslavistics and with other online resources, such as electronic editions of individual sites, electronic dictionaries, encyclopedias, bibliographic arrays, and so on. The networking will resolve a number of problems that cannot yet be solved in a satisfactory way. The results will be useful not only for palaeoslavists but also for librarians, teachers, students, representatives of the mass media, and the general public interested in Slavic literacy.

Subject: Ontology of terms Palaeoslavistic Computer technologies Standardization of formats XML approach Language studies Language and Literature Studies Theoretical Linguistics Applied Linguistics Studies of Literature Computational linguistics Bulgarian Literature South Slavic Languages Philology

Scripta & e-Scripta vol. 18, 2018

Ralf Cleminson Text or Paratext? The Synopsis Apostolorum of Dorotheus of Tyre

Summary/Abstract

The article discusses the Synopsis Apostolorum attributed to Dorotheus of Tyre, which purports to be a list of the Seventy Apostles. It gives a brief overview of the history of the text in Greek and Slavonic. In contrast to the Greek tradition, where it may be found in miscellanies of various types and also in manuscripts of the Apostolos which are provided with relatively extensive apparatus, in Slavonic it is found exclusively in Apostolos manuscripts. The redaction of the Synopsis, moreover, corresponds to the redaction of the Apostolos; there are discernible differences between the texts in each of the three Slavonic redactions in which it is represented. This indicates that it was translated as part of the accompanying text each time that the Apostolos itself was translated. This means that the Slavonic version (unlike the Greek) exists exclusively as paratext, but that this paratextual status, being dependent on the version, is not intrinsic to the work but a function of its history. This in turn points to the necessity of taking the paratext into account in any study of the text of the Bible.

Subject: Apostolos New Testament Slavonic translations Parabiblical literature Language and Literature Studies Theoretical Linguistics Applied Linguistics Studies of Literature Philology Translation Studies

Scripta & e-Scripta vol. 18, 2018

Diana Atanassova-Pencheva Dimka Gicheva-Gocheva Translating the Subtleties. The Philosophical Categories in the Symeon Collection (Symeon’s Miscellany)

Summary/Abstract

When we study translations from classical or Byzantine Greek into Old Bulgarian, we usually encounter two aspects of the question how. The first is the how of the linguistic rendering, the how of the translation techniques used for one term or another. The second is that of evaluating the how: shall we praise or, on the contrary, express regret about the translator’s work? Besides these two inherent aspects of the question how, a third one has arisen in the last three decades in Bulgaria. We have a long tradition of translating Old Greek and Byzantine texts into Old Bulgarian, but with respect to the philosophical and theological terminology used nowadays, are we obliged to follow the patterns of the past, the forms of the language suggested by this millennium-long tradition? With respect to the formation of the Bulgarian philosophical and theological language, Symeon’s Miscellany is an extremely important source because from f. 222 to f. 237 a range of issues is discussed as answers to questions 29 and 30. This section of the work includes clarification of terms, categories and concepts from classical Greek and/or Christian philosophy and demands profound interdisciplinary research.

Subject: Byzantine Greek Tsar Simeon’s florilegia Terminology Philosophical and theological language Language and Literature Studies Theoretical Linguistics Applied Linguistics Studies of Literature Philology Translation Studies

Scripta & e-Scripta vol. 18, 2018

Vera Podkovyrova Указатели названий и инципитов разделов и глав славянских списков Откровения Иоанна Богослова как поисковый аппарат и основа для изучения редакций текста

Indexes of Names and Incipita of Sections and Chapters of the Slavonic Witnesses of the Revelation of John the Theologian as a Search Tool and a Basis for Studying Versions of the Text

Summary/Abstract

The article examines the text editions of the Revelation of John the Theologian. The main source for this aspect of the study is a peculiarity of the text structure: titles, incipits, and explicits. The features of the three known scholarly editions of the Apocalypse are determined and presented in a table. The method used to identify these features is described.

Subject: Apocalypsis St. John Theologion Text tradition Titles Incipita Filiation of the texts Language and Literature Studies Theoretical Linguistics Applied Linguistics Studies of Literature Philology Translation Studies

Scripta & e-Scripta vol. 18, 2018

Regina Koycheva Иерархическая модель гимнографической терминологии: дигитальное приложение

A Hierarchical Model of the Hymnographic Terminology: Digital Application

The article provides metadata on the descriptions of hymnographic terms presented on the Scripta Bulgarica electronic portal (http://www.scripta-bulgarica.eu). The purpose of the analysis is to take the first step towards building a detailed digital ontology of all hymnographic terminology for the needs of specialized web products. All the hymnographic concepts included in the portal are arranged in a hierarchical system which also comprises some of the most closely related concepts of other kinds (generally liturgical and literary). The selection of terms also involves several of the basic musical and musicological hymnographic concepts, but focuses on the verbal side of the chants. The proposed hierarchical model is based on two types of relationships, category/subcategory and whole/part, the second of which ranks the hymnographic terminology in five levels from the highest (Books) to the lowest (Genre components).

Subject: Hymnography Terminology Classification Hierarchy Category-subcategory Whole-part Scripta Bulgarica Language and Literature Studies Theoretical Linguistics Applied Linguistics Studies of Literature Philology Translation Studies

Scripta & e-Scripta vol. 18, 2018

Marina Krutova Название и самоназвание в номинативных комплексах рукописных книг XIV–XIX веков (на материале Отдела рукописей Российской государственной библиотеки)

Title and Self-title in Nominative Complexes of Manuscripts 14th–19th Centuries (Based on the Material of the Manuscript Department of the Russian State Library)

Summary/Abstract

The article defines the concepts related to the naming of written monuments in the Slavonic-Russian manuscript tradition. It gives definitions of the basic concepts: title, self-title, nominative complex, and nominative unit. It identifies the reasons why titles appear in different parts of the same manuscript book, which are connected with the different functions that the title and the self-title perform: nominative, informative, hermeneutic, didactic, text-forming, and aesthetic. The concepts of title and self-title are compared using manuscript books held in the Manuscript Department of the Russian State Library, in the collection of the Trinity-Sergius Lavra. The self-title proves to be more extensive and informative than the title, while the title, in turn, shows structural and semantic variability. The variability of titles within nominative complexes is characterized.

Subject: Language studies Language and Literature Studies Theoretical Linguistics Applied Linguistics Studies of Literature Philology Palaeoslavistic Titles of manuscripts Definitions History of texts History of books Putting into practice

Scripta & e-Scripta vol. 18, 2018

Radoslava Stankova Проложные жития в средневековой южнославянской книжности

The Prologue Vitae in the Medieval South-Slavonic Literature

The article surveys the prologue vitae (synaxaria), short hagiographic (narrative) texts for saints and feasts that are contained in two calendar liturgical books, the Prologue and the Menaia. They are found in two types of hagiographic synaxaria, known in Slavic literature as the Simple Prologue and the Verse Prologue. The Verse Prologue is distinguished by the verses (of two or three lines) written before the synaxarion vitae. A comparison is made between the prologue vitae from the Simple and the Verse Prologue of three popular Christian saints: George, Demetrius, and Nicholas. To them are added the vitae from the Simple and the Verse Prologue for St Petka (Parasceve of Epibatae), St Ivan Rilski (John of Rila), and Sts Simeon and Sava (of Serbia), emblematic South Slavic saints.

Subject: Language studies Language and Literature Studies Theoretical Linguistics Applied Linguistics Studies of Literature Philology Translation Studies Prologue Vitae Simple Prologue Prologue in Verse Menaion Commemoration of saints Commemoration of South-Slavonic saints Liturgical books

Scripta & e-Scripta vol. 18, 2018

Ivan I. Iliev Textological Notes on De Christo et Antichristo by Hippolytus of Rome in the Greek and Slavonic Manuscript Tradition

Summary/Abstract

The article has two main focuses: first, it surveys the most significant research on the Antichrist myth, and second, it examines the Greek tradition of De Christo et Antichristo by Hippolytus of Rome and the Slavonic versions of the text. The Slavonic witnesses are examined with respect to their omissions, additions, and grammatical and morphological variations, and some of the changes in the Bible quotations are also highlighted. This work does not claim to present new information on the Greek sources but aims to demonstrate how important the Slavonic translation is to the interpretation of the Greek original. The most interesting results concern the Greek text itself, where the proximity between the Greek fragment of Meteora Monastery 573 and the Slavonic tradition is presented.

Subject: Language and Literature Studies Language studies Theoretical Linguistics Applied Linguistics Studies of Literature Philology Translation Studies Antichrist Hippolytus of Rome Textology Greek tradition Manuscripts

Scripta & e-Scripta vol. 18, 2018

Eirini Afentoulidou Daily Life and Religion: The Vienna Euchologia Project

Summary/Abstract

The study of the Euchologia (singular: Euchologion), the prayer books to be used by the clergy, has long been neglected by medieval historians. This is beginning to change, as more and more scholars discover the potential of the Euchologia as a source for social history. Indeed, Euchologia contain besides Eucharistic and sacramental liturgies also prayers for various occasions of the daily life of women, men and children from various strata of society and every geographical region of the Byzantine world. Thus, the Euchologia offer a different perspective than most Byzantine written sources, which concentrate on urban, male, often ordained elites. In 2015 a new research project dedicated to the study of Euchologia was initiated at the Division of Byzantine Research of the Austrian Academy of Sciences. The project seeks to study Greek-language Euchologia copied up to the year 1650 and the prayers they contain.

Subject: Language and Literature Studies Applied Linguistics Studies of Literature Philology Translation Studies Byzantine Euchologion History of the society Daily life Linguistics Tradition of the text

Scripta & e-Scripta vol. 16-17, 2017

Svilena Hristova Jordan Tabov Comparison of Four Medieval Bulgarian Inscriptions by Letter Frequency

Summary/Abstract

This article compares the letter frequencies of four old Bulgarian inscriptions: the Samuel inscription of 993, the Samuel inscription of Voden, the Bitola inscription of Ivan Vladislav and the Tărnovo inscription of Ivan Asen II. We establish the proximity of the letter frequencies of these inscriptions and thus obtain an argument in favor of the view that the Voden inscription is a product of the same scribal tradition and orthography, and of the same epoch, as the Samuel inscription of 993 and the Bitola inscription. The impressive ‘frequency proximity’ of the Voden and Bitola inscriptions is used to offer the hypothesis that the royal administration of the Samuel dynasty had a consistent tradition of writing and orthography.

Subject: History Language studies Literature Studies Cultural history Applied Linguistics Middle Ages South Slavic Languages Philology
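The comparison by letter frequency described in the abstract above can be sketched as follows. The abstract does not say which proximity measure the authors use, so the sum of absolute differences between relative frequencies is an assumption chosen for simplicity, and the function names are illustrative.

```python
from collections import Counter
from typing import Dict


def letter_frequencies(text: str) -> Dict[str, float]:
    """Relative frequency of each letter, ignoring case, spaces and punctuation."""
    letters = [ch.lower() for ch in text if ch.isalpha()]
    if not letters:
        raise ValueError("text contains no letters")
    counts = Counter(letters)
    total = len(letters)
    return {ch: n / total for ch, n in counts.items()}


def frequency_distance(text_a: str, text_b: str) -> float:
    """Sum of absolute frequency differences: 0.0 for identical profiles, 2.0 for disjoint ones."""
    fa = letter_frequencies(text_a)
    fb = letter_frequencies(text_b)
    return sum(abs(fa.get(ch, 0.0) - fb.get(ch, 0.0)) for ch in set(fa) | set(fb))


if __name__ == "__main__":
    # Toy strings only; real input would be the transcribed inscriptions.
    print(frequency_distance("abab", "baba"))
```

Short inscriptions give noisy frequency estimates, so a small distance between two texts is better read as supporting evidence than as proof of a shared scribal tradition.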

Scripta & e-Scripta vol. 16-17, 2017

Tatjana A. Timčenko El’Manuscript–2016. Results

Summary/Abstract

Subject: Language studies Literature Studies Theoretical Linguistics Applied Linguistics Philology

Scripta & e-Scripta vol. 16-17, 2017

Ralf Cleminson Аріевъ ледъ

Ares Ice

Summary/Abstract

The paper discusses the translation of Ἄρειος πάγος into Slavonic, which until the late seventeenth century is almost invariably Аріевъ ледъ. It is suggested that although this does not correctly render the original meaning of the Greek, translators (and others, including their Greek contemporaries) did not necessarily perceive place names as literally meaningful. Аріевъ ледъ was thus simply the established Slavonic name for the Areopagus, and known as such to Slavonic writers. The principal place where it occurs is in the Acts of the Apostles, where it is used consistently, and there is a varied body of commentary in Slavonic attached to this passage, which is discussed in detail. The use of the toponym in a number of non-Biblical texts is also traced.

Subject: History Language studies Literature Studies Cultural history Applied Linguistics Studies of Literature Middle Ages South Slavic Languages Philology Translation Studies

Scripta & e-Scripta vol. 14-15, 2015

Andrej Boyadzhiev Writing Old Cyrillic and Glagolitic in GNU/Linux with the Bulgarian Phonetic Traditional Keyboard Layout

Summary/Abstract

The paper proposes several approaches for extending the possibilities for writing Medieval Slavonic Cyrillic and Glagolitic letters in a GNU/Linux environment. This is achieved by extending an existing keyboard layout, including a newly defined Glagolitic one, and adding more key combinations through the multi key (compose key) technique. The proposal has been tested and works in openSUSE GNU/Linux distribution versions 11.3 through 13.2 and in the rolling release version Tumbleweed, with the KDE4, Plasma 5, and GNOME desktop environments.

Subject: Language and Literature Studies Library and Information Science Electronic information storage and retrieval Applied Linguistics Philology Computational linguistics Other Cataloguing Archiving
