Performance of Generic HTR Models on Historical Cyrillic and Glagolitic: Comparison of Engines
-
Summary/Abstract
The present study offers a comparative evaluation of the performance of different AI-based digital tools for handwritten text recognition (HTR) on historical manuscripts and prints. The focus is on generic models capable of transcribing a range of texts in a similar script. The training dataset for these comprises Old Cyrillic ustav and poluustav manuscripts, on the one hand, and early Glagolitic printed books, on the other. We give an overview of the performance statistics for the HTR platforms Transkribus and eScriptorium as well as for the command-line tool Calamari. In each case, we additionally offer a close, qualitative analysis of select examples in order to convey a sense of the models’ real-world performance. In this way, our study supplies comparative data on the respective capabilities of these technologies that ought to be of interest to scholars working with them in digital humanities projects.
Data Demonstration Techniques in Slavonic Historical Text Corpus “Manuscript”
-
Summary/Abstract
The article discusses theoretical and practical issues of creating tools for presenting the medieval Slavonic text corpus on the “Manuscript” website (http://manuscripts.ru/). The historical corpus and its sources have specific features: a limited number of manuscripts, variability of medieval graphics and orthography, and the complex structure and composition of the original documents. These features require special instruments and techniques for data preparation (information about a text and its physical medium, analytical tagging of fragments, variability, and so on) and for visualizing data samples, including texts. The article focuses on ways of solving two opposite tasks: displaying the texts in a form as close as possible to the original and displaying them in a simplified form, and, consequently, on the possibilities of transforming them. The first task is solved by preparing a transcription in a specialized editing tool, which interacts with the full-text database and provides a complete set of the required characters, text formatting, and a layout that fits the original page. To solve the second task, analytical tagging (chapters and verses, authors of texts, structure of the manuscript, main text and marginalia, and so forth) and linguistic tagging (including lemmatization) are performed so that data can be searched and transformed when displayed. The latter allows users to see a text in modern Cyrillic or Latin script, its syllables, the meaning of analytical fragments, links between the main text and its marginalia, and so forth. The ability to search data on the basis of deep tagging is illustrated by the digital edition (LIM, MS 37, 13th c., 291 f.) that has been included in the “Manuscript” historical corpus (http://manuscripts.ru/mns/main?P_TEXT=94065041&p_lang=EN).
-
Summary/Abstract
The paper discusses the automatic text recognition capabilities of neural network models specifically trained to recognize different styles of Church Slavonic handwriting within the software platform Transkribus. Computed character error rates of the models are in the range of 3 to 5 percent; real-life performance shows that specifically trained models, by and large, recognize simple (non-superscript) characters correctly most of the time. The error rate is higher with superscript letters, abbreviations, and word separation. Combined models consisting of training data from different sources are capable of transcribing different styles of Slavic handwriting with low error rates. Automatic text recognition using Transkribus and the models presented in this paper can help improve the efficiency of the process of digitizing Church Slavonic manuscripts and thus boost the number of digitized sources available in the future.
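The character error rate mentioned in the abstract above is conventionally computed as the edit distance between a reference transcription and the model output, divided by the length of the reference. The following is a minimal sketch of that standard definition, not the computation used by Transkribus itself; the function names `levenshtein` and `cer` and the sample strings are illustrative assumptions.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn `ref` into `hyp`."""
    prev = list(range(len(hyp) + 1))  # distances from the empty prefix of ref
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from ref
                curr[j - 1] + 1,     # insert h into ref
                prev[j - 1] + cost,  # substitute r by h (or match)
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(ref, hyp) / len(ref)


# Illustrative strings (not taken from the paper):
# one substituted character in a seven-character reference.
print(cer("господь", "госпадь"))
```

Python counts Unicode code points here, so superscript letters encoded as combining characters each count as one character; whether that matches a given platform's own counting is an assumption to check.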
Создание и использование исторических корпусов славянских письменных памятников
Scripta & e-Scripta vol. 19, 2019
Victor Baranov
Creation and Using of Historical Corpora of Slavonic Manuscripts
The requirements for historical corpora of medieval texts 1) are determined by the properties of the data and by the historical-linguistic, textological and linguo-textological tasks to be solved, and 2) should be realized with the help of special tagging, processing procedures, query parameters and ways of presenting retrieval results. The corpus should a) have metadata concerning both texts and manuscripts, and involve both linguistic and analytical tagging; b) support the rendering of documents (facsimile and transcription), concordances, lists, and comparison of subcorpora data; c) simplify graphic-orthographic variation during data search and visualization; d) provide tools both for processing and searching linguistic material and for its further analysis according to traditional methods; and e) support problem description and resolution by applying corpus methods that engage with the quantity, distribution, co-occurrence, and variation of linguistic units in large data arrays. The realization of these requirements is demonstrated on a subcorpus of three copies of chronicles (Laurentian, Hypatian, Radzivilovsky) from the historical corpus project “Manuscript” (manuscripts.ru).
Subject:
Language and Literature Studies
Language studies
Studies of Literature
Philology
Theory of Literature
Theoretical Linguistics
Applied Linguistics
Keywords:
HISTORICAL SLAVONIC CORPUS
RUSSIAN CHRONICLES
LINGUISTIC STATISTICS
Терминология в палеославистике и создание сети между существующими цифровыми корпусами
Scripta & e-Scripta vol. 18, 2018
Anissava Miltenova
Terminology in Palaeoslavistics and Set up Networking between Existing Digital Corpora
The paper discusses problems and points of view related to setting up networking between the Scripta Bulgarica project (http://www.scripta-bulgarica.eu/bg), the Repertorium of Old Bulgarian literature and letters (http://repertorium.obdurodon.org/), and other corpora (e.g. Codex Suprasliensis from the 10th century: http://suprasliensis.obdurodon.org/, etc.) in order to further improve the linking between databases. The proposed networking will connect transcribed texts with terminology in palaeoslavistics and with other online resources, such as electronic editions of individual sites, electronic dictionaries, encyclopedias, bibliographic arrays and so on. The networking will resolve a number of problems that cannot yet be solved in a satisfactory way. The results will be useful not only for palaeoslavists but also for librarians, teachers, students, representatives of the mass media, and members of the general public interested in Slavic literacy.
Subject:
Ontology of terms
Palaeoslavistic
Computer technologies
Standardization of formats
XML approach
Language studies
Language and Literature Studies
Theoretical Linguistics
Applied Linguistics
Studies of Literature
Computational linguistics
Bulgarian Literature
South Slavic Languages
Philology
-
Summary/Abstract
The article discusses the Synopsis Apostolorum attributed to Dorotheus of Tyre, which purports to be a list of the Seventy Apostles. It gives a brief overview of the history of the text in Greek and Slavonic. In contrast to the Greek tradition, where it may be found in miscellanies of various types and also in manuscripts of the Apostolos which are provided with relatively extensive apparatus, in Slavonic it is found exclusively in Apostolos manuscripts. The redaction of the Synopsis, moreover, corresponds to the redaction of the Apostolos; there are discernible differences between the texts in each of the three Slavonic redactions in which it is represented. This indicates that it was translated as part of the accompanying text each time that the Apostolos itself was translated. This means that the Slavonic version (unlike the Greek) exists exclusively as paratext, but that this paratextual status, being dependent on the version, is not intrinsic to the work but a function of its history. This in turn points to the necessity of taking the paratext into account in any study of the text of the Bible.
Translating the Subtleties. The Philosophical Categories in the Symeon Collection (Symeon’s Miscellany)
Scripta & e-Scripta vol. 18, 2018
Diana Atanassova-Pencheva
Dimka Gicheva-Gocheva
When we study translations from classical or Byzantine Greek into Old Bulgarian, we usually encounter two aspects of the question how. The first is the how of the linguistic rendering, the how of the translation techniques used for one term or another. The second is that of evaluating the how: shall we praise the translator’s work or, on the contrary, express regret about it? Besides these two inherent aspects of the question how, a third one has arisen in the last three decades in Bulgaria. We have a long tradition of translating Old Greek and Byzantine texts into Old Bulgarian, but with respect to the philosophical and theological terminology used nowadays, are we obliged to follow the patterns of the past and the forms of the language suggested by this millennium-long tradition? With respect to the formation of the Bulgarian philosophical and theological language, Symeon’s Miscellany is an extremely important source because, from f. 222 to f. 237, a range of issues is discussed as answers to questions 29 and 30. This section of the work clarifies terms, categories and concepts from classical Greek and/or Christian philosophy and demands profound interdisciplinary research.
Subject:
Byzantine Greek
Tsar Simeon’s florilegia
Terminology
Philosophical and theological language
Language and Literature Studies
Theoretical Linguistics
Applied Linguistics
Studies of Literature
Philology
Translation Studies
Indexes of Names and Incipita of Sections and Chapters of the Slavonic Witnesses of the Revelation of John the Theologian as a Search Tool and a Basis for Studying Versions of the Text
-
Summary/Abstract
The article is devoted to the study of the editions of the text of the Revelation of John the Theologian. The main source for this aspect of the study of the text is the peculiarity of its structure: names, incipits, and explicits. The features of the three known scholarly editions of the Apocalypse are determined and presented in a table. The method used to identify these features is described.
A Hierarchical Model of the Hymnographic Terminology: Digital Application
-
Summary/Abstract
The article provides metadata on the descriptions of hymnographic terms published on the Scripta Bulgarica electronic portal (http://www.scripta-bulgarica.eu). The purpose of the analysis is to take the first step towards building a detailed digital ontology of all hymnographic terminology for the needs of specialized web products. All the hymnographic concepts included in the portal are arranged in a hierarchical system, which also comprises some of the most closely related concepts of other kinds (generally liturgical and literary). The selection of terms also includes several of the basic musical and musicological hymnographic concepts, but focuses on the verbal side of the chants. The proposed hierarchical model is based on two types of relationships: category/subcategory and whole/part. The second of these ranks the hymnographic terminology in five levels, from the highest (Books) to the lowest (Genre components).
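A model with the two relationship types named in the abstract above (category/subcategory and whole/part, the latter ranked in five levels) could be represented roughly as follows. This is a hypothetical sketch, not the portal's actual data model: the class `Term`, its fields, the category label, and the level numbers assigned to the sample terms are all assumptions; only the top level (Books) and the bottom level (Genre components) are named in the abstract.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Term:
    name: str
    level: int                        # 1 = Books ... 5 = Genre components
    part_of: Optional["Term"] = None  # whole/part relationship
    category: Optional[str] = None    # category/subcategory label (hypothetical)

    def __post_init__(self) -> None:
        if not 1 <= self.level <= 5:
            raise ValueError("level must be between 1 and 5")
        # A part must rank strictly below the whole that contains it.
        if self.part_of is not None and self.part_of.level >= self.level:
            raise ValueError("a part must sit on a lower level than its whole")

    def wholes(self) -> List[str]:
        """Names of all containing terms, nearest first."""
        chain = []
        node = self.part_of
        while node is not None:
            chain.append(node.name)
            node = node.part_of
        return chain


# Illustrative terms; the level numbers 3 and 5 are assumed placements.
menaion = Term("Menaion", level=1, category="liturgical book")
canon = Term("Canon", level=3, part_of=menaion)
ode = Term("Ode", level=5, part_of=canon)
print(ode.wholes())
```

Keeping the two relationships in separate fields lets a term belong to a category independently of where it sits in the whole/part ranking.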
Название и самоназвание в номинативных комплексах рукописных книг XIV–XIX веков (на материале Отдела рукописей Российской государственной библиотеки)
Scripta & e-Scripta vol. 18, 2018
Marina Krutova
Title and Self-title in Nominative Complexes of Manuscripts 14th–19th Centuries (Based on the Material of the Manuscript Department of the Russian State Library)
The article defines the concepts related to the naming of written monuments in the Slavonic-Russian manuscript tradition. Definitions are given for the basic concepts: title, self-title, nominative complex, and nominative unit. The article identifies the reasons why titles appear in different parts of the same manuscript book; these are associated with the different functions that the title and the self-title perform: nominative, informative, hermeneutic, didactic, text-forming, and aesthetic. The concepts of title and self-title are compared using handwritten books held in the Manuscript Department of the Russian State Library, in the collection of the Trinity-Sergius Lavra. It is shown that the self-title is more extensive and informative than the title, whereas the title, in turn, is distinguished by structural and semantic variability. The variability of titles within nominative complexes is characterized.
Subject:
Language studies
Language and Literature Studies
Theoretical Linguistics
Applied Linguistics
Studies of Literature
Philology
Palaeoslavistic
Titles of manuscripts
Definitions
History of texts
History of books
Putting into practice
Textological Notes on De Christo et Antichristo by Hippolytus of Rome in the Greek and Slavonic Manuscript Tradition
Scripta & e-Scripta vol. 18, 2018
Ivan I. Iliev
The article has two main focuses: first, it surveys the most significant research on the Antichrist myth; second, it examines the Greek tradition of De Christo et Antichristo by Hippolytus of Rome and the Slavonic versions of the text. The Slavonic witnesses are examined with respect to their omissions, additions, and grammatical and morphological variations, and some of the changes in the Bible quotations are highlighted. This work does not claim to present new information on the Greek sources but aims to demonstrate how important the Slavonic translation is for the interpretation of the Greek original. The most interesting results concern the Greek text itself, where the proximity between the Greek fragment of Meteora Monastery 573 and the Slavonic tradition is shown.
Subject:
Language and Literature Studies
Language studies
Theoretical Linguistics
Applied Linguistics
Studies of Literature
Philology
Translation Studies
Antichrist
Hippolytus of Rome
Textology
Greek tradition
Manuscripts
-
Summary/Abstract
The study of the Euchologia (singular: Euchologion), the prayer books used by the clergy, has long been neglected by medieval historians. This is beginning to change, as more and more scholars discover the potential of the Euchologia as a source for social history. Indeed, in addition to Eucharistic and sacramental liturgies, Euchologia also contain prayers for various occasions in the daily life of women, men and children from various strata of society and every geographical region of the Byzantine world. Thus, the Euchologia offer a different perspective from most Byzantine written sources, which concentrate on urban, male, often ordained elites. In 2015 a new research project dedicated to the study of Euchologia was initiated at the Division of Byzantine Research of the Austrian Academy of Sciences. The project seeks to study Greek-language Euchologia copied up to the year 1650 and the prayers they contain.
Comparison of Four Medieval Bulgarian Inscriptions by Letter Frequency
Scripta & e-Scripta vol. 16-17, 2017
Svilena Hristova
Jordan Tabov
This article compares the letter frequencies of four old Bulgarian inscriptions: the Samuel inscription of 993, the Samuel inscription of Voden, the Bitola inscription of Ivan Vladislav and the Tărnovo inscription of Ivan Asen II. We establish the proximity of the letter frequencies of these inscriptions and thus obtain an argument in favor of the view that the Voden inscription is a product of the same scribal tradition and orthography, and of the same epoch, to which both the Samuel inscription of 993 and the Bitola inscription belong. The striking ‘frequency proximity’ of the Voden and Bitola inscriptions is used to offer the hypothesis that the royal administration of the Samuel Dynasty had a consistent tradition of writing and orthography.
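A comparison of inscriptions by letter frequency, as described in the abstract above, can be sketched as follows. The abstract does not say which proximity measure the authors use, so the Euclidean distance below is an assumption chosen for simplicity, and the function names `letter_frequencies` and `frequency_distance` are illustrative.

```python
import math
from collections import Counter
from typing import Dict


def letter_frequencies(text: str) -> Dict[str, float]:
    """Relative frequency of each letter, ignoring case, spaces,
    punctuation, digits, and combining marks."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    if not letters:
        raise ValueError("text contains no letters")
    counts = Counter(letters)
    total = len(letters)
    return {ch: n / total for ch, n in counts.items()}


def frequency_distance(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Euclidean distance between two frequency profiles (assumed measure);
    a letter missing from one profile counts as frequency 0 there."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


# Usage with placeholder strings standing in for transcribed inscriptions:
first = letter_frequencies("азъ самоилъ рабъ божии")
second = letter_frequencies("азъ иоанъ асѣнь")
print(frequency_distance(first, second))
```

A smaller distance means closer letter-frequency profiles; with texts as short as inscriptions, any such figure should be read with caution.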
Subject:
History
Language studies
Literature Studies
Cultural history
Applied Linguistics
Middle Ages
South Slavic Languages
Philology
Аріевъ ледъ
Scripta & e-Scripta vol. 16-17, 2017
Ralf Cleminson
Ares Ice
The paper discusses the translation of Ἄρειος πάγος into Slavonic, which until the late seventeenth century is almost invariably Аріевъ ледъ. It is suggested that although this does not correctly render the original meaning of the Greek, translators (and others, including their Greek contemporaries) did not necessarily perceive place names as literally meaningful. Аріевъ ледъ was thus simply the established Slavonic name for the Areopagus, and known as such to Slavonic writers. The principal place where it occurs is in the Acts of the Apostles, where it is used consistently, and there is a varied body of commentary in Slavonic attached to this passage, which is discussed in detail. The use of the toponym in a number of non-Biblical texts is also traced.
Subject:
History
Language studies
Literature Studies
Cultural history
Applied Linguistics
Studies of Literature
Middle Ages
South Slavic Languages
Philology
Translation Studies