MACHINE LEARNING

Performance of Generic HTR Models on Historical Cyrillic and Glagolitic: Comparison of Engines
Scripta & e-Scripta vol. 23, 2023

The present study offers a comparative evaluation of the performance of different AI-based digital tools for handwritten text recognition (HTR) on historical manuscripts and prints. The focus is on generic models capable of transcribing a range of texts in a similar script. The training dataset for these comprises Old Cyrillic ustav and poluustav manuscripts, on the one hand, and early Glagolitic printed books, on the other. We give an overview of the performance statistics for the HTR platforms Transkribus and eScriptorium as well as for the command-line tool Calamari. In each case, we additionally offer a close, qualitative analysis of select examples in order to convey a sense of the models’ real-world performance. In this way, our study supplies comparative data on the respective capabilities of these technologies that ought to be of interest to scholars working with them in digital humanities projects.

Subject: Language studies; Language and Literature Studies; Theoretical Linguistics; Applied Linguistics; Historical Linguistics; Computational linguistics; South Slavic Languages; Philology; Translation Studies
Keywords: handwritten text recognition; TRANSKRIBUS; MACHINE LEARNING; Cyrillic palaeography; Glagolitic printings

Serbian Early Printed Books from Venice: Creating Models for Automatic Text Recognition Using Transkribus

Vladimir R. Polomac

  • Summary/Abstract

    The paper describes the process of creating a model for the automatic recognition of Serbian Church Slavonic printed books from Venice (from Božidar and Vincenzo Vuković’s printery) using the Transkribus software platform, which is based on the principles of artificial intelligence and machine learning. Using the example of the Prayer Book (Euchologion) (1538–1540) from Božidar Vuković’s printery, it is shown that a successful model for the automatic recognition of individual books (with around 5% of unrecognized characters) can be trained on as little as approximately 4000 words of material, and that increasing the amount of training material (in our case to around 38000 words) improves the model and reduces the error rate (to between 1 and 2% of unrecognized characters). The most notable result of the paper is the creation of a generic model for the automatic text recognition of Serbian Church Slavonic books from Božidar and Vincenzo Vuković’s printery. The initial version of the generic model (called Dionisio 1.0 after Božidar Vuković’s Italian pseudonym, Dionisio della Vecchia) is the first resource for the automatic recognition of the Serbian medieval Cyrillic script that is publicly available to all users of the Transkribus software platform (see https://readcoop.eu/model/dionisio-1-0/).


Recognizing Handwritten Text in Slavic Manuscripts: a Neural-Network Approach Using Transkribus
Scripta & e-Scripta vol. 19, 2019

The paper discusses the automatic text recognition capabilities of neural network models specifically trained to recognize different styles of Church Slavonic handwriting within the software platform Transkribus. Computed character error rates of the models are in the range of 3 to 5 percent; real-life performance shows that specifically trained models, by and large, recognize simple (non-superscript) characters correctly most of the time. The error rate is higher with superscript letters, abbreviations, and word separation. Combined models consisting of training data from different sources are capable of transcribing different styles of Slavic handwriting with low error rates. Automatic text recognition using Transkribus and the models presented in this paper can help improve the efficiency of the process of digitizing Church Slavonic manuscripts and thus boost the number of digitized sources available in the future.

Subject: Language and Literature Studies; Language studies; Studies of Literature; Philology; Theory of Literature; Foreign languages learning; Applied Linguistics; Computational linguistics; Translation Studies
Keywords: CHURCH SLAVONIC; TRANSKRIBUS; AUTOMATIC TRANSCRIPTION; MACHINE LEARNING; NEURAL NETWORKS; ARTIFICIAL INTELLIGENCE
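
The character error rates quoted in the abstract above are conventionally computed as the edit distance between the model output and a reference transcription, divided by the length of the reference. The following is a minimal sketch of that conventional definition in Python; it is an assumption that the papers listed here use this formulation, and the function names `levenshtein` and `cer` are illustrative only, not part of Transkribus or any other tool mentioned on this page.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # deleting the first i characters of a
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length.

    Assumption: characters are counted as Unicode code points, so a
    superscript letter encoded as a combining mark counts separately.
    """
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)
```

Under this definition, a character error rate of 3 to 5 percent corresponds to roughly 3 to 5 character edits per 100 characters of reference transcription.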