MACHINE LEARNING

Performance of Generic HTR Models on Historical Cyrillic and Glagolitic: Comparison of Engines
Scripta & e-Scripta vol. 23, 2023. Posted Sun, 12/03/2023 - 16:14

The present study offers a comparative evaluation of the performance of different AI-based digital tools for handwritten text recognition (HTR) on historical manuscripts and prints. The focus is on generic models capable of transcribing a range of texts in a similar script. The training dataset for these comprises Old Cyrillic ustav and poluustav manuscripts, on the one hand, and early Glagolitic printed books, on the other. We give an overview of the performance statistics for the HTR platforms Transkribus and eScriptorium as well as for the command-line tool Calamari. In each case, we additionally offer a close, qualitative analysis of select examples in order to convey a sense of the models’ real-world performance. In this way, our study supplies comparative data on the respective capabilities of these technologies that ought to be of interest to scholars working with them in digital humanities projects.

Subject: Language studies; Language and Literature Studies; Theoretical Linguistics; Applied Linguistics; Historical Linguistics; Computational linguistics; South Slavic Languages; Philology; Translation Studies
Keywords: handwritten text recognition; TRANSKRIBUS; MACHINE LEARNING; Cyrillic palaeography; Glagolitic printings

Serbian Early Printed Books from Venice: Creating Models for Automatic Text Recognition Using Transkribus
Scripta & e-Scripta vol. 22, 2022. Posted Wed, 08/17/2022 - 08:39
Vladimir R. Polomac

The paper describes the creation of a model for the automatic recognition of Serbian Church Slavonic printed books from Venice (from Božidar and Vincenzo Vuković’s printery) using the Transkribus software platform, which is based on the principles of artificial intelligence and machine learning. Using the example of the Prayer Book (Euchologion) (1538–1540) from Božidar Vuković’s printery, it shows that a successful model for the automatic recognition of individual books (with around 5% of unrecognized characters) can be trained on as little as approximately 4,000 words of material, and that increasing the amount of training material (in this case to around 38,000 words) improves the model and reduces the error rate to between 1% and 2% of unrecognized characters. The most notable result of the paper is a generic model for the automatic text recognition of Serbian Church Slavonic books from Božidar and Vincenzo Vuković’s printery. The initial version of the generic model (named Dionisio 1.0 after Božidar Vuković’s Italian pseudonym, Dionisio della Vecchia) is the first resource for the automatic recognition of the Serbian medieval Cyrillic script that is publicly available to all users of the Transkribus software platform (see https://readcoop.eu/model/dionisio-1-0/).

Subject: e-Scripta; Digital humanities
Keywords: TRANSKRIBUS; Automatic Text Recognition; Serbian Early Printed Books; Artificial Intelligence; MACHINE LEARNING; Venice
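
The error rates quoted in the abstract above (around 5%, later 1–2% of unrecognized characters) are the kind of figure usually reported as a character error rate (CER): the edit distance between a reference transcription and the model output, divided by the length of the reference. The sketch below is only an illustration of that metric, on the assumption that the abstract's percentages are CER values; the function names and the NFC normalization step are our own choices, not taken from the paper or from Transkribus.

```python
import unicodedata


def levenshtein(ref: str, hyp: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference character
                            curr[j - 1] + 1,      # insert a hypothesis character
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    # Assumption: both strings are compared as NFC-normalized code points.
    ref = unicodedata.normalize("NFC", reference)
    hyp = unicodedata.normalize("NFC", hypothesis)
    if not ref:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Hypothetical one-line example, not taken from the books discussed here.
    print(f"{cer('господи помилуй', 'господи помилуи'):.1%}")
```

Counting code points rather than grapheme clusters is a simplification; for scripts with many combining diacritics and superscript letters, the unit of comparison noticeably affects the reported rate.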

Recognizing Handwritten Text in Slavic Manuscripts: a Neural-Network Approach Using Transkribus

