handwritten text recognition

Ефективност на генерични модели HTR за историческа кирилица и глаголица: Сравнение на средства Scripta & e-Scripta vol. 23, 2023 floyd Sun, 12/03/2023 - 16:14
Performance of Generic HTR Models on Historical Cyrillic and Glagolitic: Comparison of Engines

The present study offers a comparative evaluation of the performance of different AI-based digital tools for handwritten text recognition (HTR) on historical manuscripts and prints. The focus is on generic models capable of transcribing a range of texts in a similar script. The training dataset for these comprises Old Cyrillic ustav and poluustav manuscripts, on the one hand, and early Glagolitic printed books, on the other. We give an overview of the performance statistics for the HTR platforms Transkribus and eScriptorium as well as for the command-line tool Calamari. In each case, we additionally offer a close, qualitative analysis of select examples in order to convey a sense of the models’ real-world performance. In this way, our study supplies comparative data on the respective capabilities of these technologies that ought to be of interest to scholars working with them in digital humanities projects.

Subject: Language studies Language and Literature Studies Theoretical Linguistics Applied Linguistics Historical Linguistics Computational linguistics South Slavic Languages Philology Translation Studies Keywords: handwritten text recognition TRANSKRIBUS MACHINE LEARNING Cyrillic palaeography Glagolitic printings

From Annotation to Modeling: Computational Horizons for Medieval Slavic Studies.

От анотиране към моделиране: компютърни хоризонти за славистичната медиевистика

  • Summary/Abstract

    This paper is a write-up of a keynote from El’Manuscript 2021, reflecting on the ways in which the field of computationally-supported medieval Slavic studies has and has not changed since the mid-2000’s. Looking towards developments in the broader fields of digital humanities and natural-language processing, it explores the ways that recent improvements in the tools at our disposal for mass digitization of manuscripts and text analysis at scale open up possibilities for working with manuscripts that have received very little attention. For these advancements to be feasible, however, scholars will need to prepare and share their digitized texts and annotations in ways that are not currently the norm, though a number of projects provide exemplary models of how these new conventions could be put into practice.


Subscribe to handwritten text recognition