Използване на приложения за разпознаване на ръкописни текстове (HTR) при транскрибиране на многоезична историческа лексика

Using Handwritten Text Recognition (HTR) Tools to Transcribe Historical Multilingual Lexica

  • Summary/Abstract
    The paper discusses results from an ongoing project at the Slavic Institute of Heidelberg University to produce automatic transcriptions of an early-18th-century trilingual printed dictionary (Fedor Polikarpov’s Leksikon trejazyčnyj) and, on a preliminary basis, of a 17th-century trilingual manuscript (Epifanij Slavineckii’s working copy of his Greek–Slavic–Latin dictionary) using the handwritten text recognition (HTR) platforms Transkribus and eScriptorium. It is argued that such tools offer considerable advantages by simplifying and accelerating work on multilingual edition projects. The paper also compares our experience of working with Transkribus and eScriptorium and gives an overview of the practical benefits and challenges of each platform.
    Subject(s): digital humanities //

Използване на инструменти за разпознаване на ръкописни текстове (HTR) върху двуезични евенкско-руски ръкописи от колекцията на Константин Ричков

Using Handwritten Text Recognition on bilingual Evenki-Russian manuscripts of Konstantin Rychkov

  • Summary/Abstract

    We report on applying Handwritten Text Recognition (HTR) to manuscripts from the archive of Konstantin Rychkov preserved at IOM RAS, St. Petersburg, within the INEL project. Folklore texts in Evenki (Tungusic) were collected in Western Siberia in the 1910s. We used services provided by the Transkribus platform. The necessary step of Layout Analysis proved to be time-consuming because the parallel Evenki-Russian text is arranged on the page without a strict separation line. HTR models were trained successively on increasing amounts of data, up to 521 pages. The best Character Error Rate (CER) attained on validation data for the largest dataset is 4.50% for models trained on all characters. The distribution of errors is non-uniform: most errors stem from a few problem cases, especially diacritics such as the accent marking stress, which is written high above the line and is frequently cut off from the line images at the preprocessing stage. After the stress mark was excluded from training data and recognition, the lowest CER dropped to 2.90%. We compared two recognition engines, HTR+ and PyLaia. The HTR+ model trained without stress marks made fewer errors in letters, while PyLaia performed better with respect to diacritics.

    Subject(s): Manuscript // digital humanities //
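
    The Character Error Rate reported in the abstract above is conventionally the edit distance between the reference transcription and the recognized text, divided by the length of the reference. The following is a minimal sketch of that calculation, not the Transkribus implementation. It assumes the stress mark is encoded as the combining acute accent (U+0301); the names `edit_distance`, `strip_stress`, and `cer` and the toy strings are hypothetical and are not taken from the paper.

```python
import unicodedata

# Assumption: the stress mark is the combining acute accent (U+0301).
STRESS_MARK = "\u0301"


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions cost 1 each."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def strip_stress(text: str) -> str:
    """Decompose the text (NFD) and drop every stress mark."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if ch != STRESS_MARK)


def cer(ref: str, hyp: str, ignore_stress: bool = False) -> float:
    """Character Error Rate of hyp against ref, optionally ignoring stress marks."""
    if ignore_stress:
        ref, hyp = strip_stress(ref), strip_stress(hyp)
    else:
        # Normalize both sides so base letters and marks are counted alike.
        ref = unicodedata.normalize("NFD", ref)
        hyp = unicodedata.normalize("NFD", hyp)
    if not ref:
        raise ValueError("reference transcription is empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    reference = "бира\u0301"  # toy string with a stress mark on the last vowel
    recognized = "бира"       # recognizer output that lost the stress mark
    print(cer(reference, recognized))                      # 0.2
    print(cer(reference, recognized, ignore_stress=True))  # 0.0
```

    In this toy case the only error is the missing stress mark, so excluding the mark from evaluation brings the error rate to zero. This mirrors, in miniature, why the reported CER drops once the stress mark is left out.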

Recognizing Handwritten Text in Slavic Manuscripts: a Neural-Network Approach Using Transkribus

Scripta & e-Scripta vol. 19, 2019

  • Summary/Abstract
    The paper discusses the automatic text recognition capabilities of neural network models specifically trained to recognize different styles of Church Slavonic handwriting within the software platform Transkribus. Computed character error rates of the models are in the range of 3 to 5 percent; real-life performance shows that specifically trained models, by and large, recognize simple (non-superscript) characters correctly most of the time. The error rate is higher with superscript letters, abbreviations, and word separation. Combined models consisting of training data from different sources are capable of transcribing different styles of Slavic handwriting with low error rates. Automatic text recognition using Transkribus and the models presented in this paper can help improve the efficiency of the process of digitizing Church Slavonic manuscripts and thus boost the number of digitized sources available in the future.
    Subject(s): Language and Literature Studies // Language studies // Studies of Literature // Philology // Theory of Literature // Foreign languages learning // Applied Linguistics // Computational linguistics // Translation Studies // CHURCH SLAVONIC // TRANSKRIBUS // AUTOMATIC TRANSCRIPTION // MACHINE LEARNING // NEURAL NETWORKS // ARTIFICIAL INTELLIGENCE //