Recognizing Handwritten Text in Slavic Manuscripts: a Neural-Network Approach Using Transkribus

Author(s): Achim Rabus
Subject(s): Language and Literature Studies // Language studies // Studies of Literature // Philology // Theory of Literature // Foreign languages learning // Applied Linguistics // Computational linguistics // Translation Studies
Published by: Institute for Literature BAS
Print ISSN: 1312-238X
Summary/Abstract:
The paper discusses the automatic text recognition capabilities of neural network models specifically trained to recognize different styles of Church Slavonic handwriting within the software platform Transkribus. Computed character error rates of the models are in the range of 3 to 5 percent; real-life performance shows that specifically trained models, by and large, recognize simple (non-superscript) characters correctly most of the time. The error rate is higher with superscript letters, abbreviations, and word separation. Combined models consisting of training data from different sources are capable of transcribing different styles of Slavic handwriting with low error rates. Automatic text recognition using Transkribus and the models presented in this paper can help improve the efficiency of the process of digitizing Church Slavonic manuscripts and thus boost the number of digitized sources available in the future.

Journal: Scripta & e-Scripta vol. 19, 2019

Page Range: 9-32
No. of Pages: 24
Language: English

Year: 2019
Issue No: Scripta & e-Scripta vol. 19, 2019

Submitted on: 8 October 2019
LINK CEEOL: https://www.ceeol.com/search/article-detail?id=793865
Achim Rabus

Germany

achim.rabus@slavistik.uni-freiburg.de

Department of Slavic Linguistics, University of Freiburg, Germany

Description

Prof. Dr. Achim Rabus is the current Head of the Department of Slavonic Studies at the University of Freiburg, Germany. Rabus defended his PhD thesis on the language of East Slavic spiritual songs in 2008 and his Habilitationsschrift on Slavic language contact in 2014. Since 2009, Rabus has been a member of the Special Commission on the Computer-Supported Processing of Mediæval Slavonic Manuscripts and Early Printed Books to the International Committee of Slavists, and since 2018 he has served as the Commission's President. His current research focuses on Slavic social dialectology, Handwritten Text Recognition, and corpus and (digital) historical linguistics.
KEYWORDS: CHURCH SLAVONIC // TRANSKRIBUS // AUTOMATIC TRANSCRIPTION // MACHINE LEARNING // NEURAL NETWORKS // ARTIFICIAL INTELLIGENCE