Serbian Early Printed Books from Venice: Creating Models for Automatic Text Recognition Using Transkribus

Владимир Р. Поломац. Сръбски старопечатни книги от Венеция: cъздаване на модели за автоматично текстово разпознаване чрез Transkribus

  • Author(s): Vladimir Polomac
  • Subject(s): e-Scripta // Digital humanities
  • Published by: Institute for Literature BAS
  • Print ISSN: 1312-238X
  • Summary/Abstract:

    The paper describes the process of creating a model for the automatic recognition of Serbian Church Slavonic printed books from Venice (from Božidar and Vincenzo Vuković’s printery) using the Transkribus software platform, which is based on artificial intelligence and machine learning. Using the example of the Prayer Book (Euchologion) (1538–1540) from Božidar Vuković’s printery, the paper shows that a successful model for the automatic recognition of an individual book (with around 5% of unrecognized characters) can be trained on as little as approximately 4,000 words of material, and that increasing the amount of training material (in this case to around 38,000 words) improves the model and reduces the error rate (to between 1% and 2% of unrecognized characters). The most notable result of the paper is the creation of a generic model for the automatic text recognition of Serbian Church Slavonic books from Božidar and Vincenzo Vuković’s printery. The initial version of the generic model (called Dionisio 1.0 after Božidar Vuković’s Italian pseudonym, Dionisio della Vecchia) is the first resource for the automatic recognition of the Serbian medieval Cyrillic script that is publicly available to all users of the Transkribus software platform (see https://readcoop.eu/model/dionisio-1-0/).
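
    The error rates quoted in the abstract are percentages of unrecognized characters. The abstract does not say how they were calculated, so the sketch below only illustrates one common convention, the character error rate (CER): the edit distance between a reference transcription and the recognized text, divided by the length of the reference. The function names and the sample strings are hypothetical and are not taken from the paper or from Transkribus.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of character insertions, deletions and substitutions
    needed to turn `hypothesis` into `reference` (row-by-row dynamic programming)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER as a fraction of the reference length (assumed convention)."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Hypothetical sample: one wrong character in a four-character line.
sample_cer = character_error_rate("abcd", "abed")
print(f"CER: {sample_cer:.2%}")
```

    Under this convention, a model with around 5% of unrecognized characters would correspond to a CER of roughly 0.05 on the evaluated pages.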


  • Page Range: 11-29
    No. of Pages: 19
    Language: English
    Year: 2022
    Issue No: Scripta & e-Scripta vol. 22, 2022

  • Vladimir Polomac

    University of Kragujevac
    Description

    Vladimir Polomac is an associate professor at the Department of the Serbian Language at the Faculty of Philology and Arts, University of Kragujevac (Serbia). For the monograph entitled Језик повеља и писама Српске деспотовине [The Language of Charters and Letters of the Serbian Despotate], he received the “Pavle and Milka Ivić” award from the Serbian Slavic Association for the best book in the field of linguistic Slavic studies in Serbia in 2016. His current research interests include historical (corpus) linguistics (historical dialectology and onomastics of the Serbian language) and, in particular, philological and linguistic research on Serbian medieval literacy. He has been a member of the Onomastics Committee of the Serbian Academy of Sciences and Arts since 2015.