CHURCH SLAVONIC

Digital Edition of Pop Punčov Sbornik: Project Note

  • Summary/Abstract

    The resource described here is an online tool for studying texts and diachronic language variation. The core of its corpus is the Pop Punčov Sbornik, a West Bulgarian manuscript from 1796, released together with shorter samples of Balkan Slavic varieties from the 14th to the 19th century. Besides the data, which offer a unique view of historical dialects, the tool provides a user-friendly interface and a modular structure, making it easy to add new content and features and to train students and lay readers interested in historical literature. The resource also includes extensive documentation covering both the grammar and philological data about the sources.

    Subject: e-Scripta

Deep Mining of the Collection of Old Prints ‘Kirchenslavica digital’

Цифровизиране с извличане на семантични данни на сбирката от старопечатни книги Kirchenslavica digital

  • Summary/Abstract
    The article describes the efforts of the Staatsbibliothek zu Berlin (SBB) to make the content of its collection of about 250 Church Slavonic prints from the 17th to the 19th century accessible using Digital Humanities methods. The focus is on full-text indexing of these heterogeneous prints using HTR+ language models from the Transkribus software. Depending on whether the prints are Moscow, Kiev or Old Believer editions, the models require different approaches and adaptations that take into account the place and period of printing. Prints such as Kirillova kniga (1644), Gistorija Ioanna Damaskina (1637) and many others are processed at large scale, and the character recognition models are continually refined by training on newly verified data. The resulting full texts are stored permanently in XML formats (ALTO, PAGE) in a central repository for subsequent use, and they are also merged with the original digital copies in the IIIF-compatible Digital Library of the SBB. As a further step, the Church Slavonic full texts will be indexed with special Solr analyzers for efficient searching (tokenizing, transliteration, n-grams) and made searchable in subject portals (including the Slavistik-Portal) through a modern web design that presents text and image together.
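
    The three analysis steps named in the abstract (tokenizing, transliteration, n-grams) can be illustrated with a small Python sketch. This is not the SBB's Solr configuration, which the abstract does not give. The function names and the tiny transliteration table are assumptions made up for the example, and the table covers only a handful of letters.

```python
import re

# Illustrative, deliberately incomplete Cyrillic-to-Latin table.
# This is an assumption for the sketch, not the mapping used by the SBB.
TRANSLIT = {
    "а": "a", "в": "v", "г": "g", "и": "i", "к": "k",
    "л": "l", "н": "n", "о": "o", "р": "r",
}


def tokenize(text):
    """Lowercase the text and split it into runs of word characters."""
    return re.findall(r"\w+", text.lower())


def transliterate(token):
    """Map each character through TRANSLIT; unknown characters pass through."""
    return "".join(TRANSLIT.get(ch, ch) for ch in token)


def char_ngrams(token, n=3):
    """Return all overlapping character n-grams of a token.

    Tokens no longer than n are returned whole.
    """
    if len(token) <= n:
        return [token]
    return [token[i:i + n] for i in range(len(token) - n + 1)]


def index_terms(text, n=3):
    """Collect n-grams of each token in both original and transliterated form."""
    terms = set()
    for token in tokenize(text):
        for form in (token, transliterate(token)):
            terms.update(char_ngrams(form, n))
    return terms
```

    Indexing both forms means that a query typed in Latin letters, such as "kniga", shares n-grams with the Cyrillic word in the print. Note that `\w` does not match combining marks, so real Church Slavonic text with titla and other diacritics would need Unicode normalization or mark stripping first.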

Recognizing Handwritten Text in Slavic Manuscripts: a Neural-Network Approach Using Transkribus

Scripta & e-Scripta vol. 19, 2019

    The paper discusses the automatic text recognition capabilities of neural network models specifically trained to recognize different styles of Church Slavonic handwriting within the software platform Transkribus. Computed character error rates of the models are in the range of 3 to 5 percent; real-life performance shows that specifically trained models, by and large, recognize simple (non-superscript) characters correctly most of the time. The error rate is higher with superscript letters, abbreviations, and word separation. Combined models consisting of training data from different sources are capable of transcribing different styles of Slavic handwriting with low error rates. Automatic text recognition using Transkribus and the models presented in this paper can help improve the efficiency of the process of digitizing Church Slavonic manuscripts and thus boost the number of digitized sources available in the future.

    Subject: Language and Literature Studies, Language studies, Studies of Literature, Philology, Theory of Literature, Foreign languages learning, Applied Linguistics, Computational linguistics, Translation Studies

    Keywords: CHURCH SLAVONIC, TRANSKRIBUS, AUTOMATIC TRANSCRIPTION, MACHINE LEARNING, NEURAL NETWORKS, ARTIFICIAL INTELLIGENCE
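
    To make the character error rates quoted in the abstract concrete: a common definition of CER is the edit (Levenshtein) distance between the automatic transcription and a verified reference, divided by the length of the reference. The sketch below follows that definition. Whether Transkribus computes its figure in exactly this way is an assumption, and the function names are made up for the example.

```python
def levenshtein(ref, hyp):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn ref into hyp (row-by-row dynamic programming)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a character of ref
                curr[j - 1] + 1,     # insert a character of hyp
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference, hypothesis):
    """Edit distance divided by the length of the reference transcription."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)
```

    Under this definition, a CER of 3 to 5 percent corresponds to roughly 3 to 5 character edits per 100 characters of reference text.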