Deep Mining of the Collection of Old Prints ‘Kirchenslavica digital’
(Bulgarian title, translated: Digitization with Semantic Data Extraction of the Collection of Early Printed Books Kirchenslavica digital)
- Author(s): Vladimir Neumann
- Subject(s): Digital humanities
Published by: Institute for Literature BAS
- Print ISSN: 1312-238X
- Summary/Abstract:
The article discusses the efforts of the Staatsbibliothek zu Berlin (SBB) to make the content of its collection of about 250 Church Slavonic prints from the 17th to the 19th century accessible with modern information-technology methods from the Digital Humanities. The focus is on full-text indexing of these heterogeneous prints with HTR+ text-recognition models trained in the Transkribus platform. Depending on whether a print is of Moscow, Kiev or Old Believer origin, the models require different approaches and corresponding adaptations that take the place and period of printing into account. Prints such as the Kirillova kniga (1644), the Gistorija Ioanna Damaskina (1637) and many others are being processed at scale, and the character-recognition models are continually refined by training on newly verified data. The resulting full texts are stored permanently in XML formats (ALTO, PAGE): on the one hand in a central repository for subsequent reuse, and on the other merged with the original digital copies in the IIIF-compatible Digital Library of the SBB. In a further step, the Church Slavonic full texts will be indexed with dedicated Solr analyzers (tokenising, transliteration, n-grams) for efficient searching and made searchable in subject portals, including the Slavistik-Portal, through a modern web interface that combines full text and page images.
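The analyzer chain named in the abstract (tokenising, transliteration, n-grams) can be illustrated with a minimal Python sketch. It is an analogue written under stated assumptions, not the SBB's actual Solr configuration: the transliteration table is a simplified hypothetical scheme (not ISO 9), stripping all combining marks (accents, titlo, pokrytie) is an assumed normalization choice, and the trigram size is arbitrary. The function names `normalize`, `tokenize`, `transliterate`, `char_ngrams`, `analyze` and `ngram_terms` are likewise hypothetical.

```python
import re
import unicodedata

# Hypothetical, simplified Cyrillic/Church Slavonic -> Latin table.
# Illustration only; not the SBB's transliteration scheme.
# Letters such as й, ї, ё need no entry: normalize() strips their diacritics first.
TRANSLIT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "є": "e",
    "ж": "ž", "з": "z", "ѕ": "dz", "и": "i", "і": "i", "к": "k", "л": "l",
    "м": "m", "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t",
    "у": "u", "\u0479": "u", "\ua64b": "u", "ф": "f", "х": "ch", "ѡ": "o",
    "\u047b": "o", "ѿ": "ot", "ц": "c", "ч": "č", "ш": "š", "щ": "šč",
    "ъ": "", "ы": "y", "ь": "", "ѣ": "ě", "ю": "ju", "я": "ja",
    "\ua657": "ja", "ѧ": "ja", "ѯ": "ks", "ѱ": "ps", "ѳ": "f", "ѵ": "i",
}


def normalize(text: str) -> str:
    """Lower-case and drop combining marks (accents, titlo, pokrytie)."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    kept = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", kept)


def tokenize(text: str) -> list[str]:
    """Split into word tokens (a stand-in for a Solr tokenizer)."""
    return re.findall(r"\w+", text)


def transliterate(token: str) -> str:
    """Map each Cyrillic letter to Latin; unknown characters pass through."""
    return "".join(TRANSLIT.get(ch, ch) for ch in token)


def char_ngrams(term: str, min_n: int, max_n: int) -> list[str]:
    """All character n-grams of length min_n..max_n, shortest first."""
    return [
        term[i:i + n]
        for n in range(min_n, max_n + 1)
        for i in range(len(term) - n + 1)
    ]


def analyze(text: str) -> list[str]:
    """Normalize, tokenize and transliterate; drop tokens that map to ''."""
    terms = (transliterate(tok) for tok in tokenize(normalize(text)))
    return [t for t in terms if t]


def ngram_terms(text: str, n: int = 3) -> list[str]:
    """Fixed-length n-grams of every analyzed term (for partial matching)."""
    return [g for term in analyze(text) for g in char_ngrams(term, n, n)]


if __name__ == "__main__":
    print(analyze("Кириллова книга"))            # ['kirillova', 'kniga']
    print(analyze("ѡтецъ") == analyze("отецъ"))  # True
    print(ngram_terms("кни\u0301га"))            # ['kni', 'nig', 'iga']
```

Folding variant spellings (ѡ/о, accented and unaccented forms) onto one Latin form is what lets a Latin-script query match Cyrillic full text; an n-gram field additionally tolerates partial matches at the cost of a larger index. Abbreviations under titlo are not expanded in this sketch, so a contracted form such as бг҃ъ yields "bg" and still will not match богъ.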
Journal: Scripta & e-Scripta vol. 21, 2021
Page Range: 207-216
No. of Pages: 10
Language: English
Vladimir Neumann, Staatsbibliothek zu Berlin, Germany
Dr. Vladimir Neumann is the subject specialist for Slavic studies, with a focus on Belarus and Poland, and has worked at the Staatsbibliothek zu Berlin since 2005. There he has for several years developed and maintained the Slavistik-Portal, one of the first places to look for Slavic bibliographic information in the German-speaking countries. His research centres on 17th-century Church Slavonic in its language-contact forms in the East and West Slavic regions, especially along the Polish-Ruthenian-Russian-Church Slavonic line. He is dedicated to the care of, and research on, the Church Slavonic holdings of the Staatsbibliothek zu Berlin.
Keywords: Church Slavonic // old prints // Transkribus; automatic transcription // model training // data processing and retrieval