Administrative documents of the Don Cossack Host in the 18th – 19th centuries: the issue of the creation of a linguistic corpus
- Author(s): Oksana Gorban, Marina Kosova, Elena Sheptukhina, Andrey Svetlov, Anatoly Komendantov, Alexander Matveev, Daniil Filimonov
- Subject(s): Digital humanities
- Published by: Institute for Literature BAS
- Print ISSN: 1312-238X
- Summary/Abstract:
The article presents the basic principles of designing a diachronic linguistic corpus of documents from the offices of the Don Cossack Host held in the State Archive of the Volgograd Region, Russia. These principles cover collecting documents for the text corpus, setting up the technical base for automatic processing and text editing, planning automated tagging and morphological annotation, and developing the corpus software tools. The authors explain some technical aspects of corpus processing and of the composition of the text corpus. They consider it reasonable to include every document in the corpus, drafts with crossed-out fragments among them, as this ensures accurate registration of the grammar and vocabulary of the language of a given historical period. A set of language marker types is developed for automated meta-tagging. The corpus software tools are designed to annotate obsolete fonts accurately, so that they can be processed together with regular language units and expressions in morphological and genre meta-tagging; in cases of partial text adaptation, the authentic old graphic symbols may have to be preserved.
- Journal: Scripta & e-Scripta, vol. 21, 2021
- Page Range: 139-150
- No. of Pages: 12
- Language: English
Oksana Gorban (Russia): Doctor of Sciences (Philology), Professor of the Department of Russian Philology and Journalism, Volgograd State University; history of the Russian language.
Marina Kosova (Russia): Doctor of Sciences (Philology), Professor of the Department of Russian Philology and Journalism, Volgograd State University; Russian language, documentation studies.
Elena Sheptukhina (Russia): Doctor of Sciences (Philology), Professor of the Department of Russian Philology and Journalism, Volgograd State University; history of the Russian language.
Andrey Svetlov (Russia): Candidate of Physical and Mathematical Sciences, Associate Professor, Department of Mathematical Analysis and Function Theory, Volgograd State University; mathematical modelling, data mining.
Anatoly Komendantov (Russia): Student, Institute of Mathematics and IT, Volgograd State University; software development.
Alexander Matveev (Russia): Student, Institute of Mathematics and IT, Volgograd State University; software development.
Daniil Filimonov (Russia): Student, Institute of Mathematics and IT, Volgograd State University; software development.
Keywords: diachronic linguistic corpus; administrative documents; Don Cossack Host; meta-tagging; morphological tags