On Sentence Segmentation in Diachronic Texts
Върху сегментирането на изреченията в диахронните текстове
- Author(s): Tsvetana Dimitrova
- Subject(s): Digital humanities
Published by: Institute for Literature BAS
- Print ISSN: 1312-238X
The article proposes a minimal set of criteria for the sentence segmentation of medieval texts, an obligatory stage in corpus processing and annotation, especially syntactic annotation. Against a review of different definitions of the sentence as a unit and of different approaches to sentence segmentation, it discusses structural, thematic, and graphic criteria in order to define the minimal set. The discussion is illustrated with sample sentences from two texts from the 14th and 17th centuries. The proposed criteria rely mainly on structural characteristics and try to avoid textual and semantic interpretation, although these characteristics can also present challenges, because interpreting the (syntactic) structure is inevitably tied to interpreting the (semantic) content.
Journal: Scripta & e-Scripta vol. 21, 2021
Page Range: 55-67
No. of Pages: 13
Language: English
Year: 2021
Issue No: Scripta & e-Scripta vol. 21, 2021
Tsvetana Dimitrova, Bulgaria
Assist. Prof., PhD, Institute for Bulgarian Language, Bulgarian Academy of Sciences, Sofia, Bulgaria
Tsvetana Dimitrova, PhD, is an Assistant Professor at the Department of Computational Linguistics of the Institute for Bulgarian Language at the Bulgarian Academy of Sciences. Her research focuses on corpus linguistics, historical corpora, corpus annotation, diachronic syntax, and lexical semantic networks.
KEYWORDS: corpus annotation; sentence; sentence segmentation