Tsvetana Dimitrova | Scripta & e-Scripta

Tsvetana Dimitrova, PhD, is an Assistant Professor at the Department of Computational Linguistics of the Institute for Bulgarian Language at the Bulgarian Academy of Sciences. Her research focuses on corpus linguistics, historical corpora, corpus annotation, diachronic syntax, and lexical semantic networks.

Dimitrova, Tsvetana

cvetana@dcl.bas.bg

Assist. Prof., PhD, Institute for Bulgarian Language, Bulgarian Academy of Sciences, Sofia, Bulgaria

Bulgaria

Scripta & e-Scripta vol. 25, 2025

Tsvetana Dimitrova. On the Annotation of Category Change

Върху анотацията при промяната на категориите

Summary/Abstract

The article deals with a number of challenging points in the annotation of different categories (parts of speech, classes, and subclasses) in texts attesting the history of the Bulgarian language. The focus is on category changes that mainly affect the so-called function words (auxiliary verbs, coordinating conjunctions, subordinating conjunctions, and pronouns, among others) in historical texts, with the goal of building a unified annotation scheme for the stages of the Bulgarian language. Annotation approaches followed by other projects are discussed in order to arrive at practical solutions that take into account the theoretical assumptions and the purposes of the data.

Subject: Scripta

Keywords: corpus annotation; history of the Bulgarian language; category change

Scripta & e-Scripta vol. 21, 2021

Tsvetana Dimitrova. On Sentence Segmentation in Diachronic Texts

Върху сегментирането на изреченията в диахронните текстове

Summary/Abstract

The article proposes a minimal set of criteria for sentence segmentation of medieval texts, an obligatory stage in corpus processing and annotation, especially for syntactic annotation. Against a review of different definitions of the sentence (unit) and approaches to sentence segmentation, it discusses structural, thematic, and graphic criteria on the basis of sample sentences in order to define the minimal set. The discussion of the different factors is illustrated with sample sentences from two texts, from the 14th and 17th centuries. The proposed criteria rely mainly on structural characteristics and try to avoid textual and semantic interpretation, though these can also present challenges, because the interpretation of the (syntactic) structure is inevitably related to the interpretation of the (semantic) content.

Subject: Digital humanities

Keywords: corpus annotation; sentence; sentence segmentation

Scripta & e-Scripta vol. 18, 2018

Tsvetana Dimitrova, Andrej Boyadzhiev. Electronic Edition and Linguistic Annotation of Slavic Fragments

Summary/Abstract

The paper introduces a project on the edition and linguistic annotation of Medieval and Early Modern South Slavic manuscript fragments. The main topic is the implementation of various approaches to integrating electronic edition, manuscript description, and linguistic annotation. The corpus will include fragments from parchment manuscripts kept in Bulgarian repositories. We will illustrate the approach with several pieces of text from various fragments. The representation will be supplied with textual annotation as well as part-of-speech and basic syntactic annotation. On this basis, an attempt will be made at experimental annotation of anaphora and related morpho-syntactic phenomena. The work will offer a discussion of the features that will be useful for such annotation. The project relies on the eXist database (http://exist-db.org) and on the following initiatives: Repertorium (http://repertorium.obdurodon.org/), PROIEL (http://www.hf.uio.no/ifikk/english/research/projects/proiel/), and TOROT (http://site.uit.no/slavhistcorp/files/2015/04/Eckhoff.pdf).

Subject: Language studies; Language and Literature Studies; Theoretical Linguistics; Applied Linguistics; Studies of Literature; Computational linguistics; South Slavic Languages; Philology; South Slavic manuscripts; Fragments; Linguistic annotation; Linguistic corpora; Electronic text edition; Electronic description; XML technologies

Scripta & e-Scripta vol. 6, 2008

Tsvetana Dimitrova, Andrej Boyadzhiev. The Linguistic Information in the Electronic Corpus of Old Slavonic Texts

Summary/Abstract

The paper is devoted to the inclusion of linguistic data in an electronic corpus of Old Slavic texts. Various current approaches to this task are analysed. The paper surveys the formats used so far in the Slavic scholarly tradition and draws parallels with the way language data are included in the most popular projects for the electronic processing of modern and ancient languages. The authors' approach is based on the descriptive markup language XML (Extensible Markup Language), and all conclusions follow from this choice.

Subject: Language and Literature Studies; Electronic corpora; Linguistic annotation; Lexicographic information; Morphological and syntactic analysis
