Part-of-speech tagging

Ways to Improve the Automatic Annotation of Middle Bulgarian Church Slavonic

Фабио Майо. Ways to Improve the Automatic Annotations of Middle Bulgarian Church Slavonic Texts

  • Summary/Abstract

    The last decade has seen an upswing in research on natural language processing. However, it is well known that historical language stages remain largely underrepresented. Middle Bulgarian Church Slavonic, a language variety of considerable literary productivity, is a prime example. This paper shows how annotated texts in related language varieties can be used to annotate texts written in Middle Bulgarian Church Slavonic, such as the 14th-century translation of the Dioptra. In particular, I present a way of adapting the available training data and of reducing the differences between training and test data, thereby improving the automatic morphological annotation. Moreover, I demonstrate that comparing the translation with the original work, written in Byzantine Greek, can further improve the annotation by carefully disambiguating homonymous word forms. These results can benefit research on Middle Bulgarian Church Slavonic, as they show how texts in this variety can be annotated without authentic training data. The proposed method is not limited to Slavonic Studies, however: using training data from genetically related language varieties in combination with translations may help annotate other underrepresented language varieties as well.
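    To make the idea of Greek-assisted disambiguation more concrete, here is a minimal sketch. It is not the author's implementation. It assumes a tagger that returns several scored candidate tags for an ambiguous Slavonic word form, and a word alignment that links that form to a tagged word in the Greek original. The tagset (Universal-Dependencies-style labels), the `GREEK_TO_SLAVONIC` compatibility table, the function name `disambiguate`, and all scores are hypothetical. For the example, the form ꙗко is assumed to be ambiguous between adverb and subordinating conjunction, and to be aligned with Greek ὅτι (a conjunction).

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical table: which Slavonic POS tags are compatible with a given
# tag on the aligned Greek word. Tagsets and mappings are illustrative only.
GREEK_TO_SLAVONIC: Dict[str, Set[str]] = {
    "NOUN": {"NOUN"},
    "VERB": {"VERB", "AUX"},
    "ADJ": {"ADJ"},
    "ADV": {"ADV"},
    "PRON": {"PRON", "DET"},
    "ADP": {"ADP"},
    "CCONJ": {"CCONJ"},
    "SCONJ": {"SCONJ"},
}


def disambiguate(candidates: List[Tuple[str, float]],
                 greek_tag: Optional[str]) -> str:
    """Return the best tagger candidate that is compatible with the aligned
    Greek tag; fall back to the top-scoring candidate if no Greek word is
    aligned or no candidate is compatible."""
    if not candidates:
        raise ValueError("at least one candidate tag is required")
    # Rank candidates by the tagger's (hypothetical) confidence score.
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    if greek_tag is not None:
        allowed = GREEK_TO_SLAVONIC.get(greek_tag, set())
        for tag, _score in ranked:
            if tag in allowed:
                return tag
    return ranked[0][0]


# Made-up scores for the ambiguous form ꙗко.
candidates = [("ADV", 0.55), ("SCONJ", 0.45)]
print(disambiguate(candidates, "SCONJ"))  # aligned with ὅτι -> SCONJ
print(disambiguate(candidates, None))     # no alignment     -> ADV
```

    The fallback keeps the tagger's own decision whenever the Greek evidence is missing or uninformative, so the comparison with the original can only override ambiguous cases, never introduce a tag the tagger did not propose.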

