Buryat corpus

This website presents the Buryat corpus of written texts. The corpus contains about 3 million words, of which 85% are currently annotated. Each annotated word form has a lemma and a set of grammatical tags.

This corpus is a linguistic resource for the contemporary Buryat literary language. Its organization and development have been carried out with the financial support of the Program of Fundamental Research of the Presidium of the Russian Academy of Sciences. The Buryat corpus is intended for everyone interested in the language and culture of the Buryats: Mongolian studies specialists, linguists, scholars, educators, as well as writers, journalists, and librarians.

The corpus includes about 3 million words of written texts (mainly works of Buryat fiction), along with their meta-descriptions (i.e. author’s name and gender, publication date, genre, chapters and other text divisions, etc.). The words in the corpus are supplied with morphological information, i.e. inflected forms and grammatical categories.

The development of the Buryat corpus is a long-term project. The corpus is continually augmented and modified as new texts are added to it, and the project’s grammatical dictionary and linguistic tools, such as the morphological analyzer and the meta-description, are constantly being enhanced.

The participants of the project are or were previously affiliated with the following institutions: National Research University Higher School of Economics (M. Daniel, T. Arkhangelskiy); Institute of Oriental Studies, Russian Academy of Sciences (S. A. Krylov); and Institute for Mongolian, Buddhist and Tibetan Studies, Siberian Division, Russian Academy of Sciences (L. D. Badmaeva, O. S. Rinchinov, G. N. Chimitdorzhieva, Ju. D. Abaeva).