On the Principles of a Digital Text Corpus: New Opportunities in Working on Heroic Epics of the Shors


This essay discusses the main principles of a Digital Text Corpus initiated in 2011 with support from the Department of Northern and Siberian Studies at the Institute of Ethnology and Anthropology of the Russian Academy of Sciences. With a special focus on the extensive materials on the Shors (a Turkic people in the south of Western Siberia), the essay shows how the Corpus offers unique and varied means of analyzing folklore texts in lesser-used, mostly endangered Siberian languages.



Main territories currently inhabited by the Shors.

Image: Funk and Tomilov 2006:247.

Proportions of Shor epic texts recorded by scholars between 1861 and 2006.

Graph: by the author.

Volume of the Shor Corpus (with a list of the 20 most frequent word-forms).

Image: http://corpora.iea.ras.ru/corpora/statistics.php.

The structure of the Shor Corpus.

Image: http://corpora.iea.ras.ru/corpora/structure.php.

Vladimir Tannagashev in the kitchen of his apartment in the town of Myski, Kemerovo region, 2003.

Photo: D. Funk.

List of epic texts from Tannagashev’s repertoire in the Shor Corpus.

Image: http://corpora.iea.ras.ru/corpora/texts.php?performed_by=17.

Torbokov, photographed by an anonymous photographer on June 15, 1969.

Photo: Folklore Archive of the State Literature Museum.

The original version of the epos Ak-Pilek shown in parallel with the standardized (normalized) version.

Image: http://corpora.iea.ras.ru/corpora/describe_text.php?id=19.

A scanned page from Tannagashev’s self-recording of the epos Ak-Pilek.

See also http://corpora.iea.ras.ru/corpora/pages.php?id_text=19&page=2#image.

A list of word-forms from the epos Ak-Pilek with examples.

Image: http://corpora.iea.ras.ru/corpora/describe_text.php?id=19.

Information about the word-form pagda (“on a leash”) (with graphic representation and contextual examples).

Image: http://corpora.iea.ras.ru/corpora/describe_word.php?lang_code=cjs&wf_kind=normalised&word=%D0%BF%D0%B0%D2%93%D0%B4%D0%B0.

Recurring expressions in the eposes Ak-Pilek and Altyn-Torgu (with a similarity level of 0.900 or higher).

Image: http://corpora.iea.ras.ru/corpora/compare_texts.php.

A short excerpt from the epos Chylan-Toochii, performed by Vladimir Tannagashev.

Recording: D. Funk, 2003.
