Kurdish corpora
Scholarly databases that document Kurdish and its dialects.
- Corpus
UD Northern Kurdish — Kurmanji Treebank
Kurmanji dependency treebank built from a 1944 Sherlock Holmes translation and Wikipedia sentences (Gökırmak & Tyers; Universal Dependencies).
Dialects: Kurmanji
- Corpus
AsoSoft Text Corpus
The first large-scale text corpus of Central Kurdish (Sorani), released in a 75M-token large version and a 5M-token small version, with topic annotations (AsoSoft; published in Digital Scholarship in the Humanities).
Dialects: Sorani
- Corpus
Pewan — Kurdish IR Test Collection
Corpus of crawled news articles, 115k in Sorani and 25k in Kurmanji (2003–2012), plus an IR query set and relevance judgements (Esmaili et al.; hosted by Sina Ahmadi).
Dialects: Sorani, Kurmanji
- Corpus
Zaza-Gorani Corpus
News-text corpus covering Zazaki (4,855 articles / 1.6M tokens) and Gorani including Hawrami (428 articles / 195k tokens) (Sina Ahmadi).
Dialects: Zazaki, Hawrami
- Dictionary
Living Dictionary — Central Kurdish (Sorani)
Community-built dictionary of Central Kurdish (Sorani), the variety spoken in Iraqi Kurdistan and western Iran (Living Tongues Institute for Endangered Languages).
Dialects: Sorani