Arabic corpora
Scholarly corpora and dictionaries that document Arabic and its dialects.
- Corpus
UD Arabic PADT Treebank
Open release of the Prague Arabic Dependency Treebank — newswire Modern Standard Arabic, dependency-annotated (Charles University Prague / Universal Dependencies).
- Corpus
Tashkeela — Arabic Diacritization Corpus
75-million-word vocalized Arabic corpus from classical and modern sources, released for diacritization research (Zerrouki & Balla, Data in Brief 2017).
- Corpus
arabiCorpus
Searchable Arabic corpus with newspaper, modern literature, nonfiction, premodern, and Egyptian Colloquial sub-corpora (Brigham Young University).
Dialects: Egyptian Arabic
- Corpus
Tunisiya — Tunisian Arabic Corpus
One-million-word written corpus of Tunisian Arabic across folktales, blogs, newspapers, TV, and literature (Karen McNeil & Miled Faiza, Georgetown).
Dialects: Maghrebi Arabic
- Dictionary
Living Dictionary — Levantine Arabic
Community-built dictionary of Levantine Arabic; the project page does not specify the sub-variety (Syrian, Palestinian, Lebanese, or Jordanian) (Living Tongues Institute for Endangered Languages).
Dialects: Levantine Arabic
- Dictionary
Living Dictionary — Moroccan Arabic (Darija)
Community-built dictionary of Moroccan Arabic (Darija) (Living Tongues Institute for Endangered Languages).
Dialects: Maghrebi Arabic