KreoLex Database

Comprehensive Lexical Resources for Creole Languages

Developed by Fabiola Henri • University at Buffalo, SUNY • Department of Romance Languages & Literatures

About KreoLex

KreoLex is a comprehensive database providing lexical information for various creole languages. The database includes information on word frequencies from collected written and spoken corpora, etymology, phonological representation, associated lemmas, grammatical categories, and other relevant linguistic annotations.

Raw corpora collected for the project are semi-automatically tagged using KreoLex, which makes the database a valuable resource for linguistic research, computational linguistics, and language documentation. The project has been developed and maintained since 2020 with ongoing updates and expansions.

Available Databases

Mauritian Creole Lexicon

Language: Mauritian Kreol

Coverage: Complete verbal and nominal lexicon

Time Period: Contemporary usage

Last Updated: 2019

Lemmas: 4,500+

Token Count: 15,000+

Annotation Layers: 6

Formats: CSV, Excel, TXT

Guadeloupean Creole Verb Lexicon

Language: Guadeloupean Kréyòl

Coverage: Complete verbal lexicon

Time Period: Contemporary usage

Last Updated: 2019

Verb Lemmas: 2,800+

Tense/Aspect Forms: 8

Annotation Layers: 5

Formats: CSV, Excel, JSON

Haitian Creole Verb Lexicon

Language: Haitian Kreyòl

Coverage: Comprehensive verbal lexicon

Collaborator: Herby Glaude

Last Updated: 2019

Verb Lemmas: 3,200+

Corpus Tokens: 12M+

Annotation Layers: 7

Formats: CSV, Excel, XML

Louisiana Creole Database

Language: Louisiana Creole

Coverage: Multi-modal lexical database

Status: In Development (2020-2025)

Features: Audio samples included

Data Collection: Ongoing

Dialect Regions: 4

Data Types: 8+

Formats: CSV, Excel, PDF Docs

Mauritian Creole Full Collection

Complete set of lexical resources for Mauritian Kreol, including specialized databases for different grammatical categories.

Mauritian Noun Lexicon

Complete noun database with 4,500+ lemmas, frequency data, and etymological information.

Mauritian Verb Lexicon

Comprehensive verb database with tense/aspect marking and syntactic information.

Format: XLSX

Mauritian Adjectives

Adjective database with comparative/superlative forms and semantic categorization.

Format: XLSX

Mauritian Prepositions

Preposition database with syntactic distributions and semantic roles.

Format: XLSX

Indian Ocean Creoles Collection

Lexical resources for various French-based creoles of the Indian Ocean region.

Rodriguais & Chagossien Lexicon

Lexical database for Rodriguais and Chagossien creoles with comparative data.

Format: XLSX

Réunionnais Creole Lexicon

Comprehensive lexical database for Réunionnais Creole with dialectal variations.

Format: XLSX

Database Structure & Features

Sample Data Structure (Mauritian Creole Noun Lexicon)

Lemma     POS        Frequency  Phonetic   Etymology          Semantic Field
lazwa     Noun       245        /lazwa/    French "la joie"   Emotion
lakaz     Noun       1,892      /lakaz/    French "la case"   Habitation
dimoun    Noun       3,456      /dimun/    French "du monde"  People
travay    Noun/Verb  2,134      /travaj/   French "travail"   Activity
leritaz   Noun       567        /lerita/   French "héritage"  Culture
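
The sample rows above can be read programmatically once a dataset is exported to CSV. The following is a minimal sketch in Python, not the project's own tooling. It assumes a CSV export whose header names match the sample columns exactly, frequencies that may carry thousands separators (as in "1,892"), and multi-category entries written with a slash (as in "Noun/Verb"). The `Entry` class and `load_entries` function are hypothetical names introduced for illustration, and the actual download files may be laid out differently.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Entry:
    lemma: str
    pos: list[str]        # one or more grammatical categories
    frequency: int
    phonetic: str
    etymology: str
    semantic_field: str


# Inline stand-in for a CSV export, built from the five sample rows.
# Assumption: the real files may use different headers or delimiters.
SAMPLE_CSV = '''Lemma,POS,Frequency,Phonetic,Etymology,Semantic Field
lazwa,Noun,245,/lazwa/,"French ""la joie""",Emotion
lakaz,Noun,"1,892",/lakaz/,"French ""la case""",Habitation
dimoun,Noun,"3,456",/dimun/,"French ""du monde""",People
travay,Noun/Verb,"2,134",/travaj/,"French ""travail""",Activity
leritaz,Noun,567,/lerita/,"French ""héritage""",Culture
'''


def load_entries(text: str) -> list[Entry]:
    """Parse CSV text into Entry records, normalizing POS and frequency."""
    entries = []
    for row in csv.DictReader(io.StringIO(text)):
        entries.append(Entry(
            lemma=row["Lemma"],
            # "Noun/Verb" becomes ["Noun", "Verb"]
            pos=row["POS"].split("/"),
            # Drop thousands separators before converting to int
            frequency=int(row["Frequency"].replace(",", "")),
            phonetic=row["Phonetic"],
            etymology=row["Etymology"],
            semantic_field=row["Semantic Field"],
        ))
    return entries


entries = load_entries(SAMPLE_CSV)

# List lemmas from most to least frequent.
by_frequency = sorted(entries, key=lambda e: e.frequency, reverse=True)
for e in by_frequency:
    print(f"{e.lemma}\t{e.frequency}\t{', '.join(e.pos)}")
```

Storing the frequency as an integer and the part of speech as a list keeps sorting and category filtering straightforward. For a downloaded file, the same function could be given the file's contents in place of `SAMPLE_CSV`, provided the headers match.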

How to Cite KreoLex

Henri, Fabiola. (2025). KreoLex: A comprehensive lexical database for creole languages. University at Buffalo.

Data Access & Usage

Access Conditions

  • Open Access: All datasets are freely available for academic research
  • Attribution Required: Please cite the dataset using the provided citation
  • Commercial Use: Contact the author for permission before any commercial use
  • Updates: Databases are periodically updated with new data
  • Funding Acknowledgement: Supported by NSF and other research grants

Grant Support

KreoLex development has been supported by:

  • NSF IRES: Documenting and Analyzing Creole languages (2025-2030)
  • NSF DLI-DEL: Advancing linguistic research through speech technologies (2025-2027)
  • LABEX EFL: Towards a TreeBank for Mauritian (2011-2021)
  • Various University at Buffalo research grants