🤗 Data available on HuggingFace
🤗 Download our models from HuggingFace:
- Our flagship model (mBART fine-tuned on all data)
- mBART fine-tuned on public data
- Model trained from scratch on all data
- Model trained from scratch on public data
Goal: include data for each of the following languages
African-diaspora Creole languages of the Americas 🌎
- Saint Lucian (acf)
- Bahamian Creole (bah)
- Berbice Dutch (brc)
- Belizean (bzj)
- Miskito Coast Creole (bzk)
- Garifuna (cab)
- Negerhollands (dcr)
- Ndyuka (djk)
- Guadeloupean (gcf)
- French Guianese (gcr)
- Gullah (gul)
- Creolese (gyn)
- Haitian (hat)
- San Andrés-Providencia (icr)
- Jamaican (jam)
- Louisiana (lou)
- Martinican (mart1259)
- Media Lengua (mue)
- Papiamento (pap)
- Saramaccan (srm)
- Sranan Tongo (srn)
- Vincentian Creole (svc)
- Trinidadian Creole (trf)
Creole languages of Africa 🌍
- Angolar (aoa)
- Saotomense (cri)
- Seychellois (crs)
- Annobonese (fab)
- Fanakalo (fng)
- Pichi (fpe)
- Ghanaian Pidgin (gpe)
- Kabuverdianu (kea)
- Krio (kri)
- Kituba (ktu)
- Mauritian (mfe)
- Naija (pcm)
- Guinea-Bissau Creole (pov)
- Principense (pre)
- Réunion Creole (rcf)
- Sango (sag)
- Cameroonian Pidgin (wes)
Other Creole languages 🌏
- Tok Pisin (tpi)
...with target languages 🎯
- English (eng)
- French (fra)
- Arabic (ara)
- Azerbaijani (aze)
- Cebuano (ceb)
- German (deu)
- Haitian (hat)
- Nepali (nep)
- Portuguese (por)
- Spanish (spa)
- Chinese (zho)
Using ISO 639-3 codes (Martinican has no ISO 639-3 code of its own, so it is identified by its Glottocode, mart1259)
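The lists above mix two identifier formats: three-letter ISO 639-3 codes and one Glottolog-style code (mart1259). The sketch below shows one way to tell them apart when processing these codes in scripts. It is an illustrative assumption, not part of the released tooling: the helper name `code_kind` is hypothetical, and the patterns check only the *shape* of a code, not whether it is actually registered in ISO 639-3 or Glottolog.

```python
import re

# ISO 639-3 identifiers are three lowercase letters (e.g. "hat", "pap").
ISO_639_3 = re.compile(r"[a-z]{3}")
# Glottocodes are four alphanumeric characters followed by four digits
# (e.g. "mart1259"); this pattern is an approximation of that format.
GLOTTOCODE = re.compile(r"[a-z0-9]{4}[0-9]{4}")


def code_kind(code: str) -> str:
    """Hypothetical helper: classify a language code string by its format only."""
    if ISO_639_3.fullmatch(code):
        return "iso639-3"
    if GLOTTOCODE.fullmatch(code):
        return "glottocode"
    return "unknown"


# A few codes taken from the lists above.
for c in ["acf", "hat", "mart1259", "pcm", "tpi", "eng"]:
    print(c, code_kind(c))
```

Because only the format is checked, a well-formed but unassigned code such as "zzq" would still be reported as "iso639-3".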
Please cite:
@article{robinson2024krey,
  title={Krey{\`o}l-MT: Building MT for Latin American, Caribbean and Colonial African Creole Languages},
author={Robinson, Nathaniel R and Dabre, Raj and Shurtz, Ammon and Dent, Rasul and Onesi, Onenamiyi and Monroc, Claire Bizon and Grobol, Lo{\"\i}c and Muhammad, Hasan and Garg, Ashi and Etori, Naome A and others},
journal={arXiv preprint arXiv:2405.05376},
year={2024}
}