[ACL'24] MC^2: A Multilingual Corpus of Minority Languages in China (Tibetan, Uyghur, Kazakh, and Mongolian)
Updated Jan 17, 2026 (Python)
NLP Toolkit for Turkic Languages
[ACL'25 Findings] MiLiC-Eval: Benchmarking Multilingual LLMs for China's Minority Languages
Open-source Automatic Speech Recognition (ASR) pipeline for Bashkir (Bashkort), Kazakh, and Kyrgyz languages with deterministic orthography correction.
Shyraq Translator Bot is a convenient tool designed for seamless conversion between Cyrillic and Shyraq (Latin) scripts. 📖: t.me/Shyraq_Tech
End-to-end Kazakh speech recognition and grammar correction pipeline powered by Whisper and GPT-4.
OCR for Cyrillic official documents (Kazakh & Russian) with name and date extraction.
The open Kazakh-language NLP stack — translation, GPT-2/RWKV training, fine-tuning, summarization, tokenization, and dataset prep.
An application that builds a glossary of random terms on the topic "Radio Engineering, Electronics and Telecommunications" (RET).
Minimal RNN/GPT-2 implementation for Kazakh text generation
High-quality Text-to-Speech for Turkic languages (Kazakh) with multi-speaker and emotion control.
🇰🇿 The world's first programming language with native Kazakh syntax. Write code in Kazakh, Russian, and more — breaking the English-only barrier in programming.
Small Language Models for Kazakh — from 14M to 600M parameters, trained from scratch.
Automated video dubbing system from English to Kazakh with speech synchronization
Minimal recipe to train a 50M Kazakh language model from scratch — tokenizer, data, training, inference
🔢 Convert numbers to words and words to numbers easily with this efficient Python tool, supporting a wide range of numerical formats.
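The converter's description gives neither its target language nor its API. Below is a minimal sketch of the general technique, assuming Kazakh cardinal numbers and the range 0 to 10**12 - 1. It splits a number into scale groups (миллиард, миллион, мың) and spells each group below one thousand. The reverse direction accumulates a running group value and flushes it at each scale word. `number_to_words` and `words_to_number` are hypothetical names, not the tool's actual functions. Using a bare "жүз" for 100 and a bare "мың" for 1000 is also an assumption, since "бір жүз" and "бір мың" appear in formal usage.

```python
# Hypothetical sketch: Kazakh cardinal numbers <-> words for 0 <= n < 10**12.
ONES = ["", "бір", "екі", "үш", "төрт", "бес", "алты", "жеті", "сегіз", "тоғыз"]
TENS = ["", "он", "жиырма", "отыз", "қырық", "елу", "алпыс", "жетпіс", "сексен", "тоқсан"]
SCALES = [(10**9, "миллиард"), (10**6, "миллион"), (10**3, "мың")]


def _below_thousand(n: int) -> list[str]:
    """Spell 1..999 as a list of words (returns [] for 0)."""
    words = []
    hundreds, rest = divmod(n, 100)
    if hundreds:
        # Assumption: a bare "жүз" for exactly one hundred.
        if hundreds > 1:
            words.append(ONES[hundreds])
        words.append("жүз")
    tens, ones = divmod(rest, 10)
    if tens:
        words.append(TENS[tens])
    if ones:
        words.append(ONES[ones])
    return words


def number_to_words(n: int) -> str:
    """Convert a non-negative integer to Kazakh words."""
    if not 0 <= n < 10**12:
        raise ValueError("supported range is 0 <= n < 10**12")
    if n == 0:
        return "нөл"
    words = []
    for value, name in SCALES:
        count, n = divmod(n, value)
        if count:
            # Assumption: bare "мың" for one thousand, but "бір миллион" for a million.
            if not (value == 1000 and count == 1):
                words.extend(_below_thousand(count))
            words.append(name)
    words.extend(_below_thousand(n))
    return " ".join(words)


def words_to_number(text: str) -> int:
    """Parse Kazakh number words produced by number_to_words back to an integer."""
    ones = {w: i for i, w in enumerate(ONES) if w}
    tens = {w: i * 10 for i, w in enumerate(TENS) if w}
    scales = {name: value for value, name in SCALES}
    total, current = 0, 0  # total: flushed scale groups; current: group in progress
    for word in text.split():
        if word == "нөл":
            continue
        if word in ones:
            current += ones[word]
        elif word in tens:
            current += tens[word]
        elif word == "жүз":
            current = (current or 1) * 100  # bare "жүз" means one hundred
        elif word in scales:
            total += (current or 1) * scales[word]  # bare "мың" means one thousand
            current = 0
        else:
            raise ValueError(f"unknown word: {word!r}")
    return total + current
```

For example, `number_to_words(2025)` yields "екі мың жиырма бес", and `words_to_number` maps that string back to 2025. Treating a scale word with no preceding count as 1 lets the parser accept both "мың" and "бір мың".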