text-normalization
Here are 70 public repositories matching this topic...
Ultra-fast, zero-copy text normalization for Rust NLP pipelines & tokenizers
Updated Feb 4, 2026 - Rust
Text Normalization on Learner Texts (South Tyrolean German as a L2)
Updated Jan 30, 2026 - Python
🧹 Python package for text cleaning
Updated Jan 28, 2026 - Python
Digestion Algorithm in Hierarchical Symbolic Forests: A Fast Text Normalization Algorithm and Semantic Parsing Framework for Specific Scenarios and Lightweight Deployment
Updated Dec 23, 2025 - Python
A modern, actively maintained contractions library. Expands English contractions (you're → you are) with improved performance, type safety, and features like bulk dictionary imports and JSON loading. Includes 100% test coverage, full type hints, and works with both pip and uv.
Updated Dec 15, 2025 - Python
Useful String extensions to save you time in production.
Updated Dec 10, 2025 - Dart
Comprehensive benchmark of OpenAI Whisper models for Bosnian, Croatian, and Serbian languages. Includes pipelines for audio transcription, rigorous text normalization, Levenshtein distance evaluation, and LLM-based post-processing.
Updated Dec 7, 2025 - HTML
Japanese text normalizer for mecab-neologd
Updated Dec 2, 2025 - Cython
Unicode-safe text cleaning & typographic normalization for Rust
Updated Oct 27, 2025 - Rust
Modern .NET 9 / C# 13 library to normalize text (emojis, currency, numbers, abbreviations, chat slang) for consistent and natural Text-to-Speech (TTS) synthesis, ideal for stream chat/donations.
Updated Oct 24, 2025 - C#
Simple tool to check if Unicode text files are Unicode-normalized
Updated Oct 18, 2025 - Python
Soe Vinorm: An Effective Text Normalization Toolkit for converting Vietnamese text to its spoken form.
Updated Oct 17, 2025 - Python
Python pipeline to enrich an Arabic (MSA) → Darija (MA) dataset from PDF books and YouTube transcripts: normalization, token-level segmentation, generation (OpenAI or rule-based), and JSON export. Applied internship project at YaneCode Digital.
Updated Oct 13, 2025 - Python
A robust service that extracts and classifies financial amounts from medical bills and receipts using Tesseract.js OCR, normalization, and AI-powered context classification with Gemini API fallback.
Updated Sep 27, 2025 - JavaScript
Demonstrating specialized NLP preprocessing packages (lightlemma, emoticon-fix, contraction-fix) through Twitter sentiment analysis. Achieves 97% accuracy with PyTorch LSTM + attention model.
Updated Aug 9, 2025 - Jupyter Notebook
X.com Social Network Topic-Sentiment Analyzer: an application that scrapes Twitter/X data, performs Polish-language text normalization and lemmatization, builds topic models with MALLET, and runs sentiment analysis for downstream analytics. Created as a master's thesis project.
Updated Jul 30, 2025 - Java
Pipeline for automatic fine-tuning of Text-to-Speech models.
Updated Jul 21, 2025 - Python
Links between Foclóir Uí Dhónaill and DIL
Updated Jul 1, 2025 - HTML
Useful tools for Text-to-Speech applications: text normalization, automatic dataset construction, and evaluation metrics.
Updated Jun 11, 2025 - Python