Digestion Algorithm in Hierarchical Symbolic Forests: A Fast Text Normalization Algorithm and Semantic Parsing Framework for Specific Scenarios and Lightweight Deployment
Updated Mar 23, 2026 - Python
Text preprocessing and PII anonymisation for NLP/ML. ONNX NER ensemble, language detection, stopword removal. Built for statistical ML and language models.
A simple tool to check if Unicode text files are Unicode-normalized
Text Normalization on Learner Texts (South Tyrolean German as an L2)
🧹 Python package for text cleaning
A modern, actively maintained contractions library. Expands English contractions (you're → you are) with improved performance, type safety, and features like bulk dictionary imports and JSON loading. Includes 100% test coverage, full type hints, and works with both pip and uv.
Soe Vinorm: An Effective Text Normalization Toolkit for converting Vietnamese text to its spoken form.
Python pipeline for enriching an Arabic (MSA) → Darija (MA) dataset from PDF books and YouTube transcripts: normalization, token-based segmentation, generation (OpenAI or rules), and JSON export. Applied internship project at YaneCode Digital.
Pipeline for automatic fine-tuning of Text to Speech models.
Useful tools for Text To Speech applications: text normalization, automatic dataset construction, and evaluation metrics.
Training Tacotron 2 Text-to-Speech (TTS)
A Python library for text normalization, specifically designed for Vietnamese and English text processing. This library provides comprehensive text normalization capabilities including handling of special characters, numbers, dates, and various text formats.
Cryptocurrency Market Analysis and Question Answering System
An online text normalization tool for Chinese-English mixed text-to-speech system
Clipboard Translator is a lightweight desktop application built with PyQt5 that automatically translates text copied to the clipboard into Persian using the Google Translate API. The application features a modern and minimalistic UI, custom styling, and real-time text normalization and tokenization.
📢 Tha (ថា) - A Khmer Text Normalization and Verbalization Toolkit
Code, models, and data for "Exploiting Dialect Identification in Automatic Dialectal Text Normalization". ArabicNLP 2024, ACL.
This Python module is an easy-to-use port of the text normalization used in the paper "Not low-resource anymore: Aligner ensembling, batch filtering, and new datasets for Bengali-English machine translation". It is intended for normalizing and cleaning Bengali and English text.
Command-line interface (CLI) and library to normalize English texts.