text-normalization

Here are 31 public repositories matching this topic...

Text preprocessing and PII anonymisation for NLP/ML. ONNX NER ensemble, language detection, stopword removal. Built for statistical ML and language models.

  • Updated Feb 28, 2026
  • Python

A modern, actively maintained contractions library. Expands English contractions (you're → you are) with improved performance, type safety, and features like bulk dictionary imports and JSON loading. Includes 100% test coverage, full type hints, and works with both pip and uv.

  • Updated Dec 15, 2025
  • Python

Python pipeline for enriching an Arabic (MSA) → Darija (MA) dataset from PDF books and YouTube transcripts: normalization, token-based segmentation, generation (OpenAI or rule-based), and JSON export. Applied internship project at YaneCode Digital.

  • Updated Oct 13, 2025
  • Python

Clipboard Translator is a lightweight desktop application built with PyQt5 that automatically translates text copied to the clipboard into Persian using the Google Translate API. The application features a modern and minimalistic UI, custom styling, and real-time text normalization and tokenization.

  • Updated Jul 31, 2024
  • Python
