text-normalization

Here are 70 public repositories matching this topic...

A modern, actively maintained contractions library. Expands English contractions (you're → you are) with improved performance, type safety, and features like bulk dictionary imports and JSON loading. Includes 100% test coverage, full type hints, and works with both pip and uv.

  • Updated Dec 15, 2025
  • Python
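To make the contraction-expansion idea concrete (you're → you are), here is a minimal dictionary-plus-regex sketch. It is not the listed library's API: the `CONTRACTIONS` mapping and the `expand_contractions` function are hypothetical names chosen for illustration, and the mapping is deliberately tiny.

```python
import re

# Hypothetical, deliberately small mapping for illustration only;
# a real contractions library would ship a much larger dictionary.
CONTRACTIONS = {
    "you're": "you are",
    "don't": "do not",
    "it's": "it is",
    "can't": "cannot",
}

# One alternation over all known contractions, matched case-insensitively
# and only at word boundaries.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(key) for key in CONTRACTIONS) + r")\b",
    flags=re.IGNORECASE,
)


def expand_contractions(text: str) -> str:
    """Replace each known contraction with its expanded form."""
    # Treat the typographic apostrophe (U+2019) like the ASCII one.
    text = text.replace("\u2019", "'")

    def _replace(match: re.Match) -> str:
        word = match.group(0)
        expanded = CONTRACTIONS[word.lower()]
        # Preserve a leading capital letter ("You're" -> "You are").
        if word[0].isupper():
            expanded = expanded[0].upper() + expanded[1:]
        return expanded

    return _PATTERN.sub(_replace, text)
```

A lookup table like this cannot resolve ambiguous forms (for example, "it's" can also mean "it has"), which is one reason dedicated libraries exist.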

Comprehensive benchmark of OpenAI Whisper models for Bosnian, Croatian, and Serbian languages. Includes pipelines for audio transcription, rigorous text normalization, Levenshtein distance evaluation, and LLM-based post-processing.

  • Updated Dec 7, 2025
  • HTML
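For readers unfamiliar with the Levenshtein distance mentioned above, the following is a generic dynamic-programming sketch of the metric. It is not taken from the benchmark repository; the function name `levenshtein` is an assumption, and the benchmark may compute the distance at word level or through a library instead.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn `a` into `b`."""
    # Keep the shorter string in the inner loop to use less memory.
    if len(a) < len(b):
        a, b = b, a

    # previous[j] is the distance between the processed prefix of `a`
    # and the first j characters of `b`.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete ch_a
                    current[j - 1] + 1,      # insert ch_b
                    previous[j - 1] + cost,  # substitute or match
                )
            )
        previous = current
    return previous[-1]
```

In transcription evaluation, this distance is typically computed between a normalized reference and a normalized hypothesis, so the normalization step directly affects the reported score.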

Python pipeline for enriching an Arabic (MSA) → Darija (MA) dataset from PDF books and YouTube transcripts; covers normalization, token-based segmentation, generation (OpenAI or rule-based), and JSON export. Internship project at YaneCode Digital.

  • Updated Oct 13, 2025
  • Python

Demonstrating specialized NLP preprocessing packages (lightlemma, emoticon-fix, contraction-fix) through Twitter sentiment analysis. Achieves 97% accuracy with PyTorch LSTM + attention model.

  • Updated Aug 9, 2025
  • Jupyter Notebook
