text-normalization

Here are 73 public repositories matching this topic...

Demonstrates specialized NLP preprocessing packages (lightlemma, emoticon-fix, contraction-fix) through Twitter sentiment analysis. Achieves 97% accuracy with a PyTorch LSTM + attention model.

  • Updated Aug 9, 2025
  • Jupyter Notebook

Extracts text content from an HTML page, processes it, and extracts the unique words from the processed text. The notebook uses several text processing techniques: cleaning, normalization, tokenization, lemmatization or stemming, and stop-word removal.

  • Updated Apr 5, 2024
  • Jupyter Notebook
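
The steps named in the description above (cleaning, normalization, tokenization, stop-word removal) can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not the notebook's own code: the function name `unique_words` and the tiny `STOP_WORDS` set are hypothetical, and lemmatization or stemming is left out because it would need a third-party library.

```python
import html
import re
import unicodedata

# Hypothetical, deliberately tiny stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "is", "of", "to", "in"}


def unique_words(page: str) -> list[str]:
    """Return the sorted unique words of an HTML page after basic normalization."""
    # Cleaning: drop script/style blocks first, then any remaining tags.
    text = re.sub(r"(?is)<(script|style)\b.*?</\1>", " ", page)
    text = re.sub(r"(?s)<[^>]+>", " ", text)
    # Cleaning: turn entities such as &amp; back into characters.
    text = html.unescape(text)
    # Normalization: Unicode NFKC plus case folding.
    text = unicodedata.normalize("NFKC", text).casefold()
    # Tokenization: runs of letters, allowing internal apostrophes.
    tokens = re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)*", text)
    # Stop-word removal and de-duplication.
    return sorted({t for t in tokens if t not in STOP_WORDS})


if __name__ == "__main__":
    sample = "<p>The cat &amp; the Hat</p><script>var x = 1;</script>"
    print(unique_words(sample))
```

Regex-based tag stripping is only adequate for simple pages; a proper HTML parser is the safer choice for real input.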

Clipboard Translator is a lightweight desktop application built with PyQt5 that automatically translates text copied to the clipboard into Persian using the Google Translate API. The application features a modern and minimalistic UI, custom styling, and real-time text normalization and tokenization.

  • Updated Jul 31, 2024
  • Python

Text classification is a widely used natural language processing task across many business problems. Given a statement or document, the task assigns it an appropriate category from a predefined set of categories, which is determined by the chosen dataset. Text classification has applications in emotion classification, n

  • Updated Nov 2, 2022
  • Jupyter Notebook

Python pipeline for enriching an Arabic (MSA) → Darija (MA) dataset from PDF books and YouTube transcripts; normalization, token-based segmentation, generation (OpenAI or rules), and JSON export. Application internship project at YaneCode Digital.

  • Updated Oct 13, 2025
  • Python

Comprehensive benchmark of OpenAI Whisper models for Bosnian, Croatian, and Serbian languages. Includes pipelines for audio transcription, rigorous text normalization, Levenshtein distance evaluation, and LLM-based post-processing.

  • Updated Dec 7, 2025
  • HTML
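
As a rough illustration of how text normalization and Levenshtein distance fit together when scoring transcriptions, the sketch below normalizes both strings and then computes a word-level edit distance. Everything here is an assumption for illustration: the helper names (`normalize`, `levenshtein`, `word_error_rate`), the specific normalization rules, and the choice of a word-level error rate are not taken from the benchmark repository.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Assumed normalization: NFKC, case folding, punctuation removal, whitespace collapse."""
    text = unicodedata.normalize("NFKC", text).casefold()
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())


def levenshtein(a, b) -> int:
    """Edit distance between two sequences (strings or lists of tokens)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (x != y),  # substitution, free when items match
            ))
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = normalize(reference).split()
    hyp = normalize(hypothesis).split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return levenshtein(ref, hyp) / len(ref)


if __name__ == "__main__":
    print(word_error_rate("Dobar dan, svijete!", "dobar dan svete"))
```

Normalizing before scoring keeps differences in casing and punctuation from counting as transcription errors, which is why the normalization rules need to be fixed and documented before comparing models.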
