A robust service that extracts and classifies financial amounts from medical bills and receipts using Tesseract.js OCR, normalization, and AI-powered context classification with Gemini API fallback.
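The normalization step for OCR'd amounts can be illustrated with a minimal Python sketch. This is not the service's code, which is built on Tesseract.js. The sketch assumes three things: a hypothetical `normalize_amount` helper, an illustrative table of letters that OCR commonly emits in place of digits, and a simple rule that a final separator followed by one or two digits marks the decimal point.

```python
import re
from decimal import Decimal, InvalidOperation
from typing import Optional

# Illustrative (assumed) OCR confusions: letters that often stand in for digits.
OCR_DIGIT_FIXES = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1", "S": "5"})

# Runs of digits, look-alike letters, and separators.
TOKEN_RE = re.compile(r"[0-9OolIS.,]+")


def normalize_amount(raw: str) -> Optional[Decimal]:
    """Return the first amount in an OCR'd string as a Decimal, or None."""
    for token in TOKEN_RE.findall(raw):
        # Only repair runs that already contain a real digit, so the
        # 'o' and 'l' in a word like 'Total' are not turned into numbers.
        if not any(ch.isdigit() for ch in token):
            continue
        cleaned = token.translate(OCR_DIGIT_FIXES).strip(".,")
        last_sep = max(cleaned.rfind("."), cleaned.rfind(","))
        if last_sep != -1 and len(cleaned) - last_sep - 1 in (1, 2):
            # Last separator is the decimal point; earlier ones group thousands.
            integer = re.sub(r"[.,]", "", cleaned[:last_sep])
            number = f"{integer}.{cleaned[last_sep + 1:]}"
        else:
            # Every separator groups thousands.
            number = re.sub(r"[.,]", "", cleaned)
        try:
            return Decimal(number)
        except InvalidOperation:
            continue
    return None
```

For example, `normalize_amount("Total: 1,2O0.5O")` yields `Decimal("1200.50")`, and the European form `"1.234,56"` is read as `1234.56`. Ambiguous inputs such as `"1,200"` are treated as thousands, which is a design choice a real service might resolve with context or an AI classifier.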
Extract the text content of an HTML page, process it, and collect the unique words from the processed text. This notebook uses several text processing techniques, including cleaning, normalization, tokenization, lemmatization or stemming, and stop-word removal.
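The processing steps listed above can be sketched with the Python standard library alone. This is an assumption-laden stand-in, not the notebook's code. The small stop-word list and the `crude_stem` suffix stripper are hypothetical simplifications of what a library such as NLTK (for example, its Porter stemmer) would provide.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


# Tiny illustrative stop-word list; real pipelines use a much fuller one.
STOP_WORDS = {"a", "an", "the", "and", "or", "is", "are", "of", "to", "in", "on", "for", "with"}


def crude_stem(word):
    """Naive suffix stripping, standing in for a real stemmer or lemmatizer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def unique_words(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.parts).lower()      # normalization: case folding
    text = re.sub(r"[^a-z\s]", " ", text)      # cleaning: ASCII letters only (assumption)
    tokens = text.split()                      # tokenization: whitespace split
    stems = [crude_stem(t) for t in tokens if t not in STOP_WORDS]
    return sorted(set(stems))                  # unique words, sorted for stable output
```

For a page whose body reads "The cats are jumping and walked!", with a `<script>` and a `<style>` element present, this returns `["cat", "jump", "walk"]`.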
Clipboard Translator is a lightweight desktop application built with PyQt5 that automatically translates text copied to the clipboard into Persian using the Google Translate API. The application features a modern and minimalistic UI, custom styling, and real-time text normalization and tokenization.
Text classification is a widely used natural language processing task that applies to many business problems. Given a statement or document, the task is to assign it an appropriate category from a predefined set of categories, which is determined by the chosen dataset. Text classification has applications in emotion classification, n
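The task description above does not say which model the repository uses. As one classic baseline, here is a minimal multinomial Naive Bayes classifier with add-one smoothing, written in plain Python. The tokenizer, the class name `NaiveBayes`, and the emotion labels used with it are illustrative assumptions.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and keep runs of ASCII letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(cnt.values()) for c, cnt in self.word_counts.items()}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / n_docs)  # log prior P(label)
            denom = self.total_words[label] + len(self.vocab)
            for word in tokenize(doc):
                if word in self.vocab:  # words unseen in training are ignored
                    score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Working in log space avoids floating-point underflow when long documents multiply many small probabilities.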
Python pipeline for enriching an Arabic (MSA) → Darija (MA) dataset from PDF books and YouTube transcripts: normalization, token-based segmentation, generation (OpenAI or rule-based), and JSON export. Application internship project at YaneCode Digital.
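The normalization, segmentation, and export steps can be sketched in Python. The specific normalizations are common MSA preprocessing choices that the project may or may not apply: stripping diacritics and tatweel, and folding hamza/madda alef variants to a bare alef. Segmentation here counts whitespace tokens rather than model tokens. The JSON schema (`id`, `text`) is hypothetical.

```python
import json
import re

# Arabic diacritics (fathatan U+064B through sukun U+0652) plus tatweel U+0640.
DIACRITICS_RE = re.compile(r"[\u064B-\u0652\u0640]")
# Alef with madda / hamza above / hamza below, folded to bare alef U+0627.
ALEF_RE = re.compile(r"[\u0622\u0623\u0625]")


def normalize_arabic(text):
    text = DIACRITICS_RE.sub("", text)
    text = ALEF_RE.sub("\u0627", text)
    return re.sub(r"\s+", " ", text).strip()


def segment_by_tokens(text, max_tokens=50):
    """Split text into chunks of at most max_tokens whitespace-separated tokens."""
    tokens = text.split()
    return [" ".join(tokens[i:i + max_tokens]) for i in range(0, len(tokens), max_tokens)]


def export_segments(segments):
    # Hypothetical record schema; ensure_ascii=False keeps Arabic script readable.
    records = [{"id": i, "text": seg} for i, seg in enumerate(segments)]
    return json.dumps(records, ensure_ascii=False, indent=2)
```

If the generation step uses an OpenAI model, counting tokens with that model's tokenizer would give tighter control over prompt size than whitespace splitting.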
Comprehensive benchmark of OpenAI Whisper models for Bosnian, Croatian, and Serbian languages. Includes pipelines for audio transcription, rigorous text normalization, Levenshtein distance evaluation, and LLM-based post-processing.
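Levenshtein-based evaluation of transcripts usually means word error rate (WER): the edit distance between reference and hypothesis word sequences, divided by the reference length. The sketch below is an assumption about how such an evaluation can work, not the benchmark's code. Its normalization (lowercasing and dropping punctuation) is deliberately simple. Because Python's `\w` is Unicode-aware, letters such as č, ć, đ, š, and ž are preserved.

```python
import re


def normalize(text):
    """Lowercase, replace punctuation with spaces, and split into words."""
    return re.sub(r"[^\w\s]", " ", text.lower()).split()


def levenshtein(ref, hyp):
    """Edit distance between two sequences (insertion, deletion, substitution cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (r != h),    # substitution (free if equal)
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    ref, hyp = normalize(reference), normalize(hypothesis)
    return levenshtein(ref, hyp) / max(len(ref), 1)
```

Running the same `levenshtein` on character strings instead of word lists gives the character error rate (CER).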
Accurate categorization of eCommerce products improves user experience and boosts search engine visibility. The project's goal is to classify products into 14 predefined categories based on their descriptions, sourced from an eCommerce platform.
This repository is dedicated to providing comprehensive resources and code snippets for text preprocessing and various NLP tasks. Whether you're a beginner or an experienced data scientist, you'll find useful tools and techniques here to enhance your natural language processing projects.