Processing and hashing Slack communication to enable language modelling
Updated Jun 7, 2023 - Python
A local GPU-accelerated Retrieval-Augmented Generation (RAG) pipeline for PDF question-answering with multi-LLM support and modular NLP components. Process documents locally with privacy-focused information retrieval.
A fast and easy-to-use Python toolkit for image processing with CLI tools for resizing, cropping, OCR, and optimization, including batch processing support.
PDF Liberation MCP Server - Break large PDFs into digestible chunks for Claude
This web application uses OCR to recognize text in uploaded images and applies spelling correction to improve word accuracy. Users upload images containing text and receive the corrected, enhanced text.
Web application to extract text from images
Python tool for converting PDF files to text. Simplify your document processing tasks.
Extract price amount and currency symbol from a raw text string
OCR tool to extract and structure text from images and scanned PDFs (outputs .docx / .txt) — FR/EN
Retrieve data from two different websites, load it into a PostgreSQL database using Python, and combine it to derive and present new information
A Python-based application for live video text extraction using the Gemini 1.5 Flash API, hand gesture detection, and UI display.
Convert scrolling article videos into long images and extract text with OCR.
A complete Python pipeline that automates the creation of structured datasets from natural language search queries. This tool searches the web for content matching your query, scrapes and cleans the content, and outputs a structured dataset in multiple formats.
Fast Python port of arc90's readability tool, updated to match the latest readability.js
A Flask-based web app integrated with Twilio that automatically receives resumes via WhatsApp, extracts candidate details (name, email, phone), and stores them in Google Sheets and Drive using NLP and regex-based text extraction.
DeepDive AI - PaperInsight is an innovative tool that enables users to upload AI research papers in PDF format, ask questions, and receive context-aware insights. Streamline your research process and unlock valuable information effortlessly!
LlamaIndex-powered web content extractor for RAG applications
AI-powered document analysis service combining AWS Textract, Bedrock, and intelligent blur detection. Supports CLI and serverless Lambda API for Malaysian documents (licenses, receipts, ID cards, utility bills).
A simple Python script that fetches data from the Typeform API.