joaquincanete / multimodal-data-pipeline-etl

🛠️ Build a multimodal ETL pipeline to automate the extraction, transformation, and loading of web content into structured storage for analysis.

python machine-learning automation ocr ai etl postgresql data-engineering minio batch-processing object-storage apache-airflow pymupdf multimodal-data workflow-orchestration lakehouse ai-agent pymupdf-fitz pdftoexcel

Updated Mar 24, 2026
CoffeeScript

Sanschinu95 / Maxcavator2.0

Maxcavator 2.0 is an intelligent, AI-native PDF Data Extraction and Retrieval-Augmented Generation (RAG) system. It changes how you understand and interact with your PDF documents by extracting complex structures (sections, tables, images) and generating robust RAG indices.

python ocr semantic-search document-analysis pdf-parser rag fastapi vector-search pdf-extraction sentence-transformers ai-chat pymupdf-fitz retrieval-augmented-generation llama3

Updated Mar 14, 2026
JavaScript

City-of-Memphis-Wastewater / pdflinkcheck

Analyze all GoTo links, URI hyperlinks, URI file links, and TOC entries in a target PDF using a CLI and GUI wrapper for PyMuPDF. PyPI: https://pypi.org/project/pdflinkcheck Microsoft Store: https://apps.microsoft.com/detail/9n11hxvls1wg

pdf toc goto agplv3 pymupdf pymupdf-fitz pdf-links

Updated Mar 11, 2026
Python

Madhu-1106 / ResumeGenie

ResumeGenie is an AI-based tool that analyzes resumes against job descriptions and provides fit scores, feedback, and skill improvement recommendations.

python fpdf streamlit pymupdf-fitz

Updated Mar 10, 2026
Python

ZobayerAkib / AI-Invoice-Analyzer

An AI-powered invoice and receipt analyzer that extracts structured invoice data from images (JPG/PNG) and PDF documents using a Large Language Model (LLM).

pdf image fastapi pdf-text-extraction openai-api pymupdf-fitz llm invoice-analysis

Updated Mar 3, 2026
Python

Agrippa-Tech / FichamentoAutomatico-Python

Automatic PDF fichamento (reading notes): a Python application that automatically extracts and formats highlighted passages from PDF documents, following Brazilian academic fichamento standards.

python pdf fichamento pymupdf-fitz

Updated Feb 12, 2026
Python

ramyadjoshi / IntelliDoc-AI-Powered-Intelligent-Document-Analysis-System

IntelliDoc is an intelligent document understanding system that helps users extract, analyze, and query information from PDFs, scanned documents, images, and multilingual reports using OCR, AI, and Retrieval-Augmented Generation (RAG).

python opencv artificial-intelligence tesseract-ocr llama optical-character-recognition tf-idf-vectorizer streamlit pymupdf-fitz pillow-library faiss-vector-database groq-api rag-pipeline

Updated Feb 12, 2026
Python

asifnoushadsharafudeen / ai_conversational_banking_advisor

Enterprise-grade AI banking chatbot built with FastAPI, Streamlit, OpenAI LLMs, pgvector RAG, secure PII masking, conversational memory, and real-time streaming.

machine-learning artificial-intelligence fastapi presidio deberta pymupdf-fitz langchain-python llamaindex rag-chatbot

Updated Jan 27, 2026
Python

armando-desouza / ai-pdf-summarizer-agent

A robust Python-based ETL pipeline designed to ingest, rasterize, and extract structured data from complex PDF documents. Unlike standard text scrapers, this engine uses a "Vision-First" approach to handle layouts, charts, and non-selectable text, preparing assets for Multimodal AI analysis.

python automation ocr ai pymupdf ai-agent pymupdf-fitz pdftoexcel

Updated Jan 16, 2026
Python

HoustonAlexander / DATAFORGE

Toolkit for research admin tasks at NCSU; compatible with WRS reports.

python regex pandas tkinter win32 pymupdf-fitz

Updated Jan 14, 2026
Python

HelmiDev03 / Tunisia-job-recommendation-system

docker nextjs fastapi cloudrun sentence-transformers pymupdf-fitz all-minilm-l6-v2 llama3-1

Updated Dec 13, 2025
TypeScript

Harshitha8778 / semantic-search

The purpose of this application is to facilitate efficient and intelligent searching of text content within PDF documents. By leveraging semantic search techniques, the application enhances the user's ability to locate information quickly and accurately within large documents.

python streamlit nltk-tokenizer sentence-transformers pymupdf-fitz qdrant-client

Updated Nov 27, 2025
Python

VaishnaviSh14 / MultiModel-RAG-With-Langchain

A multimodal RAG system that extracts text + images from PDFs, generates CLIP embeddings, stores them in FAISS, and answers queries using LangChain and a local LLM.

numpy prompt python3 pytorch chunking huggingface-transformers pymupdf-fitz cliptext langchain-python vectorembeddings faiss-vector-database multimodal-rag

Updated Nov 26, 2025
Jupyter Notebook

saikaryekar / pdf-layout-mapper

A Python tool for extracting text regions from PDF files, visualizing them as bounding boxes, and exporting structured data in JSON format.

cli shapely bbx pymupdf-fitz

Updated Nov 9, 2025
Python

gyan007 / AI-Tutor-Platform

The AI Tutor Platform is an intelligent educational application built with FastAPI, Streamlit, LangChain, and Groq. It provides users with an AI-powered conversational tutor, auto-generated quizzes, and a file-based doubt solver. The platform includes user authentication and progress tracking, with all data persistently stored in a PostgreSQL DB.

postgresql pillow pandas render python3 bcrypt pgadmin4 altair pytesseract passlib pydantic fastapi groq streamlit psycopg2-binary pymupdf-fitz python-jose langchain

Updated Oct 13, 2025
Python

MelinaNorton / journal-vetter

Python CLI & library for automated journal vetting — GPT‑4.1 summarization, YAML configuration, reproducible analysis.

python cli pypi openai text-summarization research-tool academic-journals document-embedding pymupdf document-embeddings pypi-package pdf-processing gpt-4 pymupdf-fitz llm llms chatgpt langchain langchain-python

Updated Jul 30, 2025
Python

RishavKumarSinha / adobe-hackathon-solution

Solution for the Adobe India Hackathon 2025, Team - Codient (Team Leader - Gopal Ranjan, Team Members - Rishav Kumar Sinha)

python docker nlp-machine-learning containerization pymupdf-fitz

Updated Jul 27, 2025
Python

lazarokaua / Organiza-pasta-obsidian

File organization for my Obsidian.

python python-dotenv pymupdf-fitz google-gemini-api

Updated Jul 26, 2025
Python

Deepcoders30 / AI-CHATPDF

ChatPDF is a web application that lets users upload PDFs and ask questions about their content.

javascript typescript reactjs fastapi pymupdf-fitz langchain faiss-vector-database groq-integration

Updated Jul 5, 2025
TypeScript

malavika-suresh / multiple_pdf_comparison

This Python-based tool allows for efficient comparison of two or more PDF documents, highlighting the differences between them. It extracts and compares the words in the PDFs, ignoring whitespace differences, and highlights the changed, added, or missing words.

python pdf differences-detected difflib pdf-comparison text-comparison fitz comparison-tool multiple-pdfs pymupdf-fitz pdf-comparison-highlight-differences

Updated Jul 2, 2025
Python
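
A minimal sketch of the word-level comparison that the multiple_pdf_comparison description mentions, assuming the text of each PDF has already been extracted into a plain string (for example with PyMuPDF). The function name `diff_words` is hypothetical and is not taken from the repository; it only illustrates the idea with the standard-library `difflib` module.

```python
import difflib


def diff_words(old_text, new_text):
    """Return word-level differences between two texts, ignoring whitespace.

    Hypothetical helper for illustration; not the repository's actual code.
    """
    # str.split() with no arguments collapses any run of whitespace,
    # so line breaks and extra spaces never count as differences.
    old_words = old_text.split()
    new_words = new_text.split()

    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # tag is "replace" (changed), "insert" (added), or "delete" (missing).
        changes.append((tag, old_words[i1:i2], new_words[j1:j2]))
    return changes


if __name__ == "__main__":
    old = "The quick brown fox\njumps over the lazy dog"
    new = "The quick   red fox jumps over the dog today"
    for tag, removed, added in diff_words(old, new):
        print(tag, removed, added)
```

Each returned tuple names the kind of change and the affected words on each side, which a highlighting step could then map back to positions in the PDF.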

pymupdf-fitz

Here are 44 public repositories matching this topic...
