content-extraction

Here are 8 public repositories matching this topic...

MichaelvanLaar / n8n-nodes-defuddle

n8n community node for extracting the main content from webpages using the Defuddle library

web-scraping readability content-extraction n8n n8n-community-node-package defuddle

Updated Oct 29, 2025
TypeScript

cyanheads / jinaai-mcp-server

A Model Context Protocol (MCP) server that provides intelligent web reading capabilities using the Jina AI Reader API. It extracts clean, LLM-ready content from any URL.

agent mcp web-scraping content-extraction jina llm jinaai mcp-server modelcontextprotocol

Updated Sep 4, 2025
TypeScript

🚀 mcp-web-scrape: Clean, cache-aware web content fetcher for AI agents. Fetch any URL → extract readable content → return Markdown/JSON with citations. ⚡ Fast caching, 🤝 robots.txt compliant, 📝 Markdown-ready output, works with ChatGPT/Claude Desktop.

nodejs api agent markdown scraper typescript ai mcp cache web-crawler sse citations web-scraping stdio content-extraction claude llm chatgpt model-context-protocol

Updated Oct 11, 2025
TypeScript

vakharwalad23 / mark-minion

The Ultimate Web Content Extraction & Conversion Tool for AI/LLM Applications. Convert almost any web content into clean Markdown with intelligent AI processing.

typescript web-scraping content-extraction document-processing tweets-extraction markdown-conversion puppeteer cloudflare-worker ai-powered

Updated Oct 8, 2025
TypeScript

kamjin3086 / Crawell

📸 Crawell: a free browser extension for one-click image and article extraction, Markdown conversion, and bulk download, with 100% local processing.

react chrome-extension markdown typescript firefox-addon web-scraping browser-extension edge-extension content-extraction image-downloader tailwindcss privacy-first

Updated Jul 31, 2025
TypeScript

tuffstuff9 / nextjs-pdf-parser

Next.js template for seamless PDF parsing using pdf2json and FilePond. Ideal for developers seeking a ready-to-use solution for PDF content extraction in Next.js projects.