Updated Jan 21, 2024 - JavaScript
content-extraction
Here are 14 public repositories matching this topic...
Example project demonstrating how to use PDFix SDK WebAssembly build in Node.js. Make PDF Files Accessible, Extract Data from PDF, Convert PDF to HTML, Fill-in PDF Form, Stamp PDF and more...
Updated Feb 20, 2025 - JavaScript
📋 WebMD is a Chrome extension that transforms web pages into Markdown documents with surgical precision.
Updated Jul 3, 2025 - JavaScript
Document processing and querying system built with FastAPI and React. Upload documents and interact with their content using natural language queries powered by the Gemini API and Unstructured.io.
Updated Nov 5, 2024 - JavaScript
Example project demonstrating how to use PDFix SDK WebAssembly build in Node.js. Make PDF Files Accessible, Extract Data from PDF, Convert PDF to HTML, Fill-in PDF Form, Stamp PDF and more...
Updated Apr 4, 2023 - JavaScript
Simple Node.js server that extracts relevant content from website source code using Mozilla's Readability.js.
Updated Jan 3, 2021 - JavaScript
A web-based utility for fetching, categorizing, summarizing and managing global news and articles using the GDELT 2.0 API. Designed for content creators, news aggregators, and researchers, this tool simplifies access to up-to-date articles with an intuitive UI and customizable configurations.
Updated Dec 5, 2024 - JavaScript
Chrome extension to copy YouTube transcripts with AI-friendly features
Updated Aug 6, 2025 - JavaScript
Live Web Access for Your Local AI — Tunable Search & Clean Content Extraction
Updated Oct 27, 2025 - JavaScript
A userscript that adds a button to YouTube video pages for copying the transcript with or without timestamps.
Updated Oct 11, 2025 - JavaScript
Prysm is a blazing-smart Puppeteer-based web scraper that doesn't just extract - it understands structure. Capable of scraping virtually any website with intelligent content detection and 14 specialized scroll strategies that adapt to different page layouts, Prysm excels at extracting content that other scrapers miss.
Updated Apr 4, 2025 - JavaScript
A powerful MCP server extension providing web search and content extraction capabilities. Integrates DuckDuckGo search functionality and URL content extraction into your MCP environment, enabling AI assistants to search the web and extract webpage content programmatically.
Updated Sep 20, 2025 - JavaScript
🔍 Model Context Protocol (MCP) tool for parsing websites using the Jina.ai Reader
Updated Apr 5, 2025 - JavaScript
🔥 Official Firecrawl MCP Server - Adds powerful web scraping and search to Cursor, Claude and any other LLM clients.
Updated Nov 6, 2025 - JavaScript