content-extraction

Here are 75 public repositories matching this topic...

PixelGrace / smart-article-extractor

Article extraction, content scraping

content-extraction article-scraping puppeteer-automation news-data-mining academic-article-scraper journalism-research-tool fake-news-monitoring web-content-downloader dynamic-page-scraping json-csv-exporter

Updated Nov 10, 2025
JavaScript

Tom0985 / metadata-scraper

Automatically scrape metadata from websites

json-data web-scraping web-crawling content-extraction metadata-extraction website-analysis url-pattern-matching content-scraping seo-data pagination-handling

Updated Nov 10, 2025
Python

flaviodelgrosso / marky

Marky helps you convert things into Markdown 📝

markdown cli parser mcp golang-library data-processing content-extraction document-loader

Updated Nov 10, 2025
Go

firecrawl / firecrawl-mcp-server

🔥 Official Firecrawl MCP Server - Adds powerful web scraping and search to Cursor, Claude and any other LLM clients.

mcp web-crawler web-scraping data-collection batch-processing content-extraction search-api claude llm-tools firecrawl model-context-protocol mcp-server firecrawl-ai javascript-rendering

Updated Nov 6, 2025
JavaScript

mariafilimonova442 / Quora-Search-Bot

Quora Search Bot – automated Android control

appium content-extraction android-automation mobile-bot wireless-automation ui-automator appilot adb-less-control quora-automation quora-search-bot

Updated Nov 4, 2025

prompt-stack / content-engine

Extract content from Reddit, TikTok, articles, and YouTube

python api scraping content-extraction llm

Updated Oct 31, 2025
Python

MacphersonDesigns / wp-rest-dumper

Scraper for WordPress sites.

python wordpress scraper backup rest-api content-extraction

Updated Oct 31, 2025
Python

MichaelvanLaar / n8n-nodes-defuddle

n8n community node for extracting main content from webpages using Defuddle library

web-scraping readability content-extraction n8n n8n-community-node-package defuddle

Updated Oct 29, 2025
TypeScript

manooll / webfetch-mcp

Live Web Access for Your Local AI — Tunable Search & Clean Content Extraction

Updated Oct 27, 2025
JavaScript

chithraxx-0616 / AI_SUMMARIZER

A Chrome extension that summarizes articles using Gemini API

javascript css chrome-extension html text-analysis browser-extension web-extension user-friendly content-extraction gemini-api reading-tools generative-ai google-ai-studio summarizer-api

Updated Oct 21, 2025
HTML

amirthfultehrani / Youtube-Transcript-Copier

A userscript that adds a button to YouTube video pages for copying the transcript with or without timestamps.

Updated Oct 11, 2025
JavaScript

mukul975 / mcp-web-scrape

🚀 mcp-web-scrape: Clean, cache-aware web content fetcher for AI agents. Fetch any URL → extract readable content → return Markdown/JSON with citations. ⚡ Fast caching, 🤝 robots.txt compliant, 📝 Markdown-ready output, works with ChatGPT/Claude Desktop.

nodejs api agent markdown scraper typescript ai mcp cache web-crawler sse citations web-scraping stdio content-extraction claude llm chatgpt model-context-protocol

Updated Oct 11, 2025
TypeScript

legendmohe / wordpress-content-extractor

🔄 Extract and convert WordPress export files to Markdown, CSV, and JSON formats with intelligent HTML parsing and code block detection

python markdown wordpress xml-parser content-extraction blog-migration

Updated Oct 10, 2025
Python

Noe-AC / url2tldr

Lightweight Dash app to summarize YouTube & Reddit content with Ollama.

python nlp youtube reddit side-project tldr dash web-scraping webapp text-summarization summarization content-extraction llm ollama

Updated Oct 9, 2025
Python

vakharwalad23 / mark-minion

The Ultimate Web Content Extraction & Conversion Tool for AI/LLM Applications. Convert almost any web content into clean Markdown with intelligent AI processing.

typescript web-scraping content-extraction document-processing tweets-extraction markdown-conversion puppeteer cloudflare-worker ai-powered

Updated Oct 8, 2025
TypeScript

pdfix / pdfix_sdk_example_cpp

Make PDF files accessible, extract data from PDFs, convert PDF to HTML, fill in PDF forms, stamp PDFs, and more.

Updated Oct 7, 2025
C++

graphlit / graphlit-mcp-server

Model Context Protocol (MCP) Server for Graphlit Platform

web-crawler web-scraping data-collection content-extraction search-api claude unstructured-data content-ingestion llm-tools model-context-protocol mcp-server

Updated Oct 7, 2025
TypeScript

xsukax / xsukax-ReadClean-PDF

A privacy-focused, client-side web application that extracts clean, readable content from any webpage and converts it to PDF format. Built with pure HTML, CSS, and JavaScript—no backend required, no tracking, complete privacy.

Updated Oct 5, 2025
HTML

danlikendy / articles_project

Article parser for Habr, Proglib, and vc.ru that extracts the main content and removes ads and unnecessary elements while preserving proper formatting

python markdown web-scraping beautifulsoup text-processing content-extraction article-parser cli-tool proglibio habr

Updated Oct 2, 2025
Python

oiwn / dom-content-extraction

DOM Based Content Extraction via Text Density

scraping content-extraction dom-based

Updated Sep 23, 2025
Rust