# web-scraping

Here are 289 public repositories matching this topic...

firecrawl / firecrawl

🔥 The Web Data API for AI - Turn entire websites into LLM-ready markdown or structured data

markdown crawler scraper ai html-to-markdown web-crawler scraping web-scraper web-scraping data-extraction webscraping web-data-extraction ai-agents web-search ai-search web-data llm ai-crawler ai-scraping

Updated Nov 7, 2025
TypeScript

apify / crawlee

Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.

nodejs javascript npm crawler scraper automation typescript web-crawler headless scraping crawling web-scraping web-crawling headless-chrome apify puppeteer playwright

Updated Nov 10, 2025
TypeScript

getmaxun / maxun

⚡ Easiest no-code web data extraction platform • Instantly turn any website into an API or spreadsheet ⚡

Updated Nov 10, 2025
TypeScript

jaypyles / Scraperr

Self-hosted webscraper.

python docker kubernetes opensource helm scraping webscraper web-scraper self-hosted web-scraping web-scrapers webscraping playwright

Updated Oct 12, 2025
TypeScript

intoli / user-agents

A JavaScript library for generating random user agents with data that's updated daily.

javascript user-agent random randomization navigator web-scraping browsers browser-automation user-agent-spoofer

Updated Oct 14, 2025
TypeScript

web-agent-master / google-search

A Playwright-based Node.js tool that bypasses search engine anti-scraping mechanisms to execute Google searches. Local alternative to SERP APIs with MCP server integration.

ai web-scraping google-search llm mcp-server

Updated Apr 6, 2025
TypeScript

graphlit / graphlit-mcp-server

Model Context Protocol (MCP) Server for Graphlit Platform

web-crawler web-scraping data-collection content-extraction search-api claude unstructured-data content-ingestion llm-tools model-context-protocol mcp-server

Updated Oct 7, 2025
TypeScript

ayakashi-io / ayakashi

⚡ Ayakashi.io - The next generation web scraping framework

data-mining automation web-scraping web-crawling headless-chrome

Updated Jun 29, 2023
TypeScript

minhlucvan / n8n-nodes-browserless

n8n node for interacting with a browserless instance

web-scraping browser-automation browserless n8n n8n-nodes n8n-community-node-package

Updated Oct 3, 2024
TypeScript

bitmakerla / estela

estela, an elastic web scraping cluster 🕸

react python docker kubernetes scraper django scraping crawling requests web-scraping scrapy hacktoberfest python-requests scrapyd scrapy-visualization webscraping-python

Updated Nov 6, 2025
TypeScript

mtwn105 / decipher-research-agent

Turn topics, links, and files into AI-generated research notebooks — summarize, explore, and ask anything.

agent ai mcp scraping ml artificial-intelligence gemini web-scraping openai qdrant llm vector-db crewai notebooklm agentic-ai model-context-protocol

Updated Jun 6, 2025
TypeScript

discohook / discohook

Send webhook messages from your browser (and so much more)

webhooks typescript discord discordjs embeds postgresql discord-api web-scraping cloudflare-workers discord-webhook cloudflare-kv durable-objects drizzle-orm

Updated Nov 9, 2025
TypeScript

apify / browser-pool

A Node.js library to easily manage and rotate a pool of web browsers, using any of the popular browser automation libraries like Puppeteer, Playwright, or SecretAgent.

scraping web-scraping browser-automation headless-browsers rpa puppeteer playwright

Updated Nov 14, 2022
TypeScript

kamolmachine / social-media-downloader-bot

📥 Bot for downloading any media from Instagram, Twitter and videos from TikTok and Youtube

social-media telegram-bot telegraf web-scraping youtube-downloader twitter-downloader instagram-downloader tiktok-downloader

Updated Aug 26, 2024
TypeScript

3choff / docs-miner

A VSCode extension that generates markdown documentation from web pages and GitHub repositories.

documentation-tool web-scraping developer-tools vscode-extension documentation-generator markdown-generator github-to-markdown website-to-markdown

Updated Jan 28, 2025
TypeScript

armand1m / papercut

Papercut is a scraping/crawling library for Node.js built on top of JSDOM. It provides basic selector features together with features like Page Caching and Geosearch.

nodejs crawler scraper typescript cache scraping web-scraping jsdom

Updated Jan 8, 2023
TypeScript

huynlx / Extensions-Viet

Paperback - Vietnamese - Extensions

typescript crypto cheerio manga comics web-scraping horror hentai hacktoberfest paperback manhwa manhua paperback-source paperback-repo

Updated Oct 5, 2024
TypeScript

MBach / LeMondeRssReader

📰 Read RSS feed from LeMonde.fr and display news inside the App

react-native material-design rss-reader web-scraping react-native-paper

Updated Nov 6, 2025
TypeScript

Eltik / AniSync

Mapping sites to AniList and back.

javascript library typescript web anime mapping scraping manga anilist web-scraping scraping-websites

Updated Apr 16, 2024
TypeScript

apify / super-scraper

Generic REST API for scraping websites. Drop-in replacement for ScrapingBee, ScrapingAnt, and ScraperAPI services. And it is open-source!

nodejs javascript api typescript cheerio scraping web-scraping apify playwright

Updated Oct 30, 2025
TypeScript
