amienbou121 / crawl4ai-mcp-server

🕷️ Enable AI agents to scrape and crawl the web effortlessly with this lightweight Model Context Protocol server, integrating seamlessly into your workflows.

mcp crawling web-scraping web-crawling semantic-search ai-agents reranking keyword-search uvx agentic-workflow agentic-rag agentic-ai crawl4ai pydantic-ai claude-code model-context-protocol-servers openai-agents-sdk firecrawl-alternative

Updated Mar 28, 2026
Python

BlackSheep0110 / OmniCrawler

🕷️ Automate web scraping with OmniCrawler, a powerful tool that builds large datasets by discovering and downloading relevant content effortlessly.

python crawler data-mining multithreading web-scraper text-extraction dataset-creation beautifulsoup web-crawling web-automation nlp-datasets sitemap-parser osint-tools

Updated Mar 28, 2026
Python

shirssss / crawlee-web-scraping

🌐 Use Crawlee to streamline web scraping with Node.js, featuring sessions management, proxy rotation, and dynamic content handling for efficient data extraction.

crawler tutorial typescript web headless scraping web-scraper session-management korean crawlers deutsch web-crawling hacktoberfest browser-automation headless-chrome apify dynamic-content ko proxy-rotation

Updated Mar 28, 2026

amineeng / scraping-browser

🔍 Automate dynamic web scraping with Scraping Browser, a full-host solution using Puppeteer, Selenium, and Playwright for seamless data collection.

javascript python bot crawler automation web-crawler web-scraper fingerprinting web-crawling user-agent-string web-search web-agent abck llm-agent aws-waf-token ollama langgraph

Updated Mar 28, 2026

fin3371 / amazon-product-scraper

🛒 Extract and analyze Amazon product data effortlessly with this lightweight Python scraper, ideal for price tracking and competitor research.

python scraper amazon-api selenium web-scraping url-scraper web-crawling price-scraper price-scraping scraping-web scraping-data scrape-products amazon-scraping amazon-asin amazon-products-api amazon-aip

Updated Mar 28, 2026
Python

yunitaanggraini / deutsches-krankenhaus-verzeichnis-hospitals-scraper

🏥 Scrape detailed hospital information from the Deutsches Krankenhaus Verzeichnis website, gathering structured data like contact details and addresses efficiently.

python selenium web-scraping data-extraction bs4 web-crawling hospital-data address-scraping structured-dataset paginated-scraping advanced-search-scraping contact-information-scraper

Updated Mar 28, 2026

erickzinsz / Vncz-Test-Actor-Scraper

🕷️ Build efficient web crawlers with Vncz-Test-Actor-Scraper, a TypeScript template using Puppeteer for JavaScript-heavy pages and structured data storage.

boilerplate scraper typescript actor web-crawling headless-chrome puppeteer parallel-scraping vncz

Updated Mar 28, 2026

Awurama-22 / instagram-reels-scraper

📊 Extract Instagram Reels and insights efficiently with this high-performance scraper, turning public data into structured datasets for analysis and marketing.

javascript python scraper automation cross-platform scraping selenium chromedriver scraping-websites web-crawling automatic-login instagram-data auto-poster scraping-web requests-python requests-library-python reels instagram-reels-downloader

Updated Mar 28, 2026

ANURAG1-DEV-HASH / state-bar-websites-email-scraper

📧 Scrape names and emails from US state bar websites to enhance legal networking and research with accurate, public contact data.

python data-mining scraper email state web-scraper websites bar scrapy web-crawling data-scraping public-records us-legal-data

Updated Mar 28, 2026

oebeledrijfhout / attorney-directory-scraper

🕵️‍♂️ Scrape attorney data from major U.S. directories while ensuring compliance, covering 20 cities and five practice areas efficiently.

data-mining web-scraping scrapy web-crawling data-scraper avvo legal-professionals attorney-directory-scraper public-directories findlaw super-lawyers legal-data-scraper practice-areas-scraping

Updated Mar 28, 2026

minhaz1o84 / amazon-web-scraper

🛒 Scrape Amazon product data efficiently with AI and Playwright, designed for developers and data analysts seeking structured information.

python webdriver amazon selenium web-scraping concurrent-programming scrapers amazon-web-services amazon-api-gateway webscraping web-crawling amazon-lambda beautifulsoup4 web-scraping-tutorials amazon-comprehend amazon-reviews amazon-scraper concurrent-scraping

Updated Mar 28, 2026

StudyTab / Phantom-Crawler

🕵️‍♂️ Perform robust web security scanning and reconnaissance with PhantomCrawler, designed for researchers and pen testers to enhance application security.

python scraper phantomjs web-crawler crawling web-security proxy-configuration axe web-crawling hacktoberfest security-scanner graphql-security apify ddos-attack-tools proxy-rotation hacktoberfest-accepted website-hits jwt-analysis

Updated Mar 28, 2026
TypeScript

Naxh156 / claude-agent-sdk-python

🤖 Build and interact with Claude Agent using this Python SDK for seamless integration and efficient asynchronous querying.

python agent json information-retrieval streaming mcp developer-tools agents web-crawling rag exa-api agentic-workflow agentic-ai fastmcp research-agents claude-code exa-research exa-code

Updated Mar 28, 2026
Python

apify / crawlee

Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.

nodejs javascript npm crawler scraper automation typescript web-crawler headless scraping crawling web-scraping web-crawling headless-chrome apify puppeteer playwright

Updated Mar 28, 2026
TypeScript

apify / crawlee-python

Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Parsel, BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.

python crawler scraper automation web-crawler headless scraping crawling pip web-scraping beautifulsoup web-crawling hacktoberfest headless-chrome apify parsel playwright

Updated Mar 28, 2026
Python

olostep-api / CLI

CLI for Olostep - The fastest way to get clean web data into your AI workflows. Search, scrape, and crawl the web from your terminal with Olostep — no headless browsers, no anti-bot headaches, no infra.

cli web ai web-scraping agents web-crawling web-search agent-skill

Updated Mar 28, 2026
Python

scrapeway / best-web-scraping-api-benchmarks

What is the best web scraping API service? Research through benchmarks.

data-science data-mining web-scraping web-crawling

Updated Mar 28, 2026

liebenholz / ohaasa-daily-discord

A Discord bot that posts the Ohaasa rankings every morning.

python webhooks automation discord-bot web-crawling

Updated Mar 27, 2026
Python

spider-rs / n8n-nodes-spider

Spider n8n community node — crawl, scrape, and extract structured data from any website inside your n8n workflows.

automation ai spider web-scraping data-extraction web-crawling workflow-automation n8n n8n-node n8n-community-node

Updated Mar 27, 2026
TypeScript

hutchpd / ProductNormaliser

Open product-intelligence engine that turns messy retail and manufacturer page data into clean, canonical, comparable product records.

mongodb dotnet web-crawler entity-resolution data-extraction web-crawling product-catalog retail-data data-reconciliation product-intelligence product-normalization canonical-data

Updated Mar 27, 2026
C#

web-crawling

Here are 422 public repositories matching this topic...
