-
spider
A web crawler and scraper, building blocks for data curation workloads
-
spider_chromiumoxide_cdp
Contains all the generated types for chromiumoxide
-
spider_transformations
Transformation utils to use for spider
-
firecrawl
Official Rust SDK for Firecrawl API v2
-
git-leave
Check for unsaved or uncommitted changes on your machine
-
crw-pdf
Fast PDF inspection, classification, and text extraction — vendored from firecrawl/pdf-inspector
-
crw-mcp
MCP (Model Context Protocol) server for the CRW web scraper
-
crw-cli
crw — CLI tool for scraping URLs to markdown/JSON without a server
-
kreuzcrawl
High-performance web crawling engine
-
arkenar
CLI frontend for the Arkenar vulnerability scanner
-
mq-crawler
Directory crawler for batch Markdown file processing
-
crawlex
Stealth crawler with Chrome-perfect TLS/H2 fingerprint, render pool, hooks, persistent queue
-
spider-client
Spider Cloud client
-
crw-mcp-proto
MCP JSON-RPC 2.0 protocol types (shared by crw-server, crw-mcp, crw-browse)
-
siteone-crawler
Website crawler and QA toolkit in Rust for security, performance, SEO, and accessibility audits, offline cloning, markdown export, sitemap generation, cache warming, and CI/CD gating…
-
crw-renderer
HTTP and CDP browser rendering engine for the CRW web scraper
-
crw-search
SearXNG-backed search client and result transforms for the CRW web scraper
-
spider-cloud-cli
The Spider Cloud CLI for web crawling and scraping
-
iocaine
The deadliest poison known to AI
-
gnostr_filehash_core
gnostr filehash core
-
crw-server
Firecrawl-compatible API server for the CRW web scraper
-
spider-lib
A Rust-based web scraping framework inspired by Scrapy (Python)
-
argus-crawler
A production-ready web crawler capable of handling billions of URLs
-
fav_core
Fav's core crate; A collection of traits
-
kumo
An async web crawling framework for Rust - Scrapy for Rust
-
crw-core
Core types, config, and error handling for the CRW web scraper
-
siteprobe
CLI tool to fetch URLs from sitemap.xml, check their existence, and generate performance reports
-
spider-core
Core functionality for the spider-lib web scraping framework
-
mocra
A distributed, event-driven crawling and data collection framework
-
secret_scraper
A URL Crawler tool and library for crawling web targets, discovering links, and detecting secrets with configurable regex rules
-
spider_worker
The fastest web crawler as a worker or proxy
-
kreuzcrawl-cli
Command-line web crawler and scraper
-
crw-crawl
Async BFS web crawler with rate limiting and robots.txt support for CRW
-
silkworm-rs
Async-first web scraping framework (Rust port)
-
socket-patch-core
Core library for socket-patch: manifest, hash, crawlers, patch engine, API client
-
robotstxt
A native Rust port of Google's robots.txt parser and matcher C++ library
-
wdict
Create dictionaries by scraping webpages or crawling local files
-
crw-extract
HTML extraction and markdown conversion engine for the CRW web scraper
-
yps
Yggdrasil Port Scanner
-
iscrawl
Fast crawler/bot detection from User-Agent strings
-
firecrawl-sdk
Rust SDK for Firecrawl API
-
website_crawler
gRPC tokio based web crawler built with spider
-
spider_chromiumoxide_fetcher
Contains a fetcher for Chromium and Chrome for Testing
-
zapfetch
Official Rust SDK for the ZapFetch v2 web scraping API
-
toai
Path crawler that copies all SRC files into a single output to send to an AI (toai)
-
dig2crawl
Universal agnostic web crawler with Claude-powered CSS selector discovery
-
crawn
web crawling and scraping
-
product-os-crawler
Product OS : Crawler is a browser-based crawler that utilises Product OS : Browser to perform advanced URL crawling leveraging headless browsing and automation
-
argus-common
Common types and utilities for the Argus web crawler
-
tarzi
Rust-native lite search for AI applications
-
yagami
A concurrent sitemap crawler that validates web links and exports results to CSV
-
spider_cli
The fastest web crawler CLI written in Rust
-
halo-spider
A fast, async web crawling framework inspired by Scrapy, built in Rust
-
spider_scraper
A css scraper using html5ever
-
argus-storage
Storage backends for crawled web data
-
spider_utils
Spider Web Crawler
-
hnu_query
querying systems of Hunan University
-
rust_scraper
Production-ready web scraper with Clean Architecture, TUI selector, and sitemap support
-
hazler-js-parser
JavaScript endpoint parser for Hazler
-
pulsarss
RSS Aggregator for Gemini Protocol
-
spider_remote_cache
Shared remote cache upload worker for spider and chromey
-
json-crawler
Wrapper for serde_json that provides nicer errors when crawling through large json files
-
spider-pipeline
Pipeline implementations for the spider-lib web scraping framework
-
actix_block_ai_crawling
Actix Middleware that blocks Generative AI crawlers
-
doclinks
Discover and export documentation links from docs sites
-
spider-util
Shared utility functions and types for the spider-lib ecosystem
-
netiquette
A rate-limiter for politely crawling the Web
-
robotxt
Robots.txt (or URL exclusion) protocol with the support of crawl-delay, sitemap and universal match extensions
-
wrake
Collect links from the given URL
-
argus-config
Configuration management for the Argus web crawler
-
scrapfly-sdk
Async Rust client for the Scrapfly web scraping, screenshot, extraction and crawler APIs
-
pixivdwn
Incremental pixiv crawler/downloader
-
anakin-sdk
Official Rust SDK for the Anakin web-scraping API. Scrape, crawl, search, and run Wire actions with internal job polling.
-
halldyll-core
Core scraping engine for Halldyll - high-performance async web scraper for AI agents
-
spider-downloader
Downloader component for the spider-lib web scraping framework
-
hazler-http
HTTP client utilities for the hazler web crawling framework
-
gnostr-web
gnostr web and JS bindings crate
-
seaward
grep-like tool for the web
-
argus-parser
HTML and sitemap parsing utilities for web crawling
-
pkuinfo-spider
CLI article crawler based on hyperlink lookup in the WeChat Official Accounts backend
-
crawly
A lightweight async Web crawler in Rust, optimized for concurrent scraping while respecting robots.txt rules
-
spider-middleware
Middleware implementations for the spider-lib web scraping framework
-
qrawl
Rust toolkit to crawl web data for AI agents
-
scrapling-spider
Concurrent web crawler framework for scrapling
-
headless_chrome_fork
Control Chrome programmatically
-
inverta
A basic search engine that downloads pages and matches your search query against their contents
-
voyager
Web crawler and scraper
-
capp
Common things i use to build Rust CLI tools for web crawlers
-
ferrisfetcher
A cutting-edge, high-level web scraping library crafted in Rust
-
crawlurls
A fast async Rust crawler that discovers and filters URLs by pattern without scraping content
-
ares-core
Core types, traits, and error handling for Ares AI scraper
-
scoutlang
A web crawling programming language
-
omnivore-cli
Universal web scraper and code extractor CLI - crawl websites, analyze repositories, build knowledge graphs
-
minisearchtk
Small toolkit for crawling and searching web pages
-
iocaine-label
The deadliest poison known to AI
-
shader-prepper
Shader include parser and crawler
-
iocaine-table
The deadliest poison known to AI
-
mitsuba
Lightweight 4chan board archive software (like Foolfuuka), in Rust
-
hazler-parser
HTML parser for extracting links and endpoints for the Hazler web crawler
-
coma
lightweight command-line tool designed for crawling websites
-
recursive_scraper
Constant-frequency recursive CLI web scraper with frequency, filtering, file directory, and many other options for scraping HTML, images and other files
-
unobtanium-crawler
The default web-crawler for unobtanium
-
turboscraper
A high-performance, concurrent web scraping framework for Rust with built-in support for retries, storage backends, and concurrent request handling
-
unobtanium
Opinionated Web search engine library with crawler and viewer companion
-
spidermedic
A CLI tool for validating pages while crawling a website
-
scout-parser
A web crawling programming language
-
hashtree-nostr-bridge
Relay bridge and social-graph-scoped crawler for hashtree Nostr indexes
-
scout-lexer
A web crawling programming language
-
chan-downloader
CLI to download all images/webms of a 4chan thread
-
firecrawl_rs
Rust SDK for Firecrawl API
-
indexea
OpenAPI of Indexea
-
website_crawler_sdk
The official Rust SDK for the Website Crawler API
-
frangipani
Scraping framework for rust
-
async_job
async cron job crate for Rust
-
aquatic-crawler
Crawler tool for the Aquatic BitTorrent tracker API
-
quick_crawler
QuickCrawler is a Rust crate that provides a completely async, declarative web crawler with domain-specific request rate-limiting built-in
-
brchd
Data exfiltration toolkit
-
spacebar
An anti-plagiarism tool based on null width characters
-
vil_crawler
VIL Web Crawler — async concurrent BFS crawling with robots.txt support (I01)
-
maman
Rust Web Crawler
-
rsfile
operate files or web pages easily and quickly
-
ntwrk
TODO
-
arkenar-core
Core library for the Arkenar vulnerability scanner
-
jsdom
javascript dom parser for web scraping
-
finde-rs
Multi-threaded filesystem crawler
-
ptt-crawler
A crawler for the web version of PTT, the largest online community in Taiwan
-
s5_importer_http
HTTP importer for S5
-
spidery
Rust SDK for Spidery API
-
surly-spider
A command line interface for crawling websites
-
web-crawler
Finds every page, image, and script on a website (and downloads it)
-
url-crawl
URL crawler for HTML code
-
krate
Get information and metadata for published Rust crates
-
stream_crawler
scraping web pages and extracting URLs and endpoints
-
omnivore-core
Core crawler and knowledge graph engine for Omnivore - web scraping, AI extraction, browser automation
-
actix-prerender
Actix middleware that sends requests to Prerender.io or a custom Prerender service URL
-
rust-rock-rover
Concert web crawler in Rust
-
od-get
recursively crawling & downloading data from open directories
-
kodict
Korean Dictionary Implements and Crawler for Rust
-
doublesite
Alternative for httrack
-
crawl
Rust crawl
-
wls
Easily crawl multiple sitemaps and list URLs
-
lolchive
local liminal archiver for webpages
-
source-demo-tool-crawler
WIP: a gui tool for opening (editing planned) source engine demo files
-
ssufid
SSU Announcement Crawler for Everyone
-
spire
The flexible scraper framework powered by tokio and tower
-
emails
A web scraper to extract email addresses from websites
-
sws-crawler
Web crawler with pluggable scraping logic
-
scraper_query
Ergonomic Query for HTML with Scraper
-
flatcrawl-crawler
set of webpage crawlers. New crawlers can be easily configured and the output can be written to an AMQP queue.
-
task_deport
Organize simple task queue
-
gar-crawl
High level HTML crawler with concise builder
-
ac_crawler_types
normalized types for the anti capital public data crawlers