#web-crawler

  1. spider

    A web crawler and scraper, building blocks for data curation workloads

    v2.51.190 15K #web-scraping #web-crawler
  2. spider_chromiumoxide_cdp

    Contains all the generated types for chromiumoxide

    v0.7.9 22K #chromiumoxide #cdp #generated #dev-tools #protocols #chrome #web-crawler
  3. spider_transformations

    Transformation utils to use for spider

    v2.39.13 2.3K #web-crawler #transformation #crawler
  4. firecrawl

    Official Rust SDK for Firecrawl API v2

    v2.5.0 370 #web-crawler #api
  5. git-leave

    Check for unsaved or uncommitted changes on your machine

    v1.6.4 1.0K #web-crawler #uncommitted #cli
  6. crw-pdf

    Fast PDF inspection, classification, and text extraction — vendored from firecrawl/pdf-inspector

    v0.3.5 #self-hosting #pdf #web-scraping #artificial-intelligence #ai-api #mcp-server #markdown #firecrawl #docker #web-crawler
  7. crw-mcp

    MCP (Model Context Protocol) server for the CRW web scraper

    v0.8.3 #web-crawler #mcp #firecrawl #llm #web-scraping
  8. crw-cli

    crw — CLI tool for scraping URLs to markdown/JSON without a server

    v0.7.1 #web-crawler #web-scraping #firecrawl #llm #mcp
  9. kreuzcrawl

    High-performance web crawling engine

    v0.3.0-rc.19 #web-scraping #web-crawler #rust
  10. arkenar

    CLI frontend for the Arkenar vulnerability scanner

    v1.1.4 #scan #cli #scanner #vulnerabilities #banner #web-crawler #nuclei #katana #recon #authentication
  11. mq-crawler

    Directory crawler for batch Markdown file processing

    v0.5.29 #web-crawler #query #markdown #jq
  12. crawlex

    Stealth crawler with Chrome-perfect TLS/H2 fingerprint, render pool, hooks, persistent queue

    v1.0.4 #headless-chrome #stealth #web-crawler #fingerprint
  13. spider-client

    Spider Cloud client

    v0.1.87 #artificial-intelligence #web-crawler #web-indexer
  14. crw-mcp-proto

    MCP JSON-RPC 2.0 protocol types (shared by crw-server, crw-mcp, crw-browse)

    v0.8.3 #web-crawler #firecrawl #mcp #web-scraping #llm
  15. siteone-crawler

    Website crawler and QA toolkit in Rust for security, performance, SEO, and accessibility audits, offline cloning, markdown export, sitemap generation, cache warming, and CI/CD gating…

    v2.3.0 #web-crawler #seo #website-analysis #accessibility #security
  16. crw-renderer

    HTTP and CDP browser rendering engine for the CRW web scraper

    v0.8.3 #web-crawler #firecrawl #web-scraping #llm #mcp
  17. crw-search

    SearXNG-backed search client and result transforms for the CRW web scraper

    v0.8.3 #web-crawler #web-scraping #firecrawl #mcp #llm
  18. spider-cloud-cli

    The Spider Cloud CLI for web crawling and scraping

    v0.1.87 #web-scraping #web-indexer #web-crawler
  19. iocaine

    The deadliest poison known to AI

    v3.4.0 #poison #artificial-intelligence #deadliest #content #request #web-scraping #reverse-proxy #web-crawler #scripting-engine #maze
  20. gnostr_filehash_core

    gnostr filehash core

    v949455.1873.486970 #nostr #hashing #gnostr-a #filehash #workflow #git-nostr #bencode #web-crawler #bitcoin #async-git
  21. crw-server

    Firecrawl-compatible API server for the CRW web scraper

    v0.8.3 #web-crawler #mcp #firecrawl #llm #web-scraping
  22. spider-lib

    A Rust-based web scraping framework inspired by Scrapy (Python)

    v3.0.4 #web-scraping #async #web-crawler #scraper
  23. argus-crawler

    A production-ready web crawler capable of handling billions of URLs

    v0.1.0 #web-crawler #web-scraping #web #async
  24. fav_core

    Fav's core crate; A collection of traits

    v0.1.8 2.0K #resources #status-flags #web-crawler #fetch #protobuf #cookies #collection-traits #logout #visualize #fetched
  25. kumo

    An async web crawling framework for Rust - Scrapy for Rust

    v0.2.10 #web-scraping #rate-limiting #web-crawler #async
  26. crw-core

    Core types, config, and error handling for the CRW web scraper

    v0.8.3 #web-crawler #firecrawl #web-scraping #mcp #llm
  27. siteprobe

    CLI tool to fetch URLs from sitemap.xml, check their existence, and generate performance reports

    v1.3.0 #sitemap #web-crawler #performance #url-checker #http-monitoring
  28. spider-core

    Core functionality for the spider-lib web scraping framework

    v2.0.4 #spider-lib #web-scraping #async #web-crawler #rust
  29. mocra

    A distributed, event-driven crawling and data collection framework

    v0.2.16 #web-crawler #framework #distributed #queue #distributed-queue
  30. secret_scraper

    A URL Crawler tool and library for crawling web targets, discovering links, and detecting secrets with configurable regex rules

    v0.1.1 #web-crawler #secret #security #secret-scanner #scanner
  31. spider_worker

    The fastest web crawler as a worker or proxy

    v2.51.190 #web-crawler #spider-cli #crawler
  32. kreuzcrawl-cli

    Command-line web crawler and scraper

    v0.3.0-rc.19 #web-crawler #web-scraping #engine #java #engine-bindings #mcp #ruby #elixir #wasm #php
  33. crw-crawl

    Async BFS web crawler with rate limiting and robots.txt support for CRW

    v0.8.3 #web-crawler #firecrawl #llm #mcp #web-scraping
  34. silkworm-rs

    Async-first web scraping framework (Rust port)

    v0.1.2 #web-scraping #web-framework #web-crawler #async
  35. socket-patch-core

    Core library for socket-patch: manifest, hash, crawlers, patch engine, API client

    v2.1.4 #web-crawler #sha-256 #purl #manifest-json #package #composer #maven #nu-get #api-client #rollback
  36. robotstxt

    A native Rust port of Google's robots.txt parser and matcher C++ library

    v0.3.0 5.5K #web-crawler #parser
  37. wdict

    Create dictionaries by scraping webpages or crawling local files

    v0.1.22 #web-crawler #web-page #dictionary #local #word-list #web-scraping
  38. crw-extract

    HTML extraction and markdown conversion engine for the CRW web scraper

    v0.8.3 #web-scraping #web-crawler #firecrawl #mcp #llm
  39. yps

    Yggdrasil Port Scanner

    v0.1.3 180 #tcp #yggdrasil #udp #web-crawler #search
  40. iscrawl

    Fast crawler/bot detection from User-Agent strings

    v1.2.0 #web-crawler #user-agent #bot #detection
  41. firecrawl-sdk

    Rust SDK for Firecrawl API

    v0.4.0 #sdk #firecrawl #web-crawler #api #batch-scrape #web-api
  42. website_crawler

    gRPC tokio based web crawler built with spider

    v0.9.9 10K #web-indexer #web-crawler #site-map-generator #crawler
  43. spider_chromiumoxide_fetcher

    Contains a fetcher for Chromium and Chrome for Testing

    v0.8.1 #headless-chrome #chromiumoxide #fetcher #web-crawler #cdp #cache #chromey #dev-tools #automation
  44. zapfetch

    Official Rust SDK for the ZapFetch v2 web scraping API

    v0.1.0 #sdk #web-crawler #web-scraping #format #llm-agent #api-url #batch-scrape #self-hosted #web-sdk #web-search
  45. toai

    Path crawler that copies all SRC files into a single output to send to an AI (toai)

    v0.2.0 #artificial-intelligence #path #stdout #src #ignore #web-crawler #relative-path
  46. dig2crawl

    Universal agnostic web crawler with Claude-powered CSS selector discovery

    v0.3.21 #css-selectors #web-crawler #claude #json-ld #json-path #site-profile #spa #captcha #screenshot #fingerprint
  47. crawn

    web crawling and scraping

    v0.3.0 #web-scraping #web-crawler #async #concurrency #cli-concurrency
  48. product-os-crawler

    Product OS : Crawler is a browser-based crawler that uses Product OS : Browser to perform advanced URL crawling with headless browsing and automation

    v0.0.16 450 #product-os #web-crawler
  49. argus-common

    Common types and utilities for the Argus web crawler

    v0.1.0 #web-crawler #web #type #crawler
  50. tarzi

    Rust-native lite search for AI applications

    v0.1.10 #web-crawler #rag
  51. yagami

    A concurrent sitemap crawler that validates web links and exports results to CSV

    v0.1.0 #sitemap #web-crawler #link-checker #seo
  52. spider_cli

    The fastest web crawler CLI written in Rust

    v2.51.190 #web-crawler #web-scraping
  53. halo-spider

    A fast, async web crawling framework inspired by Scrapy, built in Rust

    v0.0.5 #web-scraping #async #web-crawler #web #scraper
  54. spider_scraper

    A css scraper using html5ever

    v0.2.1 2.9K #web-scraping #css-selectors #html-parser #serialization #element-attributes #web-crawler
  55. argus-storage

    Storage backends for crawled web data

    v0.1.0 #web-crawler #amazon-s3 #storage #crawler
  56. spider_utils

    Spider Web Crawler

    v2.51.190 410 #web-crawler #css #crawler
  57. hnu_query

    querying systems of Hunan University

    v0.3.0 #hnu #querying #university #system #md #web-crawler
  58. rust_scraper

    Production-ready web scraper with Clean Architecture, TUI selector, and sitemap support

    v1.0.0 #web-scraping #web-crawler #rag #tui #web
  59. hazler-js-parser

    JavaScript endpoint parser for Hazler

    v0.2.0 #web-crawler #endpoint #hazler #javascript-parser #authentication #source-map #next-generation #headless-chrome #cookies
  60. pulsarss

    RSS Aggregator for Gemini Protocol

    v0.1.4 180 #gemini-protocol #rss #web-crawler #gemini-gemtext
  61. spider_remote_cache

    Shared remote cache upload worker for spider and chromey

    v0.2.0 #web-crawler #remote #cache #chromey
  62. json-crawler

    Wrapper for serde_json that provides nicer errors when crawling through large json files

    v0.1.0 190 #serde-json #web-crawler #json-error #youtube-music #pointers
  63. spider-pipeline

    Pipeline implementations for the spider-lib web scraping framework

    v0.3.9 #pipeline #web-scraping #web-crawler #rust #scraper
  64. actix_block_ai_crawling

    Actix Middleware that blocks Generative AI crawlers

    v0.2.11 #artificial-intelligence #web-crawler #generative-ai #block #actix-middleware
  65. doclinks

    Discover and export documentation links from docs sites

    v0.1.0 #web-crawler #documentation #search #documentation-search
  66. spider-util

    Shared utility functions and types for the spider-lib ecosystem

    v0.3.7 #spider-lib #shared #ecosystem #workspace #utility #web-crawler #web-scraping #author
  67. netiquette

    A rate-limiter for politely crawling the Web

    v0.1.1 #rate-limiting #web-crawler #internet #http-request #server #honor #user-agent #web-scraping
  68. robotxt

    Robots.txt (or URL exclusion) protocol with the support of crawl-delay, sitemap and universal match extensions

    v0.6.1 900 #web-crawler #web-framework #scraper
  69. wrake

    Collect links from the given URL

    v0.4.2 #web-crawler #web #crawler
  70. argus-config

    Configuration management for the Argus web crawler

    v0.1.0 #web-crawler #settings #config #crawler
  71. scrapfly-sdk

    Async Rust client for the Scrapfly web scraping, screenshot, extraction and crawler APIs

    v0.2.4 #web-crawler #screenshot #web-scraping #scrapfly
  72. pixivdwn

    Incremental pixiv crawler/downloader

    v0.2.0-alpha1 #downloader #pixiv #web-crawler #incremental #database #fetched #illustrations
  73. anakin-sdk

    Official Rust SDK for the Anakin web-scraping API. Scrape, crawl, search, and run Wire actions with internal job polling.

    v0.1.0 #web-crawler #sdk #web-scraping #anakin #web-api
  74. halldyll-core

    Core scraping engine for Halldyll - high-performance async web scraper for AI agents

    v0.1.0 #ai-agent #web-scraping #deployment #web-crawler
  75. spider-downloader

    Downloader component for the spider-lib web scraping framework

    v1.0.5 #web-framework #spider-lib #web-scraping #downloader #execution #web-crawler
  76. hazler-http

    HTTP client utilities for the hazler web crawling framework

    v0.2.0 #web-crawler #web-scraping #authentication #hazler #http #cookies #api-key #oauth2 #chrome #forms
  77. gnostr-web

    gnostr web and JS bindings crate

    v948230.1875.177928 #nostr #js-web #gnostr-a #js-bindings #query #git-nostr #bencode #web-crawler #async-git #upnp
  78. seaward

    grep-like tool for the web

    v1.1.0 290 #web-crawler #rustcrawler #cli
  79. argus-parser

    HTML and sitemap parsing utilities for web crawling

    v0.1.0 #web-crawler #sitemap #html-parser
  80. pkuinfo-spider

    Article crawler CLI based on hyperlink lookup in the WeChat Official Account backend

    v0.1.3 #web-crawler #we-chat #crawler
  81. crawly

    A lightweight async Web crawler in Rust, optimized for concurrent scraping while respecting robots.txt rules

    v0.1.9 #web-crawler #robots-txt #web-scraping #rate-limiting #builder-pattern #concurrency #respecting #depth-first-search
  82. spider-middleware

    Middleware implementations for the spider-lib web scraping framework

    v0.3.8 #middleware #web-scraping #web-crawler #rust
  83. qrawl

    Rust toolkit to crawl web data for AI agents

    v0.6.0 #web-scraping #web-crawler #rag
  84. scrapling-spider

    Concurrent web crawler framework for scrapling

    v0.2.0 #web-scraping #robots-txt #session-manager #framework #concurrency #web-crawler #cache #checkpoint #pause-resume #priority-queue
  85. headless_chrome_fork

    Control Chrome programmatically

    v1.0.2 #headless-chrome #dev-tools #puppeteer #headless-browser #web-scraping #web-crawler #fetching
  86. inverta

    A basic search engine that downloads pages and matches your search query against their contents

    v0.1.0 #search-query #search-engine #web-crawler #web-page #tf-idf #stop-words
  87. voyager

    Web crawler and scraper

    v0.2.1 #web-crawler #web-scraping #state-machine
  88. capp

    Common things i use to build Rust CLI tools for web crawlers

    v0.4.3 650 #web-crawler #async-executor #async
  89. ferrisfetcher

    A cutting-edge, high-level web scraping library crafted in Rust

    v0.1.0 #web-scraping #web-crawler #scraper
  90. crawlurls

    A fast async Rust crawler that discovers and filters URLs by pattern without scraping content

    v0.1.1 #web-crawler #regex #url #web #rust
  91. ares-core

    Core types, traits, and error handling for Ares AI scraper

    v0.3.0 #artificial-intelligence #web-scraping #job-queue #json-schema #rate-limiting #circuit-breaker #llm #web-crawler #fetcher #cleaner
  92. scoutlang

    A web crawling programming language

    v0.7.2 370 #web-crawler #web-scraping #web-crawling #programming-language
  93. omnivore-cli

    Universal web scraper and code extractor CLI - crawl websites, analyze repositories, build knowledge graphs

    v0.2.0 #git #web-crawler #code-analysis #web-scraping
  94. minisearchtk

    Small toolkit for crawling and searching web pages

    v0.2.0 #web-crawler #hash #web-page #search #toolkit #cache #user-agent #nofollow #respect #statistics
  95. iocaine-label

    The deadliest poison known to AI

    v3.4.0 #iocaine #reverse-proxy #poison #artificial-intelligence #web-crawler #web-scraping #bot #scripting-engine #incoming-request
  96. shader-prepper

    Shader include parser and crawler

    v0.3.0-pre.3 #shader #shader-compiler #web-crawler #string #provider
  97. iocaine-table

    The deadliest poison known to AI

    v3.4.0 #iocaine #poison #artificial-intelligence #web-scraping #reverse-proxy #web-crawler #scripting-engine #incoming-request
  98. mitsuba

    Lightweight 4chan board archive software (like Foolfuuka), in Rust

    v1.10.0 #downloader #archive #web-crawler #web-archive
  99. hazler-parser

    HTML parser for extracting links and endpoints for the Hazler web crawler

    v0.2.0 #web-crawler #graphql #endpoint #hazler #html-parser #parser-for-extracting #next-generation #graphql-introspection #forms #headless-chrome
  100. coma

    lightweight command-line tool designed for crawling websites

    v0.2.3 310 #web-crawler #web-discovery #web-scraping
  101. recursive_scraper

    Constant-frequency recursive CLI web scraper with frequency, filtering, file directory, and many other options for scraping HTML, images and other files

    v0.6.2 #web-scraping #recursion #web-crawler #web #scraper
  102. unobtanium-crawler

    The default web-crawler for unobtanium

    v3.0.0 #web-crawler #unobtanium #index #search-engine
  103. turboscraper

    A high-performance, concurrent web scraping framework for Rust with built-in support for retries, storage backends, and concurrent request handling

    v0.1.1 190 #web-scraping #web-crawler #async #web
  104. unobtanium

    Opinionated Web search engine library with crawler and viewer companion

    v3.0.0 #search-engine #web-crawler #web-search #database #shared-data-structures
  105. spidermedic

    A CLI tool for validating pages while crawling a website

    v0.2.0 #web-crawler #validation #page #command-line-tool #url #concurrency #http-errors
  106. scout-parser

    A web crawling programming language

    v0.7.2 430 #web-crawler #web-scraping #web-crawling #programming-language
  107. hashtree-nostr-bridge

    Relay bridge and social-graph-scoped crawler for hashtree Nostr indexes

    v0.2.15 #nostr #web-crawler #hash-tree #bridge #index
  108. scout-lexer

    A web crawling programming language

    v0.7.2 430 #web-crawler #web-scraping #web-crawling #programming-language
  109. chan-downloader

    CLI to download all images/webms of a 4chan thread

    v0.3.0 #download #web-crawler #4chan #4plebs #cli
  110. firecrawl_rs

    Rust SDK for Firecrawl API

    v0.1.1 #web-crawler #sdk #structured-data #llm #api-sdk #markdown #web-data
  111. indexea

    OpenAPI of Indexea

    v1.0.0 #oauth #widgets #payment #apps-api #delete-account #account-api #search-api #invoice #logging #web-crawler
  112. website_crawler_sdk

    The official Rust SDK for the Website Crawler API

    v1.0.0 #web-crawler #json #llm #api
  113. frangipani

    Scraping framework for rust

    v0.3.1 #web-scraping #continuous-crawler #web-crawler #scraper #scraping
  114. async_job

    async cron job crate for Rust

    v0.1.4 1.8K #cron-job #web-crawler #crawler
  115. aquatic-crawler

    Crawler tool for the Aquatic BitTorrent tracker API

    v0.1.0 #bittorrent #web-crawler #parser #magnet #aquatic
  116. quick_crawler

    QuickCrawler is a Rust crate that provides a completely async, declarative web crawler with domain-specific request rate-limiting built-in

    v0.1.2 #web-crawler #rate-limiting #domain-specific #web-scraping #web-page
  117. brchd

    Data exfiltration toolkit

    v0.1.0 #toolkit #exfiltration #upload #uploader #0-1 #web-crawler
  118. spacebar

    An anti-plagiarism tool based on null width characters

    v0.3.0-rc1 #character #database #web-crawler #clipboard #tool #web-scraping #blog #http-errors #blog-post
  119. vil_crawler

    VIL Web Crawler — async concurrent BFS crawling with robots.txt support (I01)

    v0.4.0 #web-crawler #async-concurrency #vil #bfs #robots-txt #distributed-systems #web-scraping #language-framework #zero-copy
  120. maman

    Rust Web Crawler

    v0.13.1 #web-crawler #web #http #crawler
  122. rsfile

    operate files or web pages easily and quickly

    v0.1.2 #web-crawler #file-utility #web-page #text-file #web-page-helper #text-file-helper #csv-file-helper #crawler
  123. ntwrk

    TODO

    v0.1.1 #browser-automation #web-crawler #debugging #web-scraping #debugging-tool
  124. arkenar-core

    Core library for the Arkenar vulnerability scanner

    v1.2.0 #scanner #scan #vulnerabilities #vulnerabilities-scanner #web-crawler #nuclei #katana #authentication #bug-bounty #recon
  125. jsdom

    javascript dom parser for web scraping

    v0.0.11-alpha.1 120 #web-scraping #web-crawler
  126. finde-rs

    Multi-threaded filesystem crawler

    v0.1.4 #web-crawler #thread-pool #channel #cli
  127. ptt-crawler

    A crawler for the web version of PTT, the largest online community in Taiwan

    v0.1.0 #web-crawler #ptt #crawler
  128. s5_importer_http

    HTTP importer for S5

    v1.0.0-beta.1 #importer #s5 #base-url #http #web-crawler #parallel-processing #content-length
  129. spidery

    Rust SDK for Spidery API

    v1.0.0 #sdk #web-crawler #llm #format #crawl #scrape #data-url
  130. surly-spider

    A command line interface for crawling websites

    v1.0.2 #command-line-interface #web-crawler #surly #domain #flags
  131. web-crawler

    Finds every page, image, and script on a website (and downloads it)

    v0.1.3 #download #web-page #find #image #script
  132. url-crawl

    URL crawler for HTML code

    v0.2.0 #kvarn #url #web-crawler #crawl #push #web-server
  133. krate

    Get information and metadata for published Rust crates

    v1.0.0 #metadata #io-api #contract #data-model #web-crawler
  134. stream_crawler

    scraping web pages and extracting URLs and endpoints

    v0.1.1 #web-crawler #web-scraping #endpoint #web
  135. omnivore-core

    Core crawler and knowledge graph engine for Omnivore - web scraping, AI extraction, browser automation

    v0.1.1 #web-crawler #knowledge-graph #browser #async #web-scraping
  136. actix-prerender

    Actix middleware that sends requests to Prerender.io or a custom Prerender service URL

    v0.2.4 #web-crawler #service-url #prerender #send #io #actix-web #user-agent #actix-middleware
  137. rust-rock-rover

    Concert web crawler in Rust

    v0.1.0 #concert #web #web-crawler #cargo-generate #template #git #ci
  138. od-get

    recursively crawling & downloading data from open directories

    v0.3.1 #download #web-crawler #open-directory #recursion-depth #file-pattern #verbosity
  139. kodict

    Korean dictionary implementation and crawler for Rust

    v0.2.1 #dictionary #korean #web-crawler #hangul
  140. doublesite

    Alternative for httrack

    v0.1.0 #content #httrack #loading #website #cli #backup #har #mirroring #web-crawler
  141. crawl

    Rust crawl

    v0.2.1 #web-crawler #http #spider
  142. wls

    Easily crawl multiple sitemaps and list URLs

    v0.1.0 #sitemap #web-crawler #url
  143. lolchive

    local liminal archiver for webpages

    v0.2.0 #web-page #archiver #local #liminal #web-crawler #date
  144. source-demo-tool-crawler

    WIP: a gui tool for opening (editing planned) source engine demo files

    v0.8.2 #demo-file #source-engine #web-crawler #tool #editing #changelog #file-content
  145. ssufid

    SSU Announcement Crawler for Everyone

    v0.1.0 #ssu #web-crawler #announcement
  146. spire

    The flexible scraper framework powered by tokio and tower

    v0.1.0 #web-framework #web-crawler #scraper
  147. emails

    A web scraper to extract email addresses from websites

    v1.0.0 #email #web-crawler #web
  148. sws-crawler

    Web crawler with pluggable scraping logic

    v0.1.0 #web-crawler #web-scraping-logic #sws #sitemap #seed #plugable #scrap #web-page
  149. scraper_query

    Ergonomic Query for HTML with Scraper

    v0.4.0 200 #web-scraping #query #html #document #class #web-crawler
  150. flatcrawl-crawler

    set of webpage crawlers. New crawlers can be easily configured and the output can be written to an AMQP queue.

    v1.0.0 #amqp #web-crawler #web-scraping #flatcrawl #flats #web-page
  151. task_deport

    Organize simple task queue

    v0.1.0 #task-queue #redis #in-memory-storage #processing #web-crawler #redis-queue #health-check #health-monitoring #concurrency
  152. gar-crawl

    High level HTML crawler with concise builder

    v0.1.16 #web-crawler #high #level #propagator #builder #allow-list
  153. ac_crawler_types

    normalized types for the anti capital public data crawlers

    v0.1.5 #web-crawler #capital #public #normalized #anti