Crawlers
WebCollector is an open-source web crawler framework based on Java. It provides simple interfaces for crawling the web, and you can set up a multi-threaded web crawler in less than 5 minutes.
Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.
A scalable web crawler framework for Java.
Scrapy, a fast high-level web crawling & scraping framework for Python.
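A visual no-code/code-free web crawler/spider. 易采集: a visual browser-automation testing, data-collection, and crawler tool that lets you design and run crawler tasks graphically, without writing code. Also known as ServiceWrapper, an intelligent service-wrapping system for web applications.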
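A Python web-scraping tutorial series that takes you from zero to one, covering browser packet capture and mobile-app packet capture (e.g. fiddler, mitmproxy); the modules crawlers commonly rely on, such as requests, BeautifulSoup, selenium, appium, and scrapy; IP proxies, CAPTCHA recognition, and using MySQL and MongoDB from Python; multi-threaded and multi-process crawlers; reverse-engineering CSS-based anti-crawler obfuscation; JS reverse engineering, …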
Elegant Scraper and Crawler Framework for Golang
A Powerful Spider (Web Crawler) System in Python.
A browser automation framework and ecosystem.
Automated driver management and other helper features for Selenium WebDriver in Java
Custom Selenium Chromedriver for Java that passes almost all Selenium checks. It's the Java version of undetected-chromedriver.
Open Source record and playback test automation for the web.
Headless chrome/chromium automation library (unofficial port of puppeteer)
JavaScript API for Chrome and Firefox
Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva / DataDome / CloudFlare IUAM)
Provides a simple way to run Selenium Grid with Chrome, Firefox, and Edge on a container platform, making it easier to perform browser automation at scale.
Java version of the Playwright testing and automation library
Python based web automation tool. Powerful and elegant.
Distributed web crawler admin platform for managing spiders written in any language or framework.
An image-recognition Java project built on ddddocr (forked from Gitee and modified). Just run the generated jar.
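A collection of Python crawler projects, from the basics up to JS reverse engineering, organized into basics, automation, advanced, and CAPTCHA sections. The examples cover major websites (xhs, douyin, weibo, ins, boss job, jd, ...) and teach you about crawling and anti-crawling, automation, and CAPTCHAs.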
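Most of the frameworks listed above automate the same core loop: fetch a page, extract its links, and queue the unseen ones. The sketch below is a minimal, hypothetical illustration of that loop using only the Python standard library; it is not taken from any of the projects above. The names `extract_links`, `crawl`, and the `fetch` callback are assumptions made for illustration, and `fetch` is injected so the loop can be exercised without network access. It omits everything a real framework adds: robots.txt handling, politeness delays, retries, concurrency, and JavaScript rendering.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects absolute http(s) URLs from the href attribute of <a> tags."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []
        self._seen: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL and drop #fragments,
                # so "/a" and "/a#top" are treated as the same page.
                url, _ = urldefrag(urljoin(self.base_url, value))
                # Skip mailto:, javascript:, etc., and duplicates on this page.
                if url.startswith(("http://", "https://")) and url not in self._seen:
                    self._seen.add(url)
                    self.links.append(url)


def extract_links(html: str, base_url: str) -> list[str]:
    """Return the unique absolute links found in html, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def crawl(start_url: str, fetch: Callable[[str], str], max_pages: int = 10) -> list[str]:
    """Breadth-first crawl from start_url; returns URLs in the order visited.

    fetch is a hypothetical callback that returns a page's HTML. In real use it
    could wrap urllib.request.urlopen, whose errors are subclasses of OSError.
    """
    visited: list[str] = []
    seen = {start_url}
    frontier = deque([start_url])  # FIFO queue gives breadth-first order
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        try:
            html = fetch(url)
        except OSError:
            continue  # skip pages that fail to download
        visited.append(url)
        for link in extract_links(html, url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited
```

For example, with an in-memory site such as `pages = {"https://example.com/": '<a href="/a">A</a><a href="/b">B</a>', "https://example.com/a": '<a href="/c">C</a>'}` and `fetch=lambda u: pages.get(u, "")`, `crawl("https://example.com/", fetch)` visits `/`, `/a`, `/b`, then `/c`. Tracking a `seen` set separately from `visited` keeps each URL from being queued twice, even before it has been fetched.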