This repository collects information that can be used to categorize and match traffic.
We will auto-generate full lists in plaintext and JSON later on!
For traffic-filtering rulesets it is essential to categorize requests.
This allows you to easily restrict traffic from client categories you do not want.
TL;DR: What you can expect:
- Detecting different kinds of bots
- Validating crawler-bots - detecting spoofed crawlers
- Categorizing the source-networks
To transparently match & categorize bots we need to combine:
- Traffic Matches
  - Matching the source-IP with IP- or ASN-Lists
  - Separating different kinds of bots by their HTTP User-Agent (if they use the same IP-range)
  - Categorizing the source-networks into Hosting/ISP/Education/Cloud/CDN/VPN/Proxy/Scanner/CGNAT
  - Separating different bot-categories like: script bots, hidden bots, search-engine crawlers, AI-data crawlers, AI-user crawlers, social-media crawlers, crawlers for ADs, crawlers for ecommerce, and so on
    - Matching clear script-bots by their User-Agent (dumb script-kiddies)
    - Matching 'hidden' bots by their client-fingerprints (JA4, etc.)
    - ... to be extended ...
- PTR-checks
  - Some organizations only supply us with a PTR-match to validate if a crawler-IP is theirs (no simple IP-list lookups)
- Traffic Flagging
  - We provide you with abstract configuration that shows how the matches can be combined
  - Practical configuration examples for proxy-services will be added later on
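
To illustrate how IP-list matches, User-Agent matches and PTR-checks could be combined into flags, here is a minimal Python sketch. It is not the repository's actual implementation: the crawler name `examplebot`, the network range, the PTR suffix, the User-Agent patterns and the flag names are all hypothetical placeholders. The DNS lookups can be passed in as functions so the logic can be tried without network access.

```python
import ipaddress
import re
import socket

# Hypothetical sample data; real values would come from the published lists.
CRAWLER_NETS = {"examplebot": [ipaddress.ip_network("192.0.2.0/24")]}
CRAWLER_PTR_SUFFIXES = {"examplebot": (".crawler.example.com",)}
CRAWLER_UA = {"examplebot": re.compile(r"ExampleBot", re.I)}
SCRIPT_UA = re.compile(r"^(curl|wget|python-requests|go-http-client)\b", re.I)


def ip_in_nets(ip, nets):
    """Return True if the IP is inside any of the given networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in nets)


def ptr_validates(ip, suffixes, reverse=None, forward=None):
    """Forward-confirmed reverse DNS: the PTR name must end with a known
    suffix AND resolve back to the same IP (default resolver: IPv4 only)."""
    if not suffixes:
        return False
    reverse = reverse or (lambda a: socket.gethostbyaddr(a)[0])
    forward = forward or (lambda h: socket.gethostbyname_ex(h)[2])
    try:
        host = reverse(ip).rstrip(".").lower()
        if not host.endswith(tuple(suffixes)):
            return False
        return ip in forward(host)
    except OSError:  # lookup failed: treat as not validated
        return False


def flag_request(ip, user_agent, reverse=None, forward=None):
    """Combine the individual matches into a set of flags."""
    flags = set()
    if SCRIPT_UA.search(user_agent):
        flags.add("bot_script")
    for name, ua_re in CRAWLER_UA.items():
        if not ua_re.search(user_agent):
            continue
        # The User-Agent claims to be this crawler; verify the source.
        verified = ip_in_nets(ip, CRAWLER_NETS.get(name, [])) or ptr_validates(
            ip, CRAWLER_PTR_SUFFIXES.get(name, ()), reverse, forward
        )
        flags.add(f"crawler_{name}" if verified else "crawler_spoofed")
    return flags
```

A request that claims a crawler User-Agent but passes neither the IP-list match nor the PTR-check ends up flagged as spoofed; which action follows from a flag (block, rate-limit, log) is left to the ruleset.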
See: Downloader README
IMPORTANT: Do not run the downloader (and thus download the IP lists) frequently! Once a day (or even once a week) is sufficient for most systems.
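
One way to honor this limit is to check the age of the local copy before fetching again. The following is a minimal sketch, not the downloader's actual logic; the function name `needs_refresh` and the one-day threshold are assumptions chosen for illustration.

```python
import time
from pathlib import Path

MAX_AGE_SECONDS = 24 * 60 * 60  # assumption: refresh at most once a day


def needs_refresh(path, max_age=MAX_AGE_SECONDS, now=None):
    """Return True if the cached list is missing or older than max_age."""
    path = Path(path)
    if not path.is_file():
        return True
    now = time.time() if now is None else now
    return (now - path.stat().st_mtime) > max_age
```

A wrapper script or scheduled job could call this check first and skip the download when it returns `False`.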
See: Log-Flagger README
Contributions are very welcome.
If you:
- know of official IP-Lists we missed
- found other missing/incorrect information
...feel free to either open a ticket or email us directly.
With our IP-Abuse Reporting-System & Databases we have started to collect information on abusers.
As the Open-Source mindset is at the core of our being, we want to share it transparently with the whole world.
The Open-Bot-List data-collection uses the BSD 3-Clause license and has very few restrictions.
The Applications for processing (downloader/log-flagger) use the GPLv3 license.