O-X-L/open-bot-list

Open Bot List

This repository is used to collect information that can be used to categorize & match traffic.

We will auto-generate full lists in plaintext and JSON later on!


Why

For traffic-filtering rulesets it is essential to categorize requests.

Categorization lets you easily restrict traffic from client categories you do not want.

TL;DR: What you can expect:

  • Detecting different kinds of bots
  • Validating crawler-bots - detecting spoofed crawlers
  • Categorizing the source-networks

How it works

To transparently match & categorize bots we need to combine:

  • Traffic Matches

    • Matching the source-IP with IP- or ASN-Lists

      • Separating different kinds of bots by their HTTP User-Agent (if they use the same IP-range)
      • Categorizing the source-networks into Hosting/ISP/Education/Cloud/CDN/VPN/Proxy/Scanner/CGNAT
    • Separating different bot-categories such as: script bots, hidden bots, search-engine crawlers, AI-data crawlers, AI-user crawlers, social-media crawlers, ad crawlers, ecommerce crawlers, and so on

      • Matching clear script-bots by their User-Agent (dumb script-kiddies)
      • Matching 'hidden' bots by their client-fingerprints (JA4, etc.)
      • ... to be extended ...
  • PTR-checks

    • Some organizations only publish a PTR (reverse-DNS) match for validating whether a crawler-IP is theirs, so a simple IP-list lookup is not possible
  • Traffic Flagging

    • We provide you with abstract configuration that shows how the matches can be combined
    • Practical configuration examples for proxy-services will be added later on
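
The matching steps above can be sketched in a few lines of Python. This is only an illustration, not the project's implementation: the list names, networks, User-Agent patterns and PTR suffix are hypothetical sample values, and the function names (`match_ip_list`, `ptr_matches`, `flag_request`) are made up for this example. The forward-confirmation step in `ptr_matches` is a common addition to PTR checks and is an assumption here, not something this repository prescribes.

```python
import ipaddress
import re
import socket
from typing import Optional

# Hypothetical sample data (documentation IP ranges, made-up names).
CRAWLER_NETWORKS = {"example-crawler": ["192.0.2.0/24", "2001:db8::/32"]}
CRAWLER_UA = {"example-crawler": re.compile(r"ExampleBot", re.IGNORECASE)}
SCRIPT_BOT_UA = re.compile(r"\b(curl|wget|python-requests)\b", re.IGNORECASE)


def match_ip_list(src_ip: str) -> Optional[str]:
    """Return the name of the first list whose networks contain src_ip."""
    addr = ipaddress.ip_address(src_ip)
    for name, cidrs in CRAWLER_NETWORKS.items():
        for cidr in cidrs:
            net = ipaddress.ip_network(cidr)
            if addr.version == net.version and addr in net:
                return name
    return None


def ptr_matches(src_ip, suffixes,
                reverse_lookup=socket.gethostbyaddr,
                forward_lookup=socket.gethostbyname_ex) -> bool:
    """PTR-check: the reverse-DNS name must end in an allowed suffix and
    resolve back to the same IP (forward lookup here is IPv4-only)."""
    try:
        hostname = reverse_lookup(src_ip)[0].lower().rstrip(".")
    except OSError:
        return False
    if not hostname.endswith(tuple(suffixes)):
        return False
    try:
        return src_ip in forward_lookup(hostname)[2]
    except OSError:
        return False


def flag_request(src_ip: str, user_agent: str) -> list:
    """Combine User-Agent and IP-list matches into a list of flags."""
    flags = []
    if SCRIPT_BOT_UA.search(user_agent):
        flags.append("script-bot")
    listed = match_ip_list(src_ip)
    for name, pattern in CRAWLER_UA.items():
        if pattern.search(user_agent):
            # A crawler User-Agent from an unlisted IP is treated as spoofed.
            flags.append(f"crawler:{name}" if listed == name else f"spoofed:{name}")
    return flags
```

The DNS lookups are passed in as parameters so the check can be exercised without network access. For crawlers that only offer a PTR-match, `ptr_matches` would take the place of the `match_ip_list` comparison in `flag_request`.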

Downloader Application

See: Downloader README

IMPORTANT: Do not run the downloader (and thus download the IP lists) frequently! Once a day (or even once a week) is sufficient for most systems.
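
One way to respect this limit in your own automation is to check the age of the previously downloaded data before starting the downloader again. The snippet below is a minimal sketch under assumptions: `needs_refresh` and the file path are hypothetical and not part of the downloader, whose actual options are described in its README.

```python
import os
import time

MAX_AGE_SECONDS = 24 * 60 * 60  # once a day, as recommended above


def needs_refresh(path: str, max_age: float = MAX_AGE_SECONDS) -> bool:
    """Return True if the file is missing or older than max_age seconds."""
    try:
        age = time.time() - os.path.getmtime(path)
    except OSError:
        # File does not exist (or is unreadable): a download is needed.
        return True
    return age > max_age


# Hypothetical usage: only start the downloader when this returns True.
if needs_refresh("/var/lib/open-bot-list/ip-lists.json"):
    print("IP lists are missing or stale; run the downloader.")
```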


Log-Flagger Application

See: Log-Flagger README


Contribute

Contributions are very welcome.

If you:

  • know of official IP-Lists we missed
  • found other missing/incorrect information

... feel free to open a ticket or email us directly.


Motivation

With our IP-Abuse Reporting-System & Databases we have started to collect information on abusers.

Since the open-source mindset is at the core of what we do, we want to share this information transparently with the whole world.


License

The Open-Bot-List data-collection uses the BSD 3-Clause license and has very few restrictions.

The Applications for processing (downloader/log-flagger) use the GPLv3 license.
