Skip to content

Conversation

@swhite2
Copy link
Member

@swhite2 swhite2 commented Oct 20, 2025

TODO:

  • Discuss migration strategy, including post() usage to clean up config.
  • Improve domain tester policy matching for better traceability.

@swhite2 swhite2 self-assigned this Oct 20, 2025
Copy link
Member

@fichtner fichtner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good on the surface

@swhite2
Copy link
Member Author

swhite2 commented Oct 21, 2025

One thing to note here is that with many blocklists, the matching algorithm has quite a performance penalty due to the subnet coverage calculation. This doesn't necessarily have to be a problem, as it's properly locked and is only for pre-processing, but it's slower than before.

@swhite2
Copy link
Member Author

swhite2 commented Oct 23, 2025

Just to ease review a bit, the data path is as follows:

  • Input:
[blocklist:1]
description=default
rcode=NOERROR
address=0.0.0.0
source_nets=
cache_ttl=72000
excludes.default=.*localhost$
excludes.patterns=
includes.patterns=
includes.wildcards=

[blocklist:2]
description=<second>
rcode=NOERROR
address=0.0.0.0
source_nets=192.168.2.0/24,192.168.1.0/24
cache_ttl=72000
blocklists.<shortcode>=<URL> <-- only exists if one or more types set
blocklists.custom<loop.index>=<URL> <-- only exists if one or more lists set
excludes.default=.*localhost$
excludes.patterns=
includes.patterns=
includes.wildcards=
  • Output is a single JSON file with the following structure:
{
	"data": {
		"<domain>": [
			{"idx": <index into the first logical policy it should check},
			{"idx": ...}
			...
		],
	},
	"config": {
		"<idx>": {
			"passlist": <compiled passlist>,
			"source_nets": [
				"192.168.1.0/24",
				...
			]
			...
		}
	}
}
  • There may be multiple policies associated with a single domain, as source nets can differ, as well as the allowlist (passlist here) regex.
  • Config output is used directly in dnsbl_module.py and is returned when queried in test mode.
  • To keep the resulting output small, the indices are simple integers - we assume other blocklist handlers may be injected, therefore the index must be managed by the BlocklistParser class.
    • We first merge all configurations, separated by index
    • The handlers return their blocklists, which is an array of dictionaries, all of which may contain domains that overlap in the final blocklist.
    • _merge_items_with_config merges the blocklists together and decides in what order the policies are attached to the domains -> more specific subnets have priority, and in that set the bigger passlists also have priority.

AdSchellevis added a commit that referenced this pull request Oct 25, 2025
part 1: simplify blocklist handling.

Introduce blocklists_iter() in BaseBlocklistHandler class which yields domain information per policy, which can then be reordered easily using the calculated priorities using source_nets in get_policies().

To prevent people from using overlapping nets or different protocol families in the same policy, we validate the input. Overlapping nets will lead to possible race conditions while matching as we cannot assume the first match found is actually the relevant one anymore.

Our output data should match the expected format below:

```
{
	"data": {
		"<domain>": [
			{"idx": <index into the first logical policy it should check},
			{"idx": ...}
			...
		],
	},
	"config": {
		"<idx>": {
			"passlist": <compiled passlist>,
			"source_nets": [
				"192.168.1.0/24",
				...
			]
			...
		}
	}
}
```

This part only concers the parser with backwards compatibility (to keep QFeeds functional during the process)

In the blocklist.conf template we changed the sequence index to use UUID's, which basically does the same thing as the index number, but is slightly better to track (used in the dnsbl.json output file as well).
AdSchellevis added a commit that referenced this pull request Oct 26, 2025
part 1: simplify blocklist handling.

Introduce blocklists_iter() in BaseBlocklistHandler class which yields domain information per policy, which can then be reordered easily using the calculated priorities using source_nets in get_policies().

To prevent people from using overlapping nets or different protocol families in the same policy, we validate the input. Overlapping nets will lead to possible race conditions while matching as we cannot assume the first match found is actually the relevant one anymore.

Our output data should match the expected format below:

```
{
	"data": {
		"<domain>": [
			{"idx": <index into the first logical policy it should check},
			{"idx": ...}
			...
		],
	},
	"config": {
		"<idx>": {
			"passlist": <compiled passlist>,
			"source_nets": [
				"192.168.1.0/24",
				...
			]
			...
		}
	}
}
```

This part only concers the parser with backwards compatibility (to keep QFeeds functional during the process)

In the blocklist.conf template we changed the sequence index to use UUID's, which basically does the same thing as the index number, but is slightly better to track (used in the dnsbl.json output file as well).
AdSchellevis added a commit that referenced this pull request Oct 26, 2025
Reorganize some parts so we can split this up more easily into separate components and fix one minor passlist regex issue.
AdSchellevis added a commit that referenced this pull request Oct 26, 2025
Setup new unbound-dnsbl module which should replace dnsbl_module.py and tester (work in progress).

In its current state this should contain all relevant classes and functions currently used in dnsbl_module.py with rudimentary unit tests.
AdSchellevis added a commit that referenced this pull request Oct 27, 2025
move dnsbl_module into unbound-dnsbl directory and use library functions.
AdSchellevis added a commit that referenced this pull request Oct 27, 2025
move dnsbl_module into unbound-dnsbl directory and use library functions.
AdSchellevis added a commit that referenced this pull request Oct 27, 2025
As getNodeContent() returns descriptive values as well (starting with %), there is no guarantee $bl->$key exists in the migration, check before calling applyDefault()
AdSchellevis added a commit that referenced this pull request Oct 27, 2025
fix some scoping issues when dnsbl_module is used from unbound.
AdSchellevis added a commit that referenced this pull request Oct 27, 2025
use /usr/local/opnsense/scripts/unbound-dnsbl/dnsbl_match.py for the tester.
swhite2 and others added 11 commits October 27, 2025 20:43
part 1: simplify blocklist handling.

Introduce blocklists_iter() in BaseBlocklistHandler class which yields domain information per policy, which can then be reordered easily using the calculated priorities using source_nets in get_policies().

To prevent people from using overlapping nets or different protocol families in the same policy, we validate the input. Overlapping nets will lead to possible race conditions while matching as we cannot assume the first match found is actually the relevant one anymore.

Our output data should match the expected format below:

```
{
	"data": {
		"<domain>": [
			{"idx": <index into the first logical policy it should check},
			{"idx": ...}
			...
		],
	},
	"config": {
		"<idx>": {
			"passlist": <compiled passlist>,
			"source_nets": [
				"192.168.1.0/24",
				...
			]
			...
		}
	}
}
```

This part only concers the parser with backwards compatibility (to keep QFeeds functional during the process)

In the blocklist.conf template we changed the sequence index to use UUID's, which basically does the same thing as the index number, but is slightly better to track (used in the dnsbl.json output file as well).
Reorganize some parts so we can split this up more easily into separate components and fix one minor passlist regex issue.
Setup new unbound-dnsbl module which should replace dnsbl_module.py and tester (work in progress).

In its current state this should contain all relevant classes and functions currently used in dnsbl_module.py with rudimentary unit tests.
move dnsbl_module into unbound-dnsbl directory and use library functions.
As getNodeContent() returns descriptive values as well (starting with %), there is no guarantee $bl->$key exists in the migration, check before calling applyDefault()
fix some scoping issues when dnsbl_module is used from unbound.
use /usr/local/opnsense/scripts/unbound-dnsbl/dnsbl_match.py for the tester.
@AdSchellevis
Copy link
Member

My changes are in, but can use an extra pair of eyes on the current state.

@AdSchellevis AdSchellevis requested a review from fichtner October 27, 2025 20:18
foreach ($dirs as $dir) {
mwexecf('/bin/mkdir -p %s', "/var/unbound{$dir}");
}
/* deploy python module (without test suite) */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/* deploy python module (without test suite) */
/* deploy python module (without test suite) */

);
if (count($sizes) > 1) {
$messages->appendMessage(new Message(
gettext("All offered networks should be equally sized to avoid overlaps"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
gettext("All offered networks should be equally sized to avoid overlaps"),
gettext('All offered networks should be equally sized to avoid overlaps.'),

Mostly about missing dot in sentence.

}
if (count($ipproto) > 1) {
$messages->appendMessage(new Message(
gettext("All offered networks should use the same IP protocol"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
gettext("All offered networks should use the same IP protocol"),
gettext('All offered networks should use the same IP protocol.'),

@swhite2
Copy link
Member Author

swhite2 commented Oct 28, 2025

Apart from the extra review, points of interest here are:

  • We need to make sure we skip validation for items coming from the extended blocklists that do not fit the new validations.
  • The dnsbl.json file needs to be regenerated at some point. Likely needs a simple check in the Unbound code to reject the old list.

foreach (new RecursiveIteratorIterator(new RecursiveDirectoryIterator($basedir)) as $filename) {
$target = "/var/unbound/unbound-dnsbl/".substr($filename, strlen($basedir));
if (str_ends_with($filename, '.py') && is_dir(dirname($target))) {
@copy($filename, $target);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we're making sure to clean /var/unbound/unbound-dnsbl so our future selves will be happy?

Copy link
Member

@fichtner fichtner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks ok, leaving the python parts for the other avid readers :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

4 participants