dnstapir-edm reads DNSTAP and, depending on configuration, produces several kinds of
output based on the observed messages:
- DNS queries for names considered well-known will be summarised into histograms which are saved as parquet files. These files will then be submitted to Core.
- DNS queries for names not considered well-known are collected into separate parquet files for further local analysis. The complete message content is saved, but the client and server IP addresses are pseudonymised via Crypto-PAn.
- DNS queries for names that are not considered well-known and have never been seen before by a given instance of dnstapir-edm will result in notifications being sent to Core via MQTT messages.
Running dnstapir-edm requires a TOML config file that holds the Crypto-PAn
secret used for pseudonymisation, as well as a well-known-domains.dawg file,
which can be created using dnstapir-cli from
https://github.com/dnstapir/cli.
A basic setup where dnstapir-edm will listen on a unix socket for DNSTAP data and
output files to a directory structure under /tmp/dnstapir-edm but not send anything to
Core can be created like this:
make build
echo 'cryptopan-key = "mysecret"' > dnstapir-edm.toml
curl -O https://www.domcop.com/files/top/top10milliondomains.csv.zip
unzip top10milliondomains.csv.zip
dnstapir-cli dawg --standalone compile --format csv --src top10milliondomains.csv --dawg well-known-domains.dawg
dnstapir-edm run --input-unix /tmp/dnstapir-edm/input.sock --data-dir /tmp/dnstapir-edm/data --config-file dnstapir-edm.toml --well-known-domains-file well-known-domains.dawg --disable-mqtt --disable-histogram-sender
Since all communication with Core is disabled, this setup is helpful for creating some local parquet files to look around in.
A running dnstapir-edm reloads its configuration on SIGHUP (e.g.
systemctl reload dnstapir-edm or kill -HUP <pid>). One signal re-reads the
config file and re-applies all reloadable state derived from files it points
at: the Crypto-PAn key material, the ignored client IPs and ignored question
names lists, the MQTT/HTTP client certificates and the well-known-domains
DAWG file. The DAWG swap takes effect at the next histogram rotation (within
a minute) since the collected histogram data is tied to the DAWG it was built
against. A reload that fails to read a file logs an error and keeps the
previous state. Changes to config keys that are not reloadable are logged with
a warning saying a restart is required.
Updating a DAWG is safe while the service runs. dnstapir-edm copies each
memory-mapped DAWG (well-known-domains-file, ignored-question-names-file)
into a private dawg-staging directory under data-dir and memory-maps that
copy, so overwriting the source file — in place or by atomic rename — cannot
disturb the live mapping or crash the service. Send SIGHUP once the new file
is completely written; a signal received mid-write makes that one reload fail
and keep the previous DAWG, so re-send it after the write finishes.
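The update procedure described above can be sketched as a small shell helper. This is only an illustration, not part of dnstapir-edm: the function names install_dawg and reload_edm are hypothetical, and the sketch assumes the temporary file and the target live on the same filesystem so that the rename is atomic.

```shell
# Sketch only: install_dawg and reload_edm are hypothetical helper names.

# Copy NEW_FILE to a temporary name next to TARGET, then rename it into place.
# A rename within one filesystem is atomic, so TARGET is never half-written.
install_dawg() {
    new_file=$1
    target=$2
    tmp="${target}.tmp.$$"
    # On a failed copy, remove the partial temp file and leave TARGET untouched.
    cp "$new_file" "$tmp" || { rm -f "$tmp"; return 1; }
    mv -f "$tmp" "$target"
}

# Send SIGHUP to the given PID; call this only after install_dawg succeeded.
reload_edm() {
    kill -HUP "$1"
}
```

A possible invocation is install_dawg new.dawg well-known-domains.dawg followed by systemctl reload dnstapir-edm (or reload_edm with the service PID), so the signal is only sent once the new file is complete.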
For inspecting the content you can use e.g. DuckDB like so:
- For summarised histogram data
duckdb -c 'select * from "/tmp/dnstapir-edm/data/parquet/histograms/outbox/dns_histogram-2024-09-26T18-14-00Z_2024-09-26T18-15-00Z.parquet"'
- For pseudonymised session (full message) data
duckdb -c 'select * from "/tmp/dnstapir-edm/data/parquet/sessions/dns_session_block-2024-09-26T18-18-00Z_2024-09-26T18-19-00Z.parquet"'
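Because the file names change every minute, it can be handy to pick the most recent file automatically. The following is a minimal sketch under the assumption that file names keep the timestamp pattern shown above, so that lexical order matches chronological order; newest_parquet is a hypothetical helper name.

```shell
# Sketch only: newest_parquet is a hypothetical helper name.
# Print the lexically last *.parquet file in a directory. The file names embed
# UTC timestamps, so the lexically last name is assumed to be the newest file.
newest_parquet() {
    dir=$1
    newest=
    for f in "$dir"/*.parquet; do
        [ -e "$f" ] || continue   # the glob did not match anything
        newest=$f                 # globs expand in sorted order
    done
    # Return non-zero and print nothing when no parquet file exists.
    [ -n "$newest" ] && printf '%s\n' "$newest"
}
```

For example, duckdb -c "describe select * from '$(newest_parquet /tmp/dnstapir-edm/data/parquet/sessions)'" should list the column names and types of the latest session file, using DuckDB's DESCRIBE statement.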
Next to the parquet directory you will also see a directory called "pebble".
This is where dnstapir-edm keeps its key-value store which is used to tell if a
query name has been seen before or not. The key-value store being used is
pebble.
dnstapir-edm exposes Prometheus metrics at 127.0.0.1:2112
and Go pprof profiling data at 127.0.0.1:6060.
To look at Prometheus metrics:
curl 127.0.0.1:2112/metrics
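The metrics endpoint returns the Prometheus text format, which includes "# HELP" and "# TYPE" comment lines. To narrow the output down, a small filter along these lines may help. It is a sketch: metric_lines is a hypothetical helper name, and the go_ prefix in the usage comment is an assumption about which metrics are exported.

```shell
# Sketch only: metric_lines is a hypothetical helper name.
# Read Prometheus text format on stdin and print only the sample lines whose
# metric name starts with PREFIX, skipping "# HELP" and "# TYPE" comments.
metric_lines() {
    prefix=$1
    awk -v p="$prefix" '!/^#/ && index($0, p) == 1 { print }'
}

# Possible usage (the go_ prefix is an assumption):
#   curl -s 127.0.0.1:2112/metrics | metric_lines go_
```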
Multiple types of profiling data are available; here is a CPU-centric example:
go tool pprof 'http://127.0.0.1:6060/debug/pprof/profile?seconds=30'
When working with this code, at least the following tools are expected to be run in the top-level directory prior to committing:
- gofumpt -l -w . (see gofumpt)
- go vet ./...
- staticcheck ./... (see staticcheck)
- gosec ./... (see gosec)
- golangci-lint run (see golangci-lint)
- go test -race ./...
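The checks listed above can be chained so that the first failure stops the run. This is a minimal sketch, not a script shipped with the repository; run_checks is a hypothetical helper name, and it assumes all of the tools are already installed and on PATH.

```shell
# Sketch only: run_checks is a hypothetical helper name.
# Run each command string in order and stop at the first one that fails.
run_checks() {
    for cmd in "$@"; do
        printf '==> %s\n' "$cmd"
        sh -c "$cmd" || { printf 'FAILED: %s\n' "$cmd" >&2; return 1; }
    done
}

# Possible usage from the top-level directory:
#   run_checks 'gofumpt -l -w .' 'go vet ./...' 'staticcheck ./...' \
#       'gosec ./...' 'golangci-lint run' 'go test -race ./...'
```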
The simplest way to get the dnstapir-edm binary is:
make build