dnstapir/edm

edm: Edge DNSTAP Minimiser

About

dnstapir-edm reads DNSTAP data and, depending on configuration, produces the following outputs based on the observed messages:

  • DNS queries for names considered well-known will be summarised into histograms which are saved as parquet files. These files will then be submitted to Core.
  • DNS queries for names not considered well-known are collected into separate parquet files for further local analysis. Here the complete message content is saved, but the client and server IP addresses are pseudonymised via Crypto-PAn.
  • DNS queries for names that are not considered well-known and have never been seen before by a given instance of dnstapir-edm result in notifications being sent to Core via MQTT messages.
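
The routing described by the three bullets above can be summarised as a small decision function. This is only a toy sketch for illustration: route_query and its yes/no arguments are hypothetical and not part of dnstapir-edm, and the real behaviour also depends on configuration.

```shell
# route_query WELL_KNOWN SEEN_BEFORE
# Hypothetical helper: both arguments are "yes" or "no".
# Prints one line per output the query would contribute to.
route_query() {
    if [ "$1" = yes ]; then
        # Well-known names are only summarised into histograms.
        echo histogram
    else
        # Other names are stored as pseudonymised session data ...
        echo session
        # ... and trigger a notification the first time they are seen.
        [ "$2" = yes ] || echo new-name-notification
    fi
}
```

For example, route_query no no prints both session and new-name-notification, while route_query yes no prints only histogram.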

Usage

Running dnstapir-edm requires a TOML config file holding the Crypto-PAn secret used for pseudonymisation, as well as a well-known-domains.dawg file, which can be created using dnstapir-cli from https://github.com/dnstapir/cli

Steps for a basic local-only setup

A basic setup where dnstapir-edm listens on a unix socket for DNSTAP data and outputs files to a directory structure under /tmp/dnstapir-edm, but does not send anything to Core, can be created like this:

make build
echo 'cryptopan-key = "mysecret"' > dnstapir-edm.toml
curl -O https://www.domcop.com/files/top/top10milliondomains.csv.zip
unzip top10milliondomains.csv.zip
dnstapir-cli dawg --standalone compile --format csv --src top10milliondomains.csv --dawg well-known-domains.dawg
dnstapir-edm run --input-unix /tmp/dnstapir-edm/input.sock --data-dir /tmp/dnstapir-edm/data --config-file dnstapir-edm.toml --well-known-domains-file well-known-domains.dawg --disable-mqtt --disable-histogram-sender

Since all communication with Core is disabled, this setup is helpful for creating some local parquet files to look around in.
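
The "mysecret" value in the steps above is a placeholder. For anything beyond a quick test you may want a random secret instead. The sketch below assumes that cryptopan-key accepts an arbitrary string (as the example above suggests); write_edm_config is a hypothetical helper name, not something shipped with dnstapir-edm.

```shell
# write_edm_config FILE
# Hypothetical helper: writes a config file containing a random Crypto-PAn secret.
write_edm_config() {
    # 32 random bytes rendered as 64 lowercase hex characters.
    key=$(head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n')
    # A newly created file gets owner-only permissions because of the umask;
    # the mode of an already existing file is left unchanged.
    ( umask 077 && printf 'cryptopan-key = "%s"\n' "$key" > "$1" )
}
```

Calling write_edm_config dnstapir-edm.toml would then replace the echo step above.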

Reloading configuration

A running dnstapir-edm reloads its configuration on SIGHUP (e.g. systemctl reload dnstapir-edm or kill -HUP <pid>). One signal re-reads the config file and re-applies all reloadable state derived from files it points at: the Crypto-PAn key material, the ignored client IPs and ignored question names lists, the MQTT/HTTP client certificates and the well-known-domains DAWG file. The DAWG swap takes effect at the next histogram rotation (within a minute) since the collected histogram data is tied to the DAWG it was built against. A reload that fails to read a file logs an error and keeps the previous state. Changes to config keys that are not reloadable are logged with a warning saying a restart is required.

Updating a DAWG is safe while the service runs. dnstapir-edm copies each memory-mapped DAWG (well-known-domains-file, ignored-question-names-file) into a private dawg-staging directory under data-dir and memory-maps that copy, so overwriting the source file — in place or by atomic rename — cannot disturb the live mapping or crash the service. Send SIGHUP once the new file is completely written; a signal received mid-write makes that one reload fail and keep the previous DAWG, so re-send it after the write finishes.
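
One way to make sure SIGHUP never sees a half-written file is to write the new DAWG under a temporary name and rename it into place. The following is a minimal sketch of that approach; install_dawg is a hypothetical helper and the file names in the usage note are only examples.

```shell
# install_dawg NEW_FILE DEST_FILE
# Hypothetical helper: replaces DEST_FILE with a complete copy of NEW_FILE.
install_dawg() {
    new=$1
    dest=$2
    # Create the temporary file next to the destination so that mv is a
    # rename within one filesystem rather than a copy.
    tmp=$(mktemp "$(dirname "$dest")/.dawg.XXXXXX") || return 1
    # cp -p keeps the mode of the new file instead of mktemp's 0600.
    if cp -p "$new" "$tmp" && mv -f "$tmp" "$dest"; then
        return 0
    fi
    # Clean up the temporary file if anything went wrong.
    rm -f "$tmp"
    return 1
}
```

After install_dawg new.dawg well-known-domains.dawg returns successfully the file is completely written, so this is the point at which to send the signal, e.g. with systemctl reload dnstapir-edm or kill -HUP <pid>.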

Inspecting the resulting files

For inspecting the content you can use e.g. DuckDB like so:

  • For summarised histogram data
duckdb -c 'select * from "/tmp/dnstapir-edm/data/parquet/histograms/outbox/dns_histogram-2024-09-26T18-14-00Z_2024-09-26T18-15-00Z.parquet"'
  • For pseudonymised session (full message) data
duckdb -c 'select * from "/tmp/dnstapir-edm/data/parquet/sessions/dns_session_block-2024-09-26T18-18-00Z_2024-09-26T18-19-00Z.parquet"'
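
Since the file names contain timestamps, it can be more convenient to query a whole directory at once. DuckDB's read_parquet function accepts glob patterns, so a sketch like the following can be used. The helper names count_rows_sql and describe_sql are hypothetical, and the directory paths are taken from the examples above.

```shell
# Hypothetical helpers that print DuckDB statements for a directory of
# parquet files. The directory path must not contain single quotes.

# Count rows across every parquet file in the directory.
count_rows_sql() {
    printf "select count(*) from read_parquet('%s/*.parquet')" "$1"
}

# List column names and types of the files in the directory.
describe_sql() {
    printf "describe select * from read_parquet('%s/*.parquet')" "$1"
}

# Usage (requires duckdb and at least one parquet file in the directory):
#   duckdb -c "$(describe_sql /tmp/dnstapir-edm/data/parquet/sessions)"
#   duckdb -c "$(count_rows_sql /tmp/dnstapir-edm/data/parquet/histograms/outbox)"
```

Running the describe statement first is a quick way to learn the column names before writing more specific queries.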

Next to the parquet directory you will also see a directory called "pebble". This is where dnstapir-edm keeps its key-value store, which is used to tell whether a query name has been seen before. The key-value store used is Pebble.

Observability

dnstapir-edm exposes Prometheus metrics at 127.0.0.1:2112 and Go pprof profiling data at 127.0.0.1:6060. To look at the Prometheus metrics:

curl 127.0.0.1:2112/metrics
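
The metrics output uses the Prometheus text format, where lines starting with # carry help and type information. A small filter can make the output easier to scan. This is a sketch only: metric_samples is a hypothetical helper, and the go_ prefix in the usage note assumes the default Go runtime metrics are exported.

```shell
# metric_samples [PREFIX]
# Hypothetical helper: reads Prometheus text format on stdin, drops the
# "# HELP" and "# TYPE" lines and, if PREFIX is given, keeps only samples
# whose line starts with that literal prefix.
metric_samples() {
    grep -v '^#' | awk -v p="${1:-}" 'p == "" || index($0, p) == 1'
}

# Usage (requires a running dnstapir-edm):
#   curl -s 127.0.0.1:2112/metrics | metric_samples go_
```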

There are multiple types of profiling data available. Here is a CPU-centric example:

go tool pprof 'http://127.0.0.1:6060/debug/pprof/profile?seconds=30'
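
Assuming the profiling endpoint is served by Go's standard net/http/pprof handler (which the /debug/pprof/profile path suggests), other profile types are available under the same prefix. The helper below is a hypothetical convenience for building the URLs, not part of dnstapir-edm.

```shell
# pprof_url KIND [SECONDS]
# Hypothetical helper: prints the URL for a profile type served by
# net/http/pprof on the address used above.
pprof_url() {
    case $1 in
        profile)
            # The CPU profile is sampled over a duration (default 30 seconds).
            printf 'http://127.0.0.1:6060/debug/pprof/%s?seconds=%s\n' "$1" "${2:-30}" ;;
        heap|allocs|goroutine|block|mutex)
            # These profiles are snapshots and need no duration.
            printf 'http://127.0.0.1:6060/debug/pprof/%s\n' "$1" ;;
        *)
            echo "unknown profile type: $1" >&2
            return 1 ;;
    esac
}

# Usage (requires a running dnstapir-edm and the Go toolchain):
#   go tool pprof "$(pprof_url heap)"
```

Quoting the URL matters because some shells treat the ? in the query string as a glob character.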

Development

Formatting and linting

When working with this code, at least the following tools are expected to be run in the top-level directory prior to committing:

Building

Binary

The simplest way of getting the dnstapir-edm binary is this:

make build
