ja4zig

Zig port of the FoxIO JA4+ network fingerprinting suite — passive client/server fingerprints for TLS, HTTP, SSH, X.509, latency, and TCP, computed from a pcap file via tshark. Mirrors the Rust crate at https://github.com/FoxIO-LLC/ja4. Targets Zig 0.16.0.

| Surface | State |
| --- | --- |
| ja4 CLI binary | functional: reads a pcap, emits YAML (per-stream JA4+ records) |
| Library API (hash, tshark, pcap, stream, …) | usable as a Zig dependency |
| Snapshot suite | 37 fixtures, 23 byte-exact green; 14 differ only on EOF newlines (content-equivalent; see Snapshot caveat) |
| Unit tests | 21/22 passing; the one failure is the snapshot harness reporting the 14 EOF-newline fixtures |

The fingerprint pipeline (JA4 / JA4S / JA4H / JA4L / JA4T / JA4X / JA4SSH) is implemented. tshark ≥ 4.0.6 is required at runtime.

What works today

  • CLI: ja4 <pcap> shells out to tshark, processes every packet, and emits a YAML record per stream matching the upstream Rust impl's assert_yaml_snapshot! output. Flags: -r/--with-raw (include unhashed fingerprints), -O/--original-order (disable cipher/extension/cookie sorting), -n/--with-packet-numbers (pkt_* fields).
  • Full JA4+ family:
    • JA4 (TLS client): TLS version, SNI presence, cipher count, extension count, ALPN, GREASE filtering, hashed and sorted (or original-order) ciphers + extensions + signature algorithms.
    • JA4S (TLS server) — TLS version, selected cipher, extension list (never sorted), ALPN.
    • JA4H (HTTP client) — method, version, cookie/referer markers, header count + names, primary Accept-Language, cookie names + values. HTTP/1 and HTTP/2 paths.
    • JA4L (latency) — light-distance times derived from the TCP three-way handshake or QUIC Initial/Handshake exchanges; reports µs_TTL per side.
    • JA4SSH — packet-length mode sampling every N SSH packets, plus ssh_extras (hassh, hassh_server, protocol strings, negotiated encryption algorithm).
    • JA4T (TCP) — initial SYN window size, option kind sequence, MSS, window scale.
    • JA4X (X.509) — issuer/subject RDN OIDs + extension OIDs hashed, plus a pretty issuerCommonName: … / subjectOrganizationName: … enrichment. Built on a tiny custom ASN.1/DER walker (no dependencies).
  • Helpers: hash.hash12 (SHA-256 truncated to 12 hex chars), tshark.parseVersion (version-gate helper).
  • Vendored corpus — 38 pcaps + 37 reference YAML snapshots from upstream Rust insta, plus a bench suite as a template.
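
The GREASE filtering listed for JA4 can be sketched as below. RFC 8701 reserves sixteen values of the form 0x?A?A with both bytes equal, and those are skipped when counting ciphers and extensions. This is a minimal sketch: isGrease is a hypothetical name and need not match what src/grease.zig exports.

```zig
const std = @import("std");

/// True for the sixteen RFC 8701 GREASE values (0x0A0A, 0x1A1A, ..., 0xFAFA):
/// both bytes are equal and the low nibble of each is 0xA.
/// `isGrease` is an illustrative name, not necessarily ja4zig's API.
fn isGrease(value: u16) bool {
    const hi: u8 = @truncate(value >> 8);
    const lo: u8 = @truncate(value);
    return hi == lo and (lo & 0x0f) == 0x0a;
}

pub fn main() void {
    // Example cipher list with one GREASE value in front (made-up input).
    const ciphers = [_]u16{ 0x1a1a, 0x1301, 0x1302, 0xc02b };
    var kept: usize = 0;
    for (ciphers) |c| {
        if (!isGrease(c)) kept += 1;
    }
    // Three of the four values are real cipher suites.
    std.debug.print("ciphers counted: {d}\n", .{kept});
}
```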

What's still WIP

  • ~/.config/ja4/config.toml loader (the Rust impl writes a default file on first run and lets you toggle individual fingerprints off). Defaults match upstream, so this only matters if you want to disable a fingerprint type.
  • -j/--json output mode and -o/--output <path> (write to file).
  • --keylog-file <path> (TLS pre-master secrets for decrypting captures).
  • The "JA4Plus mapping" lookup CSV (not part of fingerprint generation).

Snapshot caveat

23/37 snapshots match byte-for-byte. The 14 remaining differ from the upstream YAML reference by one byte at EOF: some upstream snapshots end with \n and some with \n\n, depending on the serde-yaml version that was running when each one was generated by cargo insta review. There's no content-derivable rule (same field types and shapes end different ways), and we emit a single \n because that matches the majority of fixtures.

Verify content equivalence yourself:

diff -q \
  <(./zig-out/bin/ja4 tests/testdata/pcap/tls12.pcap | sed -e 's/[[:space:]]*$//' | awk 'NF') \
  <(sed -e 's/[[:space:]]*$//' tests/testdata/snapshots/tls12.pcap.yaml | awk 'NF')

diff -q prints nothing and exits 0 when both sides match, which is the case for every fixture in the corpus.

Using ja4zig as a dependency

Until ja4zig ships a tagged release, depend on it by relative path. From your project's build.zig.zon:

.dependencies = .{
    .ja4zig = .{
        .path = "../ja4zig",
    },
},

In your build.zig:

const ja4zig = b.dependency("ja4zig", .{
    .target = target,
});
exe.root_module.addImport("ja4zig", ja4zig.module("ja4zig"));

Quick hash12 usage:

const std = @import("std");
const ja4zig = @import("ja4zig");

pub fn main() void {
    var digest: [12]u8 = undefined;
    ja4zig.hash.hash12("t13d1715h2_5b234860e130_014157ec0da2", &digest);
    std.debug.print("ja4 = {s}\n", .{&digest});
}

End-to-end (process a pcap through the same pipeline the CLI uses):

const std = @import("std");
const ja4zig = @import("ja4zig");

pub fn main(init: std.process.Init) !void {
    var reader = try ja4zig.pcap.Reader.open(init.arena.allocator(), init.io, "trace.pcap", null);
    defer reader.close();

    var streams = ja4zig.stream.Streams.init(init.arena.allocator(), .{});
    defer streams.deinit();

    while (try reader.next()) |pkt| try streams.update(pkt);
    try streams.finish();

    var buf: [16 * 1024]u8 = undefined;
    var fw: std.Io.File.Writer = .init(.stdout(), init.io, &buf);
    try streams.emitYaml(&fw.interface, .{});
    try fw.interface.flush();
}

Public API

Library helpers

  • hash.hash12(s: []const u8, out: *[12]u8) void — first 12 hex chars of SHA-256(s); empty input → "000000000000". No allocations.
  • tshark.parseVersion(output: []const u8) ?[]const u8 — extracts the version string from tshark --version's first line. Returns null if the marker isn't found or no whitespace terminator follows.
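
For readers who want to check the hash12 contract independently, here is a minimal scalar sketch using only std.crypto. It is not the implementation in src/hash.zig (which uses a SIMD hex encoder); hash12Sketch is a hypothetical name, and the only behavior assumed is what the bullet above states.

```zig
const std = @import("std");
const Sha256 = std.crypto.hash.sha2.Sha256;

/// 12 lowercase hex chars of SHA-256(s); empty input yields "000000000000".
/// Scalar reference sketch, not the repo's SIMD-accelerated version.
fn hash12Sketch(s: []const u8, out: *[12]u8) void {
    if (s.len == 0) {
        @memset(out, '0');
        return;
    }
    var digest: [Sha256.digest_length]u8 = undefined;
    Sha256.hash(s, &digest, .{});
    const alphabet = "0123456789abcdef";
    // Only the first 6 digest bytes are needed for 12 hex characters.
    for (digest[0..6], 0..) |byte, i| {
        out[2 * i] = alphabet[byte >> 4];
        out[2 * i + 1] = alphabet[byte & 0x0f];
    }
}

pub fn main() void {
    var out: [12]u8 = undefined;
    hash12Sketch("abc", &out);
    // SHA-256("abc") begins ba7816bf8f01cfea..., so this prints ba7816bf8f01.
    std.debug.print("{s}\n", .{&out});
}
```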

Pcap iterator

  • pcap.Reader.open(gpa, io, pcap_path, ?keylog_path) !Reader — spawns tshark with -T ek, captures the JSON output, and yields one packet per next() call. Per-packet memory is held in an arena that resets between calls — copy out what you want to keep.
  • Packet.findProto(name) ?Proto / lastProto(name) ?Proto — innermost first/last layer with the given name; transparently descends into quic.tls for QUIC-encapsulated TLS handshakes.
  • Proto.first(field) ?[]const u8 / values(field) ValueIter — field accessors using the rtshark naming convention (tls.handshake.extension.type), translated on the fly to the tshark -T ek double-prefixed form (tls_tls_handshake_extension_type).

Stream tracker

  • stream.Streams.init(gpa, .{ .tcp = true, .tls = true, … }) — central state. Per-fingerprint Conf toggles default on.
  • Streams.update(pkt) !void — dispatches packet to every enabled fingerprint module. Per-module errors are swallowed (logged).
  • Streams.finish() !void — flushes in-flight JA4SSH samples at EOF.
  • Streams.emitYaml(writer, flags) !void — emits the same YAML record set the CLI prints. flags.with_raw, flags.original_order, flags.with_packet_numbers map directly to the CLI flags.

Design notes

tshark driver (src/pcap.zig)

We spawn tshark -r <pcap> -T ek (line-delimited JSON), capture stdout in one buffer via std.process.run, and iterate line-pairs ({"index":…} is skipped; the following data line is parsed with std.json.parseFromSliceLeaky into an arena reset between packets). Field-name normalization translates the rtshark-style identifier tls.handshake.extension.type to the ek-style key tls_tls_handshake_extension_type on the fly so call sites stay readable.
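
The field-name normalization can be sketched as follows. The rule used here (prepend the protocol name, then turn every dot into an underscore) is inferred from the single example above, so treat it as an assumption; toEkKey is a hypothetical helper name, not the function in src/pcap.zig.

```zig
const std = @import("std");

/// Translate "tls.handshake.extension.type" into
/// "tls_tls_handshake_extension_type": the protocol name (the text before
/// the first dot) is prepended, then every dot becomes an underscore.
/// Returns null when `buf` is too small. Hypothetical helper.
fn toEkKey(name: []const u8, buf: []u8) ?[]const u8 {
    // Length of the protocol prefix, e.g. "tls".
    var proto_len: usize = name.len;
    for (name, 0..) |c, i| {
        if (c == '.') {
            proto_len = i;
            break;
        }
    }
    const total = proto_len + 1 + name.len;
    if (buf.len < total) return null;

    @memcpy(buf[0..proto_len], name[0..proto_len]);
    buf[proto_len] = '_';
    for (name, 0..) |c, i| {
        buf[proto_len + 1 + i] = if (c == '.') '_' else c;
    }
    return buf[0..total];
}

pub fn main() void {
    var buf: [128]u8 = undefined;
    const key = toEkKey("tls.handshake.extension.type", &buf) orelse return;
    std.debug.print("{s}\n", .{key});
}
```

Writing into a caller-supplied buffer keeps the lookup allocation-free, which suits a translation that runs on every field access.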

For TLS inside QUIC the TLS dissection lives under quic.tls as either an object (single CRYPTO frame) or an array (multiple frames in one packet); both quic itself and quic.tls can be arrays. findProto("tls") transparently walks every nesting permutation and prefers an element with tls.handshake.type so the right ClientHello/ServerHello is dispatched.

hash12 is now uncached (src/hash.zig)

Previous iterations memoized digests via a 16-slot direct-mapped cache keyed by (len, first 8 bytes, last 8 bytes) — 160 bits. That tripped a false-positive collision on tls3.pcapng: two distinct JA4 extension lists shared length, first bytes, and last bytes but had different middles, and the cache returned the wrong digest. Cache removed; SHA-256 (which on Apple Silicon runs on the sha256h.4s intrinsics at ~3.2 GB/s) is computed every call.
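
To see why that key was unsafe, the sketch below rebuilds a (len, first 8 bytes, last 8 bytes) key and feeds it two different strings. The Key layout and the inputs are illustrative assumptions; they are not the removed cache code or the actual extension lists from tls3.pcapng.

```zig
const std = @import("std");

/// Weak 160-bit cache key: 32-bit length plus the first and last 8 bytes.
const Key = struct {
    len: u32,
    head: [8]u8 = [_]u8{0} ** 8,
    tail: [8]u8 = [_]u8{0} ** 8,
};

fn weakKey(s: []const u8) Key {
    var key = Key{ .len = @intCast(s.len) };
    const n: usize = @min(s.len, 8);
    @memcpy(key.head[0..n], s[0..n]);
    @memcpy(key.tail[0..n], s[s.len - n ..]);
    return key;
}

fn keysEqual(a: Key, b: Key) bool {
    return a.len == b.len and
        std.mem.eql(u8, &a.head, &b.head) and
        std.mem.eql(u8, &a.tail, &b.tail);
}

pub fn main() void {
    // Made-up extension lists: same length, same ends, different middle.
    const a = "0005,000a,000b,000d,0012,0033";
    const b = "0005,000a,0017,000d,0012,0033";
    std.debug.print("inputs equal: {}\n", .{std.mem.eql(u8, a, b)});
    std.debug.print("keys equal:   {}\n", .{keysEqual(weakKey(a), weakKey(b))});
}
```

The inputs differ while their keys compare equal, so a cache that trusts the key alone would hand back the digest of the wrong string.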

SIMD hex encoding

Six raw bytes → twelve ASCII hex chars in ~5 SIMD ops on AArch64. Splits hi/lo nibbles via @Vector(6, u8) shift/mask, interleaves to a 12-lane @Vector(12, u8) with @shuffle, then ASCII-encodes branchlessly via n + '0' + ((n + 6) >> 4) * 0x27.
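
A sketch of that encoder, written from the description above rather than copied from src/hash.zig. The function name hexEncode6 is an assumption, and the number of machine instructions the compiler emits is not something this example verifies.

```zig
const std = @import("std");

/// Encode 6 bytes as 12 lowercase hex chars using vector operations.
fn hexEncode6(bytes: [6]u8) [12]u8 {
    const v: @Vector(6, u8) = bytes;
    // Split each byte into its high and low nibble.
    const hi = v >> @as(@Vector(6, u3), @splat(4));
    const lo = v & @as(@Vector(6, u8), @splat(0x0f));
    // Interleave hi0, lo0, hi1, lo1, ... Non-negative mask entries index
    // `hi`; negative entries index `lo` (-1 is lo[0], -2 is lo[1], ...).
    const mask = @Vector(12, i32){ 0, -1, 1, -2, 2, -3, 3, -4, 4, -5, 5, -6 };
    const n: @Vector(12, u8) = @shuffle(u8, hi, lo, mask);
    // Branchless ASCII: (n + 6) >> 4 is 0 for 0..9 and 1 for 10..15, so
    // digits get '0' added and a..f get an extra 0x27 ('a' - '0' - 10).
    const is_alpha = (n + @as(@Vector(12, u8), @splat(6))) >> @as(@Vector(12, u3), @splat(4));
    const ascii = n + @as(@Vector(12, u8), @splat('0')) +
        is_alpha * @as(@Vector(12, u8), @splat(0x27));
    return ascii;
}

pub fn main() void {
    const out = hexEncode6(.{ 0xba, 0x78, 0x16, 0xbf, 0x8f, 0x01 });
    std.debug.print("{s}\n", .{&out});
}
```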

Tiny ASN.1/DER walker (src/ja4x.zig)

JA4X needs to extract issuer/subject RDN OIDs and extension OIDs from each X.509 certificate. Rather than pull in a DER dependency, we walk the cert structure directly: outer SEQUENCE → tbsCertificate SEQUENCE → optional [0] EXPLICIT version → serial → algorithm → issuer Name → validity → subject Name → SubjectPublicKey → optional [3] EXPLICIT extensions. Each Name is a SEQUENCE OF RDN, each RDN a SET OF AttributeTypeAndValue. OIDs are emitted as lowercase hex of the DER OID bytes (no dotted form); recognized OIDs also get a pretty issuerCountryName: US enrichment via a small lookup table that mirrors x509-parser's with_x509() default registry — including the Microsoft jurisdiction OIDs and intentionally omitting emailAddress because the Rust impl drops it.
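
The core of such a walker is a tag-length-value reader. The sketch below shows one possible shape, under stated assumptions: the names Tlv, readTlv, and oidHex are hypothetical, multi-byte tags are not handled, and long-form lengths are capped at 4 bytes. The input is a hand-built AttributeTypeAndValue for commonName (OID 2.5.4.3, DER bytes 55 04 03).

```zig
const std = @import("std");

const Tlv = struct {
    tag: u8,
    value: []const u8, // contents octets
    rest: []const u8, // bytes following this element
};

/// Read one DER element. Supports short-form lengths and long-form lengths
/// of up to 4 bytes. Returns null on truncated or unsupported input.
fn readTlv(der: []const u8) ?Tlv {
    if (der.len < 2) return null;
    var len: usize = der[1];
    var pos: usize = 2;
    if ((len & 0x80) != 0) {
        const n = len & 0x7f; // number of length bytes that follow
        if (n == 0 or n > 4 or der.len < 2 + n) return null;
        len = 0;
        for (der[2 .. 2 + n]) |b| len = (len << 8) | b;
        pos += n;
    }
    if (der.len - pos < len) return null;
    return .{ .tag = der[0], .value = der[pos .. pos + len], .rest = der[pos + len ..] };
}

/// Lowercase hex of the raw OID bytes (no dotted form).
fn oidHex(oid: []const u8, buf: []u8) ?[]const u8 {
    if (buf.len < oid.len * 2) return null;
    const alphabet = "0123456789abcdef";
    for (oid, 0..) |b, i| {
        buf[2 * i] = alphabet[b >> 4];
        buf[2 * i + 1] = alphabet[b & 0x0f];
    }
    return buf[0 .. oid.len * 2];
}

pub fn main() void {
    // SEQUENCE { OID 2.5.4.3 (commonName), UTF8String "x" }
    const atv = [_]u8{ 0x30, 0x08, 0x06, 0x03, 0x55, 0x04, 0x03, 0x0c, 0x01, 0x78 };
    const seq = readTlv(&atv) orelse return;
    const oid = readTlv(seq.value) orelse return;
    var buf: [32]u8 = undefined;
    const hex = oidHex(oid.value, &buf) orelse return;
    std.debug.print("oid hex: {s}\n", .{hex}); // 550403
}
```

Returning the remaining bytes in rest lets the caller step through sibling elements (serial, algorithm, issuer, and so on) with repeated calls, without keeping a separate cursor.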

Parallel snapshot harness (tests/snapshot_test.zig)

Up to min(std.Thread.getCpuCount(), 8) workers pull pcap indices off a std.atomic.Value(usize) counter; each holds its own Io.Dir handle on the snapshots directory. Override the count with SNAPSHOT_WORKERS=N. With the real CLI in place (vs. the µs-fast stub of phase 1), running the tests now spawns ~37 tshark processes; running them in parallel cuts zig build test from several seconds to under a second.
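
The work-distribution pattern can be sketched like this. It is a simplified stand-in, not the code in tests/snapshot_test.zig: Shared, worker, and runAll are hypothetical names, the per-fixture work is replaced by a counter increment, and the SNAPSHOT_WORKERS override is left out.

```zig
const std = @import("std");

const Shared = struct {
    next: std.atomic.Value(usize) = std.atomic.Value(usize).init(0),
    done: std.atomic.Value(usize) = std.atomic.Value(usize).init(0),
    total: usize,
};

fn worker(shared: *Shared) void {
    while (true) {
        // Claim the next unprocessed fixture index.
        const i = shared.next.fetchAdd(1, .monotonic);
        if (i >= shared.total) return;
        // A real harness would run the CLI on fixture `i` and diff the YAML.
        _ = shared.done.fetchAdd(1, .monotonic);
    }
}

/// Process `total` items on at most 8 threads; returns how many were handled.
fn runAll(total: usize) !usize {
    var shared = Shared{ .total = total };
    const cpus = std.Thread.getCpuCount() catch 1;
    const n: usize = @max(1, @min(cpus, 8));

    var threads: [8]std.Thread = undefined;
    var spawned: usize = 0;
    var spawn_err: ?std.Thread.SpawnError = null;
    while (spawned < n) : (spawned += 1) {
        threads[spawned] = std.Thread.spawn(.{}, worker, .{&shared}) catch |e| {
            spawn_err = e;
            break;
        };
    }
    // Join whatever was started before reporting a spawn failure.
    for (threads[0..spawned]) |t| t.join();
    if (spawn_err) |e| return e;
    return shared.done.load(.monotonic);
}

pub fn main() !void {
    const handled = try runAll(37);
    std.debug.print("fixtures handled: {d}\n", .{handled});
}
```

A shared counter avoids pre-partitioning the fixtures, so a worker stuck on a large pcap does not leave the others idle.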

Build, test, bench

zig build                # produces zig-out/bin/ja4 and zig-out/bin/ja4zig-bench
zig build test           # unit tests + snapshot harness (37 fixtures)
zig build run -- <pcap>  # run the CLI against a pcap
zig build bench          # full bench suite (micro + per-pcap)

Bench commands:

zig build bench                                # full suite (defaults)
zig build bench -- micro --batches=50          # microbenchmarks only
zig build bench -- pcap --runs=5 --warmup=2    # per-pcap only
zig build bench -- all --json > out.json       # machine-readable output
zig build -Dbench-optimize=Debug bench         # build the bench binary in Debug

Two layers:

  • Microbenchmarks: auto-calibrated iteration counts with asm volatile memory barriers to defeat constant folding. Covers hash12 across four input sizes (empty / 20 B / 256 B / 4 KiB) and parseVersion (happy / miss). Reports min / median / mean / p95 / stddev / CV / ops/s / MB/s. <1ns means the body amortized below the host clock's resolution.
  • Per-pcap end-to-end: runs every detected JA4 implementation against every fixture: ja4zig (this repo), rust/ja4 release binary (if built), python/ja4.py, and raw tshark -T ek as a lower bound. Missing impls are silently skipped.

Current micro numbers on Apple Silicon (ReleaseFast):

| Bench | Time | Throughput |
| --- | --- | --- |
| hash12/short_20B | 16 ns | 1.2 GB/s |
| hash12/realistic_256B | 68 ns | 3.0 GB/s |
| hash12/4KiB | 1.21 µs | 3.2 GB/s |
| parseVersion/happy | 2 ns | 23.8 GB/s |
| parseVersion/miss | 1 ns | |

Layout

ja4zig/
├── build.zig
├── build.zig.zon
├── config.toml                (copied from rust/ja4/config.toml)
├── src/
│   ├── main.zig               (CLI entry — argv, run loop, YAML emit)
│   ├── root.zig               (library root — re-exports the modules below)
│   ├── pcap.zig               (tshark subprocess + Packet/Proto)
│   ├── stream.zig             (Streams map + per-stream dispatch + output)
│   ├── tcp.zig                (JA4T)
│   ├── tls.zig                (JA4 + JA4S)
│   ├── http.zig               (JA4H)
│   ├── ssh.zig                (JA4SSH + ssh_extras)
│   ├── time.zig               (JA4L-C / JA4L-S, TCP + QUIC)
│   ├── ja4x.zig               (X.509 ASN.1/DER walker + JA4X)
│   ├── grease.zig             (RFC 8701 GREASE detection)
│   ├── yaml.zig               (small YAML scalar emitter w/ quoting)
│   ├── hash.zig               (hash12 + SIMD hex encoder)
│   └── tshark.zig             (parseVersion)
├── bench/
│   ├── main.zig               (bench CLI dispatcher)
│   ├── micro.zig              (microbenchmarks with calibration)
│   ├── pcap_bench.zig         (per-pcap end-to-end across detected impls)
│   ├── output.zig             (table + JSON formatters)
│   ├── stats.zig              (min/median/mean/p95/stddev/CV)
│   └── timer.zig              (std.Io.Clock wrapper)
└── tests/
    ├── snapshot_test.zig      (parallel pcap-vs-YAML harness)
    ├── import-snapshots.sh    (refresh YAML fixtures from upstream insta)
    └── testdata/
        ├── pcap/<name>.pcap[ng]   (38 vendored — ~5.6 MB total)
        └── snapshots/<name>.yaml  (37 vendored reference snapshots)

Regenerating snapshot fixtures

The pcap and YAML fixtures under tests/testdata/ are vendored — the repo needs no external checkout to build or test. To refresh the YAML snapshots against a newer upstream:

JA4_UPSTREAM=/path/to/FoxIO-LLC/ja4 ./tests/import-snapshots.sh

(JA4_UPSTREAM defaults to ../../ja4 relative to this repo.) The gtp-iphone.pcap snapshot is skipped because its pcap isn't bundled upstream.

Roadmap

Largely cleanup work left:

  • TOML config loader (~/.config/ja4/config.toml) so individual fingerprint types can be disabled the way the Rust CLI allows.
  • -j/--json output mode + -o/--output <path>.
  • --keylog-file <path> (pass to tshark, allowing decryption of captures whose pre-master secrets were logged).
  • Investigate the EOF-newline serde-yaml inconsistency in upstream snapshots; if a content rule emerges, restore byte-exact 37/37 match.

License & attribution

Independent Zig re-implementation of the upstream FoxIO JA4+ algorithms. Not affiliated with FoxIO, LLC.

  • JA4 (TLS client fingerprinting) is BSD 3-Clause — see upstream LICENSE-JA4. Free for commercial use.
  • JA4+ extensions — JA4S, JA4H, JA4L, JA4X, JA4SSH (and the others) — are licensed under FoxIO License 1.1, which is non-commercial only. Any commercial use of this port's JA4+ output must comply with that license.

See upstream's License FAQ for the FoxIO License 1.1's intent and edge cases.
