zgz is a minimal Zig v0.16.0 gzip decompression library backed by
zlib-ng. It is intended as a
highly optimized decompressor for high-throughput genome sequencing
data in production environments and bioinformatics applications.
Performance is on par with flate2 implementations in Rust
(zlib-ng, zlib-rs or miniz_oxide) and exceeds gzip/zcat in
the tested benchmark cases.
- Streaming gzip decompression using native zlib-ng bindings via zig-zlib-ng
- High-performance, minimal streaming decompression executable for Linux/macOS
- Concatenated gzip member support and decompressed output limit
- Zig library and APIs for custom implementations
- No other dependencies - standard library only
The zgz executable can be used as a general zcat-like decompressor:
zgz reads.fq.gz > reads.fq
Zig library for custom implementations:
- High-level std.Io.Reader to std.Io.Writer streaming API (zgz.decompress)
- Direct buffer-filler API that inflates into caller-owned buffers (zgz.GzipInput)
- Low-level stateful decompressor API for custom drivers (zgz.Decompressor)
zgz consumes the zng static library artifact from zig-zlib-ng.
zgz binds the native zlib-ng API:
zlibng_version
zng_inflateInit2
zng_inflate
zng_inflateReset2
zng_inflateEnd
It does not bind the classic zlib-compatible symbols such as inflate, inflateInit2_, or zlibVersion.
Build:
zig build -Doptimize=ReleaseFast
Run tests:
zig build test
zgz streams gzip input to stdout through the high-level zgz.decompress
reader-to-writer API:
./zig-out/bin/zgz [options] FILE.gz
--in-buffer BYTES    Input file reader buffer. Supports K/M/G suffixes.
                     Default: 256K.
--out-buffer BYTES   Output/decompression buffer. Supports K/M/G suffixes.
                     Default: 256K.
--max-output BYTES   Abort if output exceeds this limit. Supports K/M/G suffixes.
                     Must be greater than zero in the CLI.
--no-concat          Reject concatenated gzip members and trailing data.
-h, --help           Show help.
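The K/M/G suffixes accepted by the buffer and limit options could be parsed roughly as follows. This is a minimal sketch, not zgz's actual option parser: parseSize is a hypothetical helper, and it assumes binary multiples (K = 1024 bytes) and case-insensitive suffixes, neither of which the option help states.

```zig
const std = @import("std");

/// Hypothetical helper (not part of zgz): parse "256K", "1M", "2G",
/// or a plain byte count such as "4096".
/// Assumption: suffixes are binary multiples (K = 1024 bytes).
fn parseSize(text: []const u8) !usize {
    if (text.len == 0) return error.InvalidSize;
    const multiplier: usize = switch (text[text.len - 1]) {
        'K', 'k' => 1024,
        'M', 'm' => 1024 * 1024,
        'G', 'g' => 1024 * 1024 * 1024,
        else => 1,
    };
    // Strip the suffix character only when one was recognized.
    const digits = if (multiplier == 1) text else text[0 .. text.len - 1];
    const value = try std.fmt.parseInt(usize, digits, 10);
    // Checked multiplication: returns error.Overflow instead of wrapping.
    return std.math.mul(usize, value, multiplier);
}

test "parseSize" {
    try std.testing.expectEqual(@as(usize, 256 * 1024), try parseSize("256K"));
    try std.testing.expectEqual(@as(usize, 1024 * 1024), try parseSize("1M"));
    try std.testing.expectEqual(@as(usize, 4096), try parseSize("4096"));
    try std.testing.expectError(error.InvalidCharacter, parseSize("K"));
}
```

Saved to a file, the block can be checked with zig test. The checked multiplication matters for values such as a very large G count, which would otherwise wrap silently.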
zgzfill is mainly useful for benchmarking the direct API that downstream
parsers can use to inflate directly into their own buffers.
Examples:
# Default buffer size: 256K
./zig-out/bin/zgz testdata/biofast.fq.gz > /dev/null
# Larger buffer size may increase performance
./zig-out/bin/zgz --in-buffer 1M --out-buffer 1M sample.gz > /dev/null
Dependencies used for benchmark:
zig=0.16.0
rust=1.93.2
python=3.14.3
gzip=1.10.0
hyperfine=1.20.0
Create a valid and invalid gzip compressed file corpus:
tools/01-make-corpus.sh ./testdata
Validate zgz valid/invalid test corpus against zcat:
tools/02-check-corpus.sh testdata/corpus
Test output equivalence of benchmark executables (C, Rust, Zig) against gzip/zcat:
tools/03-test-equivalence.sh testdata/corpus/valid/*.gz
Run hyperfine benchmarks of decompression library executables (C, Rust, Zig) against
the biofast reference .fastq (Illumina short reads, 150 bp) compressed with gzip
and the Zymo nanopore long-read mock community (ONT long reads, ~5 kbp average read
length):
tools/04-benchmark-files.sh testdata/biofast/biofast-v1.fastq.gz
tools/04-benchmark-files.sh testdata/zymo/zymo-v1.fastq.gz
Run hyperfine benchmarks and create a benchmark table in docs/benchmarks.md:
tools/05-benchmark-markdown.sh testdata/biofast/biofast-v1.fastq.gz
Zig v0.16.0
Use zgz.decompress to stream from a std.Io.Reader to a std.Io.Writer:
const std = @import("std");
const zgz = @import("zgz");
pub fn main(init: std.process.Init) !void {
    const io = init.io;
    var input_file = try std.Io.Dir.cwd().openFile(io, "sample.gz", .{});
    defer input_file.close(io);
    var input_buffer: [256 * 1024]u8 = undefined; // on stack
    var input_reader = input_file.readerStreaming(io, &input_buffer);
    var stdout_file = std.Io.File.stdout();
    var output_buffer: [256 * 1024]u8 = undefined; // on stack
    var output_writer = stdout_file.writer(io, &output_buffer);
    _ = try zgz.decompress(
        &input_reader.interface,
        &output_writer.interface,
        .{
            .allow_concatenated_members = true,
            .max_output_bytes = null,
        },
    );
    try output_writer.interface.flush();
}
zgz.decompress borrows input directly from the reader and writes decompressed
bytes directly into the writer buffer.
Use GzipInput when the caller already owns an optimized output buffer, such as
a parser refill buffer:
file -> std.Io.File.Reader buffer -> zlib-ng -> caller output buffer
Example:
const std = @import("std");
const zgz = @import("zgz");
pub fn main(init: std.process.Init) !void {
    const io = init.io;
    var input_file = try std.Io.Dir.cwd().openFile(io, "reads.fastq.gz", .{});
    defer input_file.close(io);
    var input_buffer: [256 * 1024]u8 = undefined; // on stack
    var input_reader = input_file.readerStreaming(io, &input_buffer);
    var gzip: zgz.GzipInput = undefined;
    try gzip.init(
        &input_reader.interface,
        .{
            .allow_concatenated_members = true,
            .max_output_bytes = null,
        },
    );
    defer gzip.deinit();
    var parser_buffer: [800 * 1024]u8 = undefined; // on stack
    while (true) {
        const result = try gzip.readInto(parser_buffer[0..]);
        // Process parser_buffer[0..result.written]
        if (result.end) break;
        if (result.written == 0) {
            return error.DriverMadeNoProgress;
        }
    }
}
For parser integration, pass the parser's free buffer region:
const free = parser.buffer[parser.end..parser.capacity];
const result = try gzip.readInto(free);
parser.end += result.written;
parser.eof = result.end;
GzipInput.readInto does not allocate and does not use an intermediate decompressed buffer.
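To make the parser-integration pattern concrete, here is a minimal sketch of a refill step that compacts unparsed bytes before handing the free region to readInto. Everything in it is an assumption for illustration: Parser, its start field, and refill are hypothetical, and FakeSource is a stand-in for zgz.GzipInput that copies from memory so the block runs without zgz. Only the written and end result fields come from the examples above.

```zig
const std = @import("std");

/// Mirrors the two result fields used by the readInto loop above.
const ReadResult = struct { written: usize, end: bool };

/// Hypothetical stand-in for zgz.GzipInput so the sketch runs without zgz:
/// it copies from an in-memory slice instead of inflating gzip data.
const FakeSource = struct {
    data: []const u8,
    pos: usize = 0,

    fn readInto(self: *FakeSource, dest: []u8) !ReadResult {
        const n = @min(dest.len, self.data.len - self.pos);
        @memcpy(dest[0..n], self.data[self.pos..][0..n]);
        self.pos += n;
        return .{ .written = n, .end = self.pos == self.data.len };
    }
};

/// Hypothetical parser state; field names other than `end` and `eof`
/// are not taken from zgz or any real parser.
const Parser = struct {
    buffer: []u8,
    start: usize = 0, // first unparsed byte
    end: usize = 0, // one past the last valid byte
    eof: bool = false,

    fn refill(self: *Parser, source: anytype) !void {
        // Slide unparsed bytes to the front so the free region is contiguous.
        const pending = self.end - self.start;
        for (0..pending) |i| self.buffer[i] = self.buffer[self.start + i];
        self.start = 0;
        self.end = pending;

        // Inflate (here: copy) straight into the free tail of the buffer.
        const free = self.buffer[self.end..];
        const result = try source.readInto(free);
        self.end += result.written;
        self.eof = result.end;
    }
};

test "refill keeps unparsed bytes and appends new ones" {
    var storage: [8]u8 = undefined;
    var source: FakeSource = .{ .data = "ACGTACGTNN" };
    var parser: Parser = .{ .buffer = &storage };

    try parser.refill(&source);
    try std.testing.expectEqualStrings("ACGTACGT", parser.buffer[parser.start..parser.end]);

    parser.start = 6; // pretend the parser consumed six bytes
    try parser.refill(&source);
    try std.testing.expectEqualStrings("GTNN", parser.buffer[parser.start..parser.end]);
    try std.testing.expect(parser.eof);
}
```

Compacting first keeps a partially parsed record intact across refills. A real driver would also need to treat a refill that writes nothing and does not reach the end as an error, as the loop above does.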
Important
Initialize GzipInput in its final memory location and do not move it after successful initialization. It
contains a Decompressor, and zlib-ng stores an internal back-pointer to the decompressor stream address.
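The reason behind this rule can be shown without zlib-ng at all. The sketch below uses a hypothetical Stream type that records its own address at initialization, which is an assumption standing in for the back-pointer described above. Copying the value afterwards leaves the recorded address pointing at the old location.

```zig
const std = @import("std");

/// Hypothetical stand-in for a native stream state that remembers the
/// address it was initialized at (not zlib-ng's real layout).
const Stream = struct {
    self_addr: usize = 0,

    fn init(self: *Stream) void {
        self.self_addr = @intFromPtr(self);
    }

    /// True while the recorded address still matches where the value lives.
    fn isValid(self: *const Stream) bool {
        return self.self_addr == @intFromPtr(self);
    }
};

test "copying an initialized stream leaves a stale back-pointer" {
    var original: Stream = .{};
    original.init();
    try std.testing.expect(original.isValid());

    // The copy lives at a new address, but self_addr still names `original`.
    const moved = original;
    try std.testing.expect(!moved.isValid());
}
```

In this model the stale address is merely detected. With a native library it would more likely surface as an invalid-state error or as undefined behavior.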
Correct:
var gzip: zgz.GzipInput = undefined;
try gzip.init(&input_reader.interface, .{});
defer gzip.deinit();
Incorrect:
var tmp: zgz.GzipInput = undefined;
try tmp.init(&input_reader.interface, .{});
var gzip = tmp;
If GzipInput lives inside another struct, initialize the struct storage first, then call init on the field:
var wrapper: Wrapper = undefined;
try wrapper.gzip.init(&input_reader.interface, .{});
defer wrapper.gzip.deinit();
The high-level streaming API uses:
pub const StreamOptions = struct {
    allow_concatenated_members: bool = true,
    max_output_bytes: ?usize = null,
};
The direct API uses the equivalent:
pub const GzipInputOptions = struct {
    allow_concatenated_members: bool = true,
    max_output_bytes: ?usize = null,
};
allow_concatenated_members is enabled by default. This matches gzip -dc and zcat, which decode
concatenated gzip members as one logical stream.
Set it to false to reject trailing members or trailing data:
_ = try zgz.decompress(reader, writer, .{
    .allow_concatenated_members = false,
});
max_output_bytes limits total decompressed bytes, not compressed input bytes or output buffer size.
_ = try zgz.decompress(reader, writer, .{
    .max_output_bytes = 100 * 1024 * 1024,
});
Semantics:
null -> allow all bytes; no decompressed-output limit configured
0 -> allow only streams that produce zero decompressed bytes
N -> allow at most N decompressed bytes
The library treats 0 literally as “allow zero output bytes.” The CLI rejects
--max-output 0 because it is not useful for a cat-like decompression command.
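The three cases above can be expressed as a single predicate over an optional limit. This is a minimal sketch of the semantics only: exceedsLimit is a hypothetical helper, and the library's internal check may be structured differently.

```zig
const std = @import("std");

/// Hypothetical helper (not part of zgz): would producing `total`
/// decompressed bytes violate the configured limit?
fn exceedsLimit(limit: ?usize, total: usize) bool {
    const max = limit orelse return false; // null: no limit configured
    return total > max; // 0 permits only empty output; N permits up to N bytes
}

test "max_output_bytes semantics" {
    try std.testing.expect(!exceedsLimit(null, std.math.maxInt(usize)));
    try std.testing.expect(!exceedsLimit(0, 0));
    try std.testing.expect(exceedsLimit(0, 1));
    try std.testing.expect(!exceedsLimit(100, 100));
    try std.testing.expect(exceedsLimit(100, 101));
}
```

Using ?usize rather than a sentinel such as 0 for "unlimited" is what lets the library keep 0 as a literal limit.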
For custom drivers, use Decompressor directly:
var d: zgz.Decompressor = .{};
try d.initGzip();
defer d.deinit();
var in_pos: usize = 0;
var out_pos: usize = 0;
while (true) {
    const step = try d.decompress(
        compressed[in_pos..],
        output[out_pos..],
    );
    in_pos += step.read;
    out_pos += step.written;
    switch (step.status) {
        .progress, .need_input_or_output => {
            if (step.read == 0 and step.written == 0) {
                return error.DriverMadeNoProgress;
            }
        },
        .end => break,
    }
}
Decompressor.decompress does not itself return error.NoProgress; it reports
what zlib-ng did. The caller-owned driver loop decides whether a no-progress
step is recoverable or fatal.
Important
Initialize Decompressor in place and do not move it after successful initialization. zlib-ng stores an internal
back-pointer to the stream address. Do not use a constructor that returns an initialized Decompressor by value.
Correct:
var d: zgz.Decompressor = .{};
try d.initGzip();
defer d.deinit();
Incorrect:
var tmp: zgz.Decompressor = .{};
try tmp.initGzip();
var d = tmp;
defer d.deinit();
If Decompressor lives inside another struct, initialize the struct storage first, then call initGzip on the field:
var wrapper: Wrapper = undefined;
try wrapper.decompressor.initGzip();
defer wrapper.decompressor.deinit();
/// Errors that can occur while driving the zlib-ng inflate state machine.
pub const InflateError = error{
    /// The compressed stream is malformed or not valid for the configured
    /// wrapper format.
    InvalidData,
    /// zlib-ng reported an invalid stream state. This usually means the
    /// decompressor was not initialized, was moved after initialization,
    /// or the native ABI binding is wrong.
    InvalidState,
    /// zlib-ng could not allocate internal inflate state.
    OutOfMemory,
    /// The input or output slice exceeded zlib-ng's `uint32_t` availability
    /// counter limit.
    SliceTooLarge,
    /// `decompress` was called after the current member reached end-of-stream.
    ///
    /// Call `reset` before feeding another gzip member, or create a new
    /// decompressor. This guard catches accidental post-end reuse instead of
    /// forwarding an invalid state transition into zlib-ng.
    StreamEnded,
    /// zlib-ng returned a code this binding does not recognize.
    UnknownZlibError,
};
/// High-level streaming errors.
pub const StreamError = InflateError || error{
    /// The input ended before zlib-ng reached the gzip stream end marker.
    UnexpectedEnd,
    /// Non-gzip trailing data was present after the first gzip member while
    /// concatenated-member decoding was disabled.
    TrailingData,
    /// The decompressed byte limit was reached before the stream ended.
    OutputLimitExceeded,
    /// The supplied `std.Io.Reader` has no usable buffer.
    ReaderBufferTooSmall,
    /// The supplied `std.Io.Writer` has no usable buffer.
    WriterBufferTooSmall,
    /// The streaming driver observed a successful inflate step that consumed no
    /// input, produced no output, and did not finish the current member.
    ///
    /// Without this guard the driver could spin forever. This usually indicates
    /// a bug in the driver loop, an invalid stream transition, or an unexpected
    /// zlib-ng state-machine result.
    NoProgress,
    /// The underlying `std.Io.Reader` failed.
    ReadFailed,
    /// The underlying `std.Io.Writer` failed.
    WriteFailed,
};
Run unit tests:
zig build test --summary all
MIT and ZLIB-NG