zlug

Fast, safe, multi-byte aware slug generation for Zig.

Features

Fast: single-pass UTF-8 decode, transliterate, normalize, and dash-collapse — no intermediate allocations
Safe: forgiving UTF-8 decoder, no pointer arithmetic, tables are immutable constants
Multi-byte aware: transliterates any BMP codepoint to ASCII via an embedded ~430KB unidecode table
21 languages: language-specific substitutions for bg, cs, de, en, es, fi, fr, gr, hu, id, it, kk, nb, nl, nn, pl, pt, ro, sl, sv, tr
Optional Japanese: opt-in dictionary-based slugification with Hepburn romanization (世界 → sekai, プログラミング → puroguramingu)
Zero runtime init: tables are embedded via @embedFile into .rodata
Flexible API: both a stack-buffer variant (slugify) and an allocating variant (slugifyAlloc)

Inspired by gosimple/slug and gosimple/unidecode.

Two flavors

zlug ships two modules from the same codebase:

Module	Embedded data	Japanese
`zlug`	~430 KB	❌ falls back to unidecode (`世界 → shi-jie`)
`zlug_ja`	~7.7 MB	✅ Sudachi-based dictionary + Hepburn romanization (`世界 → sekai`)

Same public API — pick the one that matches your size budget.

Installation

Requires Zig 0.15.2 or later.

Add zlug to your project:

zig fetch --save git+https://github.com/linyows/zlug#v0.1.0

This updates your build.zig.zon:

.dependencies = .{
    .zlug = .{
        .url = "git+https://github.com/linyows/zlug#v0.1.0",
        .hash = "...",
    },
},

Then in build.zig:

const zlug_dep = b.dependency("zlug", .{
    .target = target,
    .optimize = optimize,
});

// Lean variant (no Japanese dictionary):
exe.root_module.addImport("zlug", zlug_dep.module("zlug"));

// Or full variant with Japanese dictionary (~7.7 MB):
// exe.root_module.addImport("zlug", zlug_dep.module("zlug_ja"));
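
If you would rather pick the variant at build time than edit build.zig, one possible approach is a boolean build option. The following is only a sketch: the -Dja flag, the executable name app, and the src/main.zig path are assumptions made for illustration and are not part of zlug.

```zig
const std = @import("std");

pub fn build(b: *std.Build) void {
    const target = b.standardTargetOptions(.{});
    const optimize = b.standardOptimizeOption(.{});

    // Hypothetical flag: `zig build -Dja=true` selects the Japanese-enabled module.
    const use_ja = b.option(bool, "ja", "Embed the Japanese dictionary") orelse false;
    const module_name: []const u8 = if (use_ja) "zlug_ja" else "zlug";

    const zlug_dep = b.dependency("zlug", .{
        .target = target,
        .optimize = optimize,
    });

    // Hypothetical application; adjust the name and root source file to your project.
    const exe = b.addExecutable(.{
        .name = "app",
        .root_module = b.createModule(.{
            .root_source_file = b.path("src/main.zig"),
            .target = target,
            .optimize = optimize,
        }),
    });

    // Both modules are imported under the same name, so application code is unchanged.
    exe.root_module.addImport("zlug", zlug_dep.module(module_name));
    b.installArtifact(exe);
}
```

Because both modules expose the same public API, the application source does not need to know which variant was linked.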

Usage

const std = @import("std");
const zlug = @import("zlug");

pub fn main() !void {
    var gpa: std.heap.GeneralPurposeAllocator(.{}) = .{};
    defer _ = gpa.deinit();
    const alloc = gpa.allocator();

    // Allocating API
    const slug = try zlug.slugifyAlloc(alloc, "Hello, 世界!", .{});
    defer alloc.free(slug);
    std.debug.print("{s}\n", .{slug}); // => "hello-shi-jie"

    // Stack-buffer API (no allocator)
    var buf: [256]u8 = undefined;
    const s = try zlug.slugify(&buf, "Héllo Wörld", .{ .lang = .de });
    std.debug.print("{s}\n", .{s}); // => "hello-woerld"
}

Options

pub const Options = struct {
    lang: Lang = .en,
    lowercase: bool = true,
    max_length: usize = 0,       // 0 disables truncation
    smart_truncate: bool = true, // cut at last '-' within max_length
    keep_multiple_dashes: bool = false,
    keep_edge_dashes: bool = false,
};
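
As an illustration of how these fields combine, the sketch below truncates one title twice, once with the default smart truncation and once without. It uses only the fields listed above; the input string and buffer sizes are arbitrary, and the exact output is deliberately not shown because it depends on the library's truncation rules.

```zig
const std = @import("std");
const zlug = @import("zlug");

pub fn main() !void {
    const title = "The Quick Brown Fox Jumps Over the Lazy Dog";

    // Cap the slug at 20 bytes. smart_truncate is on by default, so the cut
    // is expected to land on the last '-' that fits within the limit.
    var buf: [64]u8 = undefined;
    const short = try zlug.slugify(&buf, title, .{ .max_length = 20 });
    std.debug.print("{s}\n", .{short});

    // Same cap with smart_truncate disabled. Assumption: the cut then happens
    // at the length limit itself, which may fall in the middle of a word.
    var buf2: [64]u8 = undefined;
    const hard = try zlug.slugify(&buf2, title, .{
        .max_length = 20,
        .smart_truncate = false,
    });
    std.debug.print("{s}\n", .{hard});
}
```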

Examples

Input	Lang	Output
`"Hello, world!"`	`en`	`hello-world`
`"café au lait"`	`en`	`cafe-au-lait`
`"rock & roll"`	`en`	`rock-and-roll`
`"über große Größen"`	`de`	`ueber-grosse-groessen`
`"Здравей Свят"`	`bg`	`zdravey-svyat`
`"世界"`	`en`	`shi-jie`
`"it’s mine"`	`en`	`its-mine`
`"a—b"` (em dash)	`en`	`a-b`

API

slugify(buf: []u8, input: []const u8, opts: Options) ![]u8 — writes into caller's buffer
slugifyAlloc(alloc: Allocator, input: []const u8, opts: Options) ![]u8 — caller owns returned slice
isSlug(text: []const u8) bool — validate an existing slug
parseLang(tag: []const u8) Lang — parse a BCP-47-ish language tag
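
A small sketch of how the helper functions might be used together, for example when the language comes from a request header or a config file. The tag "de-DE" and the input text are arbitrary, and how parseLang treats region subtags or unknown tags is not specified here, so no output is claimed.

```zig
const std = @import("std");
const zlug = @import("zlug");

pub fn main() !void {
    // Turn a language tag into a Lang value, then pass it through Options.
    const lang = zlug.parseLang("de-DE");

    var buf: [128]u8 = undefined;
    const slug = try zlug.slugify(&buf, "Größe prüfen", .{ .lang = lang });

    // isSlug can be used to validate slugs that come from elsewhere,
    // such as URL path segments; here it is applied to a fresh slug.
    std.debug.print("{s} valid={}\n", .{ slug, zlug.isSlug(slug) });
}
```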

How it works

zlug performs slug generation in a single pass over the input:

Decode UTF-8 to a codepoint (forgiving — invalid sequences become U+FFFD)
Apply per-language substitution (e.g. ä → ae in German)
Apply shared default substitutions (smart quotes, en/em dashes)
Look up unidecode transliteration for non-ASCII BMP codepoints
Per ASCII byte: lowercase, authorized-char check, consecutive-dash collapse
Write directly into the output buffer
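
The last two steps can be sketched in isolation. The function below is a simplified, hypothetical illustration and not zlug's actual code: it handles ASCII input only, treats [a-z0-9] as the authorized set (an assumption), and always collapses and trims dashes, so it does not model keep_multiple_dashes or keep_edge_dashes.

```zig
const std = @import("std");

/// Hypothetical helper, not part of zlug: normalizes ASCII `input` into `buf`.
/// Lowercases letters, keeps [a-z0-9], maps every other byte to '-',
/// collapses runs of '-', and drops dashes at both edges.
fn normalizeAscii(buf: []u8, input: []const u8) error{NoSpaceLeft}![]u8 {
    var len: usize = 0;
    for (input) |byte| {
        const c = std.ascii.toLower(byte);
        const out: u8 = if (std.ascii.isAlphanumeric(c)) c else '-';
        // Skip a dash that would start the slug or repeat the previous dash.
        if (out == '-' and (len == 0 or buf[len - 1] == '-')) continue;
        if (len == buf.len) return error.NoSpaceLeft;
        // Write straight into the caller's buffer; no intermediate storage.
        buf[len] = out;
        len += 1;
    }
    // Runs are already collapsed, so at most one trailing dash remains.
    if (len > 0 and buf[len - 1] == '-') len -= 1;
    return buf[0..len];
}

pub fn main() !void {
    var buf: [32]u8 = undefined;
    const slug = try normalizeAscii(&buf, "  Hello,   World!  ");
    std.debug.print("{s}\n", .{slug}); // prints "hello-world"
}
```

Tracking only the previously written byte is enough to collapse dashes, which is what lets this kind of normalization run in the same pass as decoding and transliteration.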

The unidecode table is stored as two embedded binary blobs:

src/bmp_index.bin (262KB) — [0x10001]u32 cumulative byte offsets
src/bmp_data.bin (169KB) — concatenated ASCII transliterations

A lookup is data[index[cp]..index[cp+1]]: two std.mem.readInt calls and one slice. The tables are embedded via @embedFile and live in .rodata, so there is no runtime initialization.
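
To make the layout concrete, here is a minimal sketch of such a lookup over a toy table with three entries. It is not the library's code: the function name, the toy data, and the little-endian byte order of the offsets are all assumptions for illustration. The real index has 0x10001 entries, so index[cp+1] is valid for every BMP codepoint.

```zig
const std = @import("std");

/// Hypothetical lookup over an (index, data) pair laid out as described above.
/// Assumes the u32 offsets are stored little-endian.
fn lookup(index: []const u8, data: []const u8, cp: u16) []const u8 {
    const i: usize = @as(usize, cp) * 4;
    const start = std.mem.readInt(u32, index[i..][0..4], .little);
    const end = std.mem.readInt(u32, index[i + 4 ..][0..4], .little);
    return data[start..end];
}

// Toy table: cp 0 maps to "a", cp 1 to the empty string, cp 2 to "bb".
const toy_index = [_]u8{
    0, 0, 0, 0, // start offset of cp 0
    1, 0, 0, 0, // start offset of cp 1
    1, 0, 0, 0, // start offset of cp 2 (cp 1 is empty)
    3, 0, 0, 0, // end sentinel, one past the last entry
};
const toy_data = "abb";

pub fn main() void {
    std.debug.print("{s}\n", .{lookup(&toy_index, toy_data, 2)}); // prints "bb"
}
```

Storing cumulative offsets with one extra sentinel entry means a codepoint with no transliteration costs nothing in the data blob: its start and end offsets are simply equal.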

Regenerate the tables from gosimple/unidecode's table.txt with:

zig run tools/gen_table.zig -- /path/to/table.txt src/

Development

# Run tests
zig build test

# Build the static library
zig build --release=fast

# Check formatting
zig fmt --check src/ tools/

Releasing

Versions are managed by git tags. The build.zig.zon .version field stays at 0.0.0-dev in the tree and is rewritten by the release workflow to match the tag.

git tag v0.1.0
git push origin v0.1.0

The workflow at .github/workflows/release.yml will run tests, build in release mode, and create a GitHub Release with a source tarball and SHA-256 checksum.

License

MIT — see LICENSE.

Unidecode table data is derived from gosimple/unidecode, licensed under the Apache License 2.0.
