Fast, safe, multi-byte aware slug generation for Zig.
- Fast: decodes UTF-8, transliterates, normalizes, and collapses dashes in a single pass, with no intermediate allocations
- Safe: forgiving UTF-8 decoder, no pointer arithmetic, tables are immutable constants
- Multi-byte aware: transliterates any BMP codepoint to ASCII via an embedded ~430 KB unidecode table
- 20 languages: language-specific substitutions for `bg`, `cs`, `de`, `en`, `es`, `fi`, `fr`, `gr`, `hu`, `id`, `it`, `kk`, `nb`, `nl`, `nn`, `pl`, `pt`, `ro`, `sl`, `sv`, `tr`
- Optional Japanese: opt-in dictionary-based slugification with Hepburn romanization (`世界` → `sekai`, `プログラミング` → `puroguramingu`)
- Zero runtime init: tables are embedded via `@embedFile` into `.rodata`
- Flexible API: both a stack-buffer variant (`slugify`) and an allocating variant (`slugifyAlloc`)

Inspired by gosimple/slug and gosimple/unidecode.
zlug ships two modules from the same codebase:
| Module | Embedded data | Japanese |
|---|---|---|
| `zlug` | ~430 KB | ❌ falls back to unidecode (`世界` → `shi-jie`) |
| `zlug_ja` | ~7.7 MB | ✅ Sudachi-based dictionary + Hepburn romanization (`世界` → `sekai`) |

Both modules expose the same public API; pick the one that fits your size budget.

Requires Zig 0.15.2 or later.
Add zlug to your project:

```sh
zig fetch --save git+https://github.com/linyows/zlug#v0.1.0
```

This updates your `build.zig.zon`:

```zig
.dependencies = .{
    .zlug = .{
        .url = "git+https://github.com/linyows/zlug#v0.1.0",
        .hash = "...",
    },
},
```

Then in `build.zig`:

```zig
const zlug_dep = b.dependency("zlug", .{
    .target = target,
    .optimize = optimize,
});
// Lean variant (no Japanese dictionary):
exe.root_module.addImport("zlug", zlug_dep.module("zlug"));
// Or full variant with Japanese dictionary (~7.7 MB):
// exe.root_module.addImport("zlug", zlug_dep.module("zlug_ja"));
```

Basic usage:

```zig
const std = @import("std");
const zlug = @import("zlug");

pub fn main() !void {
    var gpa: std.heap.GeneralPurposeAllocator(.{}) = .{};
    defer _ = gpa.deinit();
    const alloc = gpa.allocator();

    // Allocating API
    const slug = try zlug.slugifyAlloc(alloc, "Hello, 世界!", .{});
    defer alloc.free(slug);
    std.debug.print("{s}\n", .{slug}); // => "hello-shi-jie"

    // Stack-buffer API (no allocator)
    var buf: [256]u8 = undefined;
    const s = try zlug.slugify(&buf, "Héllo Wörld", .{ .lang = .de });
    std.debug.print("{s}\n", .{s}); // => "hello-woerld"
}
```

Options and their defaults:

```zig
pub const Options = struct {
    lang: Lang = .en,
    lowercase: bool = true,
    max_length: usize = 0, // 0 disables truncation
    smart_truncate: bool = true, // cut at last '-' within max_length
    keep_multiple_dashes: bool = false,
    keep_edge_dashes: bool = false,
};
```

Example inputs and outputs:

| Input | Lang | Output |
|---|---|---|
| `"Hello, world!"` | `en` | `hello-world` |
| `"café au lait"` | `en` | `cafe-au-lait` |
| `"rock & roll"` | `en` | `rock-and-roll` |
| `"über große Größen"` | `de` | `ueber-grosse-groessen` |
| `"Здравей Свят"` | `bg` | `zdravey-svyat` |
| `"世界"` | `en` | `shi-jie` |
| `"it’s mine"` | `en` | `its-mine` |
| `"a—b"` (em dash) | `en` | `a-b` |

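The `max_length` and `smart_truncate` options are described only by their one-line comments, so the following is a minimal sketch of one plausible reading of them, not zlug's actual code. `truncateSlug` is a hypothetical helper name, and the boundary rule (keep the cut if it already falls on a dash, hard-cut a single long word) is an assumption modeled on how gosimple/slug truncates.

```zig
const std = @import("std");

/// Hypothetical helper for illustration; not part of zlug's public API.
/// Assumes `slug` is already normalized ASCII.
fn truncateSlug(slug: []const u8, max_length: usize, smart: bool) []const u8 {
    // 0 disables truncation, matching the comment on Options.max_length.
    if (max_length == 0 or slug.len <= max_length) return slug;
    const cut = slug[0..max_length];
    if (!smart) return cut;
    // The cut already falls on a word boundary: nothing is split.
    if (slug[max_length] == '-') return cut;
    // Otherwise back up to the last dash inside the window.
    if (std.mem.lastIndexOfScalar(u8, cut, '-')) |pos| return cut[0..pos];
    // Single long word: fall back to a hard cut.
    return cut;
}

pub fn main() void {
    std.debug.print("{s}\n", .{truncateSlug("hello-big-world", 12, true)}); // hello-big
    std.debug.print("{s}\n", .{truncateSlug("hello-big-world", 12, false)}); // hello-big-wo
}
```

With smart truncation the result can be noticeably shorter than `max_length`, which is the price of never splitting a word.
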
- `slugify(buf: []u8, input: []const u8, opts: Options) ![]u8`: writes into the caller's buffer
- `slugifyAlloc(alloc: Allocator, input: []const u8, opts: Options) ![]u8`: caller owns the returned slice
- `isSlug(text: []const u8) bool`: validates an existing slug
- `parseLang(tag: []const u8) Lang`: parses a BCP-47-ish language tag

zlug performs slug generation in a single pass over the input:
- Decode UTF-8 to a codepoint (forgiving: invalid sequences become `U+FFFD`)
- Apply per-language substitution (e.g. `ä` → `ae` in German)
- Apply shared default substitutions (smart quotes, en/em dashes)
- Look up unidecode transliteration for non-ASCII BMP codepoints
- Per ASCII byte: lowercase, authorized-char check, consecutive-dash collapse
- Write directly into the output buffer
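
The single-pass flow listed above can be sketched in a few dozen lines. This is a simplified illustration under stated assumptions, not zlug's implementation: `decodeForgiving` and `sketchSlugify` are hypothetical names, the `NoSpaceLeft` error is an assumed choice, and the substitution and unidecode steps are left out, so every non-ASCII codepoint simply acts as a separator here.

```zig
const std = @import("std");

const Decoded = struct { cp: u21, len: usize };

/// Forgiving decode of the first codepoint in `s` (s.len must be > 0).
/// Malformed input consumes one byte and yields U+FFFD. For brevity this
/// sketch does not reject overlong encodings or surrogates.
fn decodeForgiving(s: []const u8) Decoded {
    const replacement: Decoded = .{ .cp = 0xFFFD, .len = 1 };
    const b0 = s[0];
    if (b0 < 0x80) return .{ .cp = b0, .len = 1 };
    const n: usize = if ((b0 & 0xE0) == 0xC0) 2 else if ((b0 & 0xF0) == 0xE0) 3 else if ((b0 & 0xF8) == 0xF0) 4 else return replacement;
    if (s.len < n) return replacement;
    // Payload bits of the lead byte: 5, 4, or 3 for 2-, 3-, 4-byte forms.
    var cp: u21 = b0 & (@as(u8, 0x7F) >> @intCast(n));
    for (s[1..n]) |b| {
        if ((b & 0xC0) != 0x80) return replacement;
        cp = (cp << 6) | (b & 0x3F);
    }
    return .{ .cp = cp, .len = n };
}

/// One pass: decode, lowercase, collapse separators, write into `buf`.
fn sketchSlugify(buf: []u8, input: []const u8) error{NoSpaceLeft}![]u8 {
    var out: usize = 0;
    var i: usize = 0;
    var pending_dash = false;
    while (i < input.len) {
        const d = decodeForgiving(input[i..]);
        i += d.len;
        // A real implementation would apply substitutions and the
        // unidecode lookup to d.cp at this point.
        const is_alnum = d.cp < 0x80 and std.ascii.isAlphanumeric(@intCast(d.cp));
        if (!is_alnum) {
            // A run of separators becomes at most one dash; leading runs are dropped.
            pending_dash = out > 0;
            continue;
        }
        const needed: usize = if (pending_dash) 2 else 1;
        if (out + needed > buf.len) return error.NoSpaceLeft;
        if (pending_dash) {
            buf[out] = '-';
            out += 1;
            pending_dash = false;
        }
        buf[out] = std.ascii.toLower(@intCast(d.cp));
        out += 1;
    }
    // A dash still pending at the end is never written, so no trailing dash.
    return buf[0..out];
}

pub fn main() !void {
    var buf: [64]u8 = undefined;
    std.debug.print("{s}\n", .{try sketchSlugify(&buf, "Hello,  World!")}); // hello-world
}
```

Deferring the dash until the next authorized character arrives is what lets the sketch collapse consecutive dashes and trim edge dashes without a second pass over the output.
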
The unidecode table is stored as two embedded binary blobs:
- `src/bmp_index.bin` (262 KB): `[0x10001]u32` cumulative byte offsets
- `src/bmp_data.bin` (169 KB): concatenated ASCII transliterations

Lookup is `data[index[cp]..index[cp+1]]`: two `std.mem.readInt` calls and a slice. The tables are embedded via `@embedFile` and live in `.rodata`, so there is zero runtime initialization.
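
To make the offset-table lookup concrete, here is a small sketch using a hypothetical three-entry table in place of the real embedded blobs. The names `index`, `data`, `readOffset`, and `lookup` are illustrative, and little-endian byte order for the offsets is an assumption of this sketch rather than something the text states.

```zig
const std = @import("std");

// Hypothetical miniature tables covering codepoints 0..2. The layout follows
// the description: N+1 cumulative u32 offsets plus one concatenated blob.
// cp 0 -> "a", cp 1 -> "" (no transliteration), cp 2 -> "bcd"
const data = "abcd";
const index = [_]u8{
    0, 0, 0, 0, // start of cp 0
    1, 0, 0, 0, // start of cp 1
    1, 0, 0, 0, // start of cp 2
    4, 0, 0, 0, // end sentinel
};
const entry_count = index.len / 4 - 1;

fn readOffset(i: usize) u32 {
    // Little-endian is assumed here.
    return std.mem.readInt(u32, index[i * 4 ..][0..4], .little);
}

fn lookup(cp: u21) []const u8 {
    if (cp >= entry_count) return "";
    const start = readOffset(cp);
    const end = readOffset(@as(usize, cp) + 1);
    return data[start..end];
}

pub fn main() void {
    std.debug.print("[{s}] [{s}] [{s}]\n", .{ lookup(0), lookup(1), lookup(2) }); // [a] [] [bcd]
}
```

Storing one extra offset as an end sentinel means every entry's length is simply the difference of two adjacent offsets, so empty transliterations cost no data bytes.
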
Regenerate the tables from gosimple/unidecode's table.txt with:

```sh
zig run tools/gen_table.zig -- /path/to/table.txt src/
```

Common development commands:

```sh
# Run tests
zig build test
# Build the static library
zig build --release=fast
# Check formatting
zig fmt --check src/ tools/
```

Versions are managed by git tags. The `build.zig.zon` `.version` field stays at `0.0.0-dev` in the tree and is rewritten by the release workflow to match the tag.

```sh
git tag v0.1.0
git push origin v0.1.0
```

The workflow at `.github/workflows/release.yml` runs the tests, builds in release mode, and creates a GitHub Release with a source tarball and SHA-256 checksum.

MIT — see LICENSE.
Unidecode table data is derived from gosimple/unidecode, licensed under the Apache License 2.0.