A relatively fast LZW compression algorithm in pure Lua.
Lossless compression for byte strings. The more repetition in the data, the better the ratio.
The library uses 16-bit dictionary codes (two bytes per code). The maximum dictionary size per level is 65280 codes.
Input is processed as a sequence of bytes (Lua string semantics). UTF-8 text round-trips correctly because multibyte sequences are compressed as individual bytes, not as Unicode code points.
While compressing, the algorithm checks whether the result would be strictly smaller than the input. If not, it returns an uncompressed passthrough instead (see Wire format).
local lualzw = require("lualzw")
local input = "foofoofoofoofoofoofoofoofoo"
local compressed = assert(lualzw.compress(input))
local decompressed = assert(lualzw.decompress(compressed))
assert(input == decompressed)Create a custom codec with configure(). The default module (require("lualzw")) uses skip = {} and control prefixes u / c.
local lualzw = require("lualzw")
-- Default: original encoding (may embed \0 in dictionary codes)
local legacy = lualzw.configure({ skip = {} })
-- Preferred when compressed data must not contain \0
local nullsafe = lualzw.configure({ skip = { [0] = true } })The default skip list {} matches the original on-the-wire encoding. Dictionary codes are 16-bit byte pairs and may include \0. That is fine for plain Lua strings, but problematic when compressed data passes through:
- C APIs or bindings that treat
\0as end-of-string - Null-terminated storage or logging
- Tools that truncate at the first null
Use { skip = { [0] = true } } when compressed output must be binary-safe. Both compressor and decompressor need the same skip setting; it is not stored in the payload.
Custom control prefixes (both peers must match):
local custom = lualzw.configure({
skip = { [0] = true },
uncompressed = "p",
compressed = "q",
})See configure(options) for full option details.
lualzw is a compression codec, not encryption or authentication. Treat compressed data from untrusted sources as hostile.
Always call decompress with explicit limits:
local MAX = 64 * 1024
local data, err = lualzw.decompress(payload, MAX, MAX * 2, MAX * 4)
if not data then
-- reject message
end| Limit | Parameter | Protects against |
|---|---|---|
| Output size | max_output_size |
Decompression bombs (huge expanded output) |
| Input size | max_input_size |
Large compressed blobs (memory / bandwidth) |
| Dictionary steps | max_codes |
CPU exhaustion during decode |
Bound compress on public endpoints as well:
local compressed, err = lualzw.compress(plaintext, MAX)Load the module:
local lualzw = require("lualzw")
print(lualzw._VERSION) -- "1.1.0"Each codec table (default, or from configure()) exports:
| Member | Description |
|---|---|
compress(input[, max_input_size]) |
Compress a string |
decompress(input[, max_output_size[, max_input_size[, max_codes]]]) |
Decompress or passthrough |
configure(options) |
Create a new codec; see configure(options) |
_VERSION |
Semantic version string |
uncompressed |
Passthrough prefix byte for this codec |
compressed |
Compressed prefix byte for this codec |
Returns a new codec table with the same methods as the default module, using the given options. Options are fixed for the lifetime of that codec; they are not stored in compressed output, so both peers must use matching settings.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
options |
table |
{} |
Configuration (all keys optional) |
options.skip |
table |
{} |
Byte values that must not appear in dictionary codes (see below) |
options.uncompressed |
string |
"u" |
One-byte prefix for passthrough output |
options.compressed |
string |
"c" |
One-byte prefix for LZW output |
Returns
A codec table with compress, decompress, configure, _VERSION, uncompressed, and compressed.
options.skip
Dictionary codes are 16-bit byte pairs. Skipped bytes never appear in those pairs (useful to keep compressed data free of \0, etc.).
Two table forms are accepted:
-- Map form
{ [0] = true, [1] = true }
-- List form (byte values as array entries)
{ 0, 1 }Each skipped byte slightly reduces available dictionary codes. If too many bytes are skipped, configure() raises:
invalid configuration, no character can be used in compression
options.uncompressed / options.compressed
- Each must be a string of exactly one byte.
- They must be different from each other.
- Invalid values raise
invalid uncompressed control character,invalid compressed control character, oruncompressed and compressed control characters must differ.
Changing skip or control settings produces incompatible compressed data. Default {} skip matches the original master encoding.
local codec = lualzw.configure({
skip = { [0] = true },
uncompressed = "u",
compressed = "c",
})
print(codec.uncompressed, codec.compressed) -- u cReturns: compressed string, or nil, error.
- Non-string input →
nil, "string expected, got <type>" - Input longer than
max_input_size→nil, "input exceeds limit" - Input of 0–1 bytes → passthrough (
uprefix; see wire format) - Longer input → LZW compress if strictly smaller than input, otherwise passthrough
- Internal failure →
nil, "algorithm error, could not fetch word"
Returns: original string, or nil, error.
Always pass limits when decoding untrusted data (see Untrusted input).
- Non-string input →
nil, "string expected, got <type>" - Invalid limit types →
nil, "number expected for <name>, got <type>" #inputor body larger thanmax_input_size→nil, "compressed input exceeds limit"- Decompressed size exceeds
max_output_size→nil, "decompressed output exceeds limit" - Dictionary growth exceeds
max_codes→nil, "decompression step limit exceeded" - Invalid or corrupt payload →
nil, "invalid input - not a compressed string"or"could not find last from dict. Invalid input?"
Each output starts with a one-byte control prefix configured on the codec (defaults shown):
| Prefix | Meaning | Body |
|---|---|---|
u (default) |
Uncompressed passthrough | Original bytes |
c (default) |
LZW compressed | Pairs of code bytes |
Examples with default controls:
Input to compress |
Output |
|---|---|
"" |
"u" |
"a" |
"ua" |
| Repetitive data | "c" .. <code pairs> if smaller than input |
| Incompressible data | "u" .. input |
lua spec/test.luaHistorical timings below were produced with benchmark/profiling.lua, which compares lualzw to LibCompress. LibCompress is not bundled with this repo.
From the repository root:
lua benchmark/profiling.luaUse --quick for smaller inputs (10 000 bytes, 3 iterations):
lua benchmark/profiling.lua --quickEach case runs the default and null-safe (skip = { [0] = true }) codecs. LibCompress is compared when installed.
Times are in seconds (average of 10 runs). Random inputs usually bail out to passthrough (100% of input size).
Input: 1 000 000 random bytes
| algorithm | compress | decompress | result % |
|---|---|---|---|
| lualzw | 0.6622 | 0.0003 | 100 |
| LibCompress | 2.1983 | 0.0024 | 100 |
Input: 1 000 000 random ASCII bytes
| algorithm | compress | decompress | result % |
|---|---|---|---|
| lualzw | 0.812 | 0.0022 | 100 |
| LibCompress | 1.782 | 0.0007 | 100 |
Input: 1 000 000 repeating cycling bytes
| algorithm | compress | decompress | result % |
|---|---|---|---|
| lualzw | 0.3975 | 0.0262 | 4.5001 |
| LibCompress | 0.3907 | 0.0264 | 6.6997 |
Input: 1 000 000 identical bytes
| algorithm | compress | decompress | result % |
|---|---|---|---|
| lualzw | 0.7045 | 0.0026 | 0.2829 |
| LibCompress | 0.6418 | 0.0038 | 0.4241 |
Input: "ymn32h8hm8ekrwjkrn9f" × 50 000 (1 000 000 bytes)
| algorithm | compress | decompress | result % |
|---|---|---|---|
| lualzw | 0.4788 | 0.0088 | 1.2629 |
| LibCompress | 0.4426 | 0.0093 | 1.8905 |
| Function | Condition | Error |
|---|---|---|
compress |
Wrong type | "string expected, got <type>" |
compress |
Input too large | "input exceeds limit" |
compress |
Internal | "algorithm error, could not fetch word" |
decompress |
Wrong type | "string expected, got <type>" |
decompress |
Bad limit type | "number expected for <name>, got <type>" |
decompress |
Empty / invalid | "invalid input - not a compressed string" |
decompress |
Corrupt codes | "could not find last from dict. Invalid input?" |
decompress |
Output limit | "decompressed output exceeds limit" |
decompress |
Input limit | "compressed input exceeds limit" |
decompress |
Step limit | "decompression step limit exceeded" |