A port of the 3DO SDK's compression library to modern platforms with a command line tool.
$ 3ct --help-all
3ct: 3DO Compression Tool (v1.0.0)
Usage: 3ct [OPTIONS] SUBCOMMAND
Options:
-h,--help Print this help message and exit
--help-all List help for all subcommands
Subcommands:
compress
Compress input file
Positionals:
input-filepath PATH:FILE REQUIRED
Path to input file
output-filepath PATH:FILE Path to output file (default: input + '.compressed')
decompress
Decompress input file
Positionals:
input-filepath PATH:FILE REQUIRED
Path to input file
output-filepath PATH:FILE Path to output file (default: input + '.decompressed')
ggc-compress
Compress input file as Game Guru .COMP
Positionals:
input-filepath PATH:FILE REQUIRED
Path to input file
output-filepath PATH:FILE Path to output file (default: input + '.COMP')
Options:
--file-type TEXT 4-byte 3DO filesystem file type for byte-exact headers
ggc-decompress
Decompress Game Guru .COMP input file
Positionals:
input-filepath PATH:FILE REQUIRED
Path to input file
output-filepath PATH:FILE Path to output file (default: input + '.decompressed')
check
Checks the compressor and decompressor against data generated by the 3DO SDK compression library
$ 3ct compress example.txt
- input:
- filepath: example.txt
- size_in_bytes: 1024
- size_in_words: 256
- output:
- filepath: example.txt.compressed
- size_in_bytes: 144
- size_in_words: 36
$ 3ct check
* output of 3ct compressor matches SDK
* output of 3ct decompressor matches SDK
* output of 3ct GGC compressor matches Game Guru
* output of 3ct GGC decompressor matches Game Guru
- Rework and modernize compression code
The 3DO SDK compression algorithm is based on LZSS (Lempel–Ziv–Storer–Szymanski). An LZ77-style dictionary-based algorithm optimized for the 3DO platform which replaces repeated byte sequences with back-references into a sliding window, whilee emitting novel bytes as literals.
- type: Dictionary-based lossless compression (LZ77-style)
- data structure: Binary search tree for fast string matching
- sliding window: 4096 bytes (2^12)
- 12-bit index for window position references
- Window maintains previously seen data for pattern matching
- look-ahead buffer: 18 bytes (2^4 + 2)
- 4-bit length field encoding phase lengths of 2-18 bytes
- Allows efficient encoding of repeated sequences
- break-even point: 2 bytes (minimum phrase length is 3 bytes)
- Phrases shorter than 3 bytes are stored as literal bytes instead
Each token in the compressed stream is prefixed with a header bit:
- header bit = 1: Literal byte (1 bit + 8 data bits, 9 total.)
Emitted when no match longer than 2 bytes (
BREAK_EVEN) is found. - header bit = 0: Phrase reference (1 bit + 12-bit index + 4-bit
length, 17 total.) The length field stores
match_length >= 3, encoding match lengths from 3 to 18 bytes. - end of stream: header bit = 0 followed by position = 0
(
END_OF_STREAM), signaling the decompressor that no more data follows.
The phrase reference encodes:
- position: 12-bit index into the sliding window (0 - 4095)
- length: 4-bit value representing match length minus 3 (0 - 15, corresponding to 3 - 18 bytes)
The compressor maintains a binary search tree (BST) over all substrings in the sliding window for efficient longest-match lookup. Each node stores parent, left-child, and right-child indices. When a maxium-lenght match (18 bytes) is found, the new node replaces the matched node in the tree. Strings are deleted from the tree as they slide out of the window.
- Input and output are word-oriented (32-bit words)
- The bitstream is packed into 32-bit words in big-endian byte order, matching the 3DO's big-endian ARM60 processor. On little-endian hosts the words are byte-swapped.
Game Guru .COMP files use a different wrapper and bitstream than the
3DO SDK comp3do format. The implementation was reverse engineered
from Game Guru's ARM executable and verified byte-for-byte against
known compressed and decompressed file pairs.
The .COMP file starts with a 6-byte header followed by a
byte-inverted adaptive-Huffman/LZSS payload:
| Offset | Size | Description |
|---|---|---|
0x00 |
2 | Big-endian decompressed size in bytes. Game Guru's implementation stores this as a 16-bit value, so individual compressed files are limited to 65535 decompressed bytes. |
0x02 |
4 | Original 3DO filesystem file type, byte-inverted with XOR 0xff. |
0x06 |
rest | Compressed payload. Every payload byte is also byte-inverted with XOR 0xff. |
The file-type field is not part of the compression algorithm. It is
the source file's 3DO filesystem type. In the Game Guru executable,
the file compression path reads the selected file-browser entry's
4-byte type word from entry offset 0x70, writes it into header bytes
2..5, then XORs those four bytes with 0xff. The file-type editor
uses the same field and displays it with %.4s.
Known examples:
| File | Header bytes 2..5 |
Decoded file type |
|---|---|---|
CasperSave1.COMP |
b1 a9 ad ab |
NVRT |
POed.A.COMP |
df df df df |
four spaces |
Use ggc-compress --file-type XXXX when the original 3DO filesystem
type is not recoverable from the host file. If omitted, 3ct uses the
first four bytes when they look like a plausible 3DO file type (A-Z,
0-9, or spaces), otherwise it uses four spaces.
Game Guru wraps the compressed data with a simple bytewise NOT transform:
- During compression, source bytes are first inverted (
byte ^ 0xff) before they are inserted into the LZ window and Huffman encoded. - During compression, each emitted payload byte is inverted again before it is written to disk.
- During decompression, each payload byte is inverted before bit parsing.
- During decompression, each decoded literal or copied match byte is inverted before it is written to the output file.
This means the compressor and decompressor operate internally on inverted file data, and the on-disk payload is also inverted at the byte-stream level.
GGC uses an LZSS-style sliding window with these constants:
| Constant | Value | Meaning |
|---|---|---|
N |
4096 |
Sliding window size. |
F |
60 |
Maximum match length. |
THRESHOLD |
2 |
Matches of length 1 or 2 are emitted as literals. |
| Initial window byte | 0x20 |
The whole window starts filled with spaces. |
| Initial window index | N - F = 4036 |
First output byte is inserted at this ring-buffer index. |
Token semantics:
| Huffman symbol | Meaning |
|---|---|
0..255 |
Literal byte. The internal decoded byte is stored in the window, then XORed with 0xff before writing to the output file. |
256..313 |
Match token. The match length is symbol - 253, producing lengths 3..60. A position code follows the symbol. |
Match copy semantics:
source_index = (current_window_index - decoded_position - 1) & 0x0fff
length = symbol - 253
The copied bytes are read from the 4096-byte ring buffer, inserted back into the ring buffer as they are copied, and inverted before writing to the output file.
The literal and match-length symbols are encoded with an adaptive
binary Huffman tree. This is close to Haruhiko Okumura's lzhuf
algorithm, but the Game Guru implementation uses -1 as the
root-parent sentinel; using the common public lzhuf.c root handling
causes the tree to drift after early updates.
Tree constants:
| Constant | Value | Meaning |
|---|---|---|
N_CHAR |
314 |
Number of encoded symbols: 256 literals plus 58 match-length symbols. |
T |
627 |
2 * N_CHAR - 1, total node count excluding sentinel frequency slot. |
R |
626 |
Root node index. |
MAX_FREQ |
0x8000 |
Frequency limit before tree reconstruction. |
Initial tree setup:
- Every symbol starts with frequency
1. - Leaves are represented as
symbol + T. - Internal nodes are built pairwise from low to high symbol order.
freq[T]is set to0xffffas a guard value.parent[R]is set to-1.
After each decoded or encoded symbol, Game Guru increments frequencies
while walking from the leaf's parent to the root. If a node's
frequency becomes larger than the next node's frequency, it is swapped
forward to preserve frequency ordering. When the root frequency
reaches 0x8000, the tree is reconstructed by halving leaf
frequencies with (freq + 1) / 2 and rebuilding the internal tree.
Match positions are 12-bit distances, but they are not stored as fixed 12-bit values. They are split into a 6-bit high part and a 6-bit low part, then encoded with a small static prefix table.
Decode procedure:
prefix = read_bits(8)
high = d_code[prefix] << 6
low = prefix
repeat d_len[prefix] - 2 times:
low = (low << 1) | read_bit()
position = high | (low & 0x3f)
The d_code table has 256 entries with these run lengths:
| Values | Repeated entries each |
|---|---|
0 |
32 |
1..3 |
16 |
4..11 |
8 |
12..23 |
4 |
24..47 |
2 |
48..63 |
1 |
The d_len table has 256 entries with these run lengths:
| Bit length | Repeated entries |
|---|---|
3 |
32 |
4 |
48 |
5 |
64 |
6 |
48 |
7 |
48 |
8 |
16 |
Compression uses the inverse mapping: select the first prefix byte
that maps to the position's high 6 bits, write that 8-bit prefix, then
write the low-bit suffix required by d_len[prefix] - 2.
The byte-exact compressor uses the same binary search tree parser as the ARM implementation:
- The tree is initialized with
N + 1throughN + 256roots andNas the null node sentinel. - The first
Fnodes before the starting window index are inserted before the first real token is emitted. - Strings are inserted and deleted as the window slides, matching Okumura-style LZSS parsing.
- When equal-length matches are found, Game Guru keeps the smaller encoded distance.
- Matches of length
1or2are emitted as literals becauseTHRESHOLDis2.
The exact tie-break is important: a compressor can produce a valid stream with a different match choice, but it will not be byte-identical to Game Guru output.
Huffman symbols and position codes are written most-significant-bit
first into an 8-bit output accumulator. Each full byte is XORed with
0xff before being appended to the .COMP payload. If the final byte
is partial, the remaining low bits are padded with zero bits before
the byte is inverted and written.
There is no explicit end-of-stream token in the GGC
payload. Decompression stops after producing the decompressed byte
count stored in header bytes 0..1.
The ggc-compress and ggc-decompress commands are tested against
the included samples:
build/3ct ggc-compress nvram.0.srm.unpacked/CasperSave1 /tmp/CasperSave1A.COMP
cmp /tmp/CasperSave1A.COMP nvram.0.srm.unpacked/CasperSave1A.COMP
build/3ct ggc-compress --file-type " " nvram.0.srm.unpacked/POed.A /tmp/POed.AA.COMP
cmp /tmp/POed.AA.COMP nvram.0.srm.unpacked/POed.AA.COMPCasperSave1 can infer NVRT from its first four bytes. POed.A
needs --file-type " " because its original 3DO filesystem type is
four spaces, while its first four file bytes are binary zeroes.
The compressor and decompressor support incremental (streaming)
operation. Data is fed in chunks via FeedCompressor or
FeedDecompressor; internal state (window position, BST, partial
bit-buffer) is preserved between calls. Convienece wrappers handle
one-shot operation.
- https://3dodev.com
- https://3dodev.com/documentation/development/opera/pf25/ppgfldr/pgsfldr/spg/14spg
- https://github.com/trapexit/portfolio_os_m2/tree/master/ws_root/src/tools/comp3do
- https://github.com/trapexit/portfolio_os_m2/tree/master/ws_root/src/tools/decomp3do
If you find 3ct useful please consider supporting its ongoing development.