Skip to content

trapexit/3ct

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

14 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

3ct: 3DO Compression Tool

A port of the 3DO SDK's compression library to modern platforms with a command line tool.

Usage

$ 3ct --help-all
3ct: 3DO Compression Tool (v1.0.0)
Usage: 3ct [OPTIONS] SUBCOMMAND

Options:
  -h,--help                   Print this help message and exit
  --help-all                  List help for all subcommands

Subcommands:
compress
  Compress input file
  Positionals:
    input-filepath PATH:FILE REQUIRED
                                Path to input file
    output-filepath PATH:FILE   Path to output file (default: input + '.compressed')

decompress
  Decompress input file
  Positionals:
    input-filepath PATH:FILE REQUIRED
                                Path to input file
    output-filepath PATH:FILE   Path to output file (default: input + '.decompressed')

ggc-compress
  Compress input file as Game Guru .COMP
  Positionals:
    input-filepath PATH:FILE REQUIRED
                                Path to input file
    output-filepath PATH:FILE   Path to output file (default: input + '.COMP')
  Options:
    --file-type TEXT            4-byte 3DO filesystem file type for byte-exact headers

ggc-decompress
  Decompress Game Guru .COMP input file
  Positionals:
    input-filepath PATH:FILE REQUIRED
                                Path to input file
    output-filepath PATH:FILE   Path to output file (default: input + '.decompressed')

check
  Checks the compressor and decompressor against data generated by the 3DO SDK compression library


$ 3ct compress example.txt
- input:
  - filepath: example.txt
  - size_in_bytes: 1024
  - size_in_words: 256
- output:
  - filepath: example.txt.compressed
  - size_in_bytes: 144
  - size_in_words: 36

$ 3ct check
* output of 3ct compressor matches SDK
* output of 3ct decompressor matches SDK
* output of 3ct GGC compressor matches Game Guru
* output of 3ct GGC decompressor matches Game Guru

TODOs

  • Rework and modernize compression code

3DO Compression Algorithm

The 3DO SDK compression algorithm is based on LZSS (Lempel–Ziv–Storer–Szymanski). An LZ77-style dictionary-based algorithm optimized for the 3DO platform which replaces repeated byte sequences with back-references into a sliding window, whilee emitting novel bytes as literals.

Algorithm Details

  • type: Dictionary-based lossless compression (LZ77-style)
  • data structure: Binary search tree for fast string matching
  • sliding window: 4096 bytes (2^12)
  • 12-bit index for window position references
  • Window maintains previously seen data for pattern matching
  • look-ahead buffer: 18 bytes (2^4 + 2)
  • 4-bit length field encoding phase lengths of 2-18 bytes
  • Allows efficient encoding of repeated sequences
  • break-even point: 2 bytes (minimum phrase length is 3 bytes)
  • Phrases shorter than 3 bytes are stored as literal bytes instead

Encoding Format

Each token in the compressed stream is prefixed with a header bit:

  • header bit = 1: Literal byte (1 bit + 8 data bits, 9 total.) Emitted when no match longer than 2 bytes (BREAK_EVEN) is found.
  • header bit = 0: Phrase reference (1 bit + 12-bit index + 4-bit length, 17 total.) The length field stores match_length >= 3, encoding match lengths from 3 to 18 bytes.
  • end of stream: header bit = 0 followed by position = 0 (END_OF_STREAM), signaling the decompressor that no more data follows.

The phrase reference encodes:

  • position: 12-bit index into the sliding window (0 - 4095)
  • length: 4-bit value representing match length minus 3 (0 - 15, corresponding to 3 - 18 bytes)

String Matching

The compressor maintains a binary search tree (BST) over all substrings in the sliding window for efficient longest-match lookup. Each node stores parent, left-child, and right-child indices. When a maxium-lenght match (18 bytes) is found, the new node replaces the matched node in the tree. Strings are deleted from the tree as they slide out of the window.

Data Format

  • Input and output are word-oriented (32-bit words)
  • The bitstream is packed into 32-bit words in big-endian byte order, matching the 3DO's big-endian ARM60 processor. On little-endian hosts the words are byte-swapped.

Game Guru Compression Algorithm

Game Guru .COMP files use a different wrapper and bitstream than the 3DO SDK comp3do format. The implementation was reverse engineered from Game Guru's ARM executable and verified byte-for-byte against known compressed and decompressed file pairs.

Container Format

The .COMP file starts with a 6-byte header followed by a byte-inverted adaptive-Huffman/LZSS payload:

Offset Size Description
0x00 2 Big-endian decompressed size in bytes. Game Guru's implementation stores this as a 16-bit value, so individual compressed files are limited to 65535 decompressed bytes.
0x02 4 Original 3DO filesystem file type, byte-inverted with XOR 0xff.
0x06 rest Compressed payload. Every payload byte is also byte-inverted with XOR 0xff.

The file-type field is not part of the compression algorithm. It is the source file's 3DO filesystem type. In the Game Guru executable, the file compression path reads the selected file-browser entry's 4-byte type word from entry offset 0x70, writes it into header bytes 2..5, then XORs those four bytes with 0xff. The file-type editor uses the same field and displays it with %.4s.

Known examples:

File Header bytes 2..5 Decoded file type
CasperSave1.COMP b1 a9 ad ab NVRT
POed.A.COMP df df df df four spaces

Use ggc-compress --file-type XXXX when the original 3DO filesystem type is not recoverable from the host file. If omitted, 3ct uses the first four bytes when they look like a plausible 3DO file type (A-Z, 0-9, or spaces), otherwise it uses four spaces.

Byte Inversion

Game Guru wraps the compressed data with a simple bytewise NOT transform:

  • During compression, source bytes are first inverted (byte ^ 0xff) before they are inserted into the LZ window and Huffman encoded.
  • During compression, each emitted payload byte is inverted again before it is written to disk.
  • During decompression, each payload byte is inverted before bit parsing.
  • During decompression, each decoded literal or copied match byte is inverted before it is written to the output file.

This means the compressor and decompressor operate internally on inverted file data, and the on-disk payload is also inverted at the byte-stream level.

LZSS Layer

GGC uses an LZSS-style sliding window with these constants:

Constant Value Meaning
N 4096 Sliding window size.
F 60 Maximum match length.
THRESHOLD 2 Matches of length 1 or 2 are emitted as literals.
Initial window byte 0x20 The whole window starts filled with spaces.
Initial window index N - F = 4036 First output byte is inserted at this ring-buffer index.

Token semantics:

Huffman symbol Meaning
0..255 Literal byte. The internal decoded byte is stored in the window, then XORed with 0xff before writing to the output file.
256..313 Match token. The match length is symbol - 253, producing lengths 3..60. A position code follows the symbol.

Match copy semantics:

source_index = (current_window_index - decoded_position - 1) & 0x0fff
length       = symbol - 253

The copied bytes are read from the 4096-byte ring buffer, inserted back into the ring buffer as they are copied, and inverted before writing to the output file.

Adaptive Huffman Layer

The literal and match-length symbols are encoded with an adaptive binary Huffman tree. This is close to Haruhiko Okumura's lzhuf algorithm, but the Game Guru implementation uses -1 as the root-parent sentinel; using the common public lzhuf.c root handling causes the tree to drift after early updates.

Tree constants:

Constant Value Meaning
N_CHAR 314 Number of encoded symbols: 256 literals plus 58 match-length symbols.
T 627 2 * N_CHAR - 1, total node count excluding sentinel frequency slot.
R 626 Root node index.
MAX_FREQ 0x8000 Frequency limit before tree reconstruction.

Initial tree setup:

  • Every symbol starts with frequency 1.
  • Leaves are represented as symbol + T.
  • Internal nodes are built pairwise from low to high symbol order.
  • freq[T] is set to 0xffff as a guard value.
  • parent[R] is set to -1.

After each decoded or encoded symbol, Game Guru increments frequencies while walking from the leaf's parent to the root. If a node's frequency becomes larger than the next node's frequency, it is swapped forward to preserve frequency ordering. When the root frequency reaches 0x8000, the tree is reconstructed by halving leaf frequencies with (freq + 1) / 2 and rebuilding the internal tree.

Position Coding

Match positions are 12-bit distances, but they are not stored as fixed 12-bit values. They are split into a 6-bit high part and a 6-bit low part, then encoded with a small static prefix table.

Decode procedure:

prefix = read_bits(8)
high   = d_code[prefix] << 6
low    = prefix
repeat d_len[prefix] - 2 times:
    low = (low << 1) | read_bit()
position = high | (low & 0x3f)

The d_code table has 256 entries with these run lengths:

Values Repeated entries each
0 32
1..3 16
4..11 8
12..23 4
24..47 2
48..63 1

The d_len table has 256 entries with these run lengths:

Bit length Repeated entries
3 32
4 48
5 64
6 48
7 48
8 16

Compression uses the inverse mapping: select the first prefix byte that maps to the position's high 6 bits, write that 8-bit prefix, then write the low-bit suffix required by d_len[prefix] - 2.

Match Finder

The byte-exact compressor uses the same binary search tree parser as the ARM implementation:

  • The tree is initialized with N + 1 through N + 256 roots and N as the null node sentinel.
  • The first F nodes before the starting window index are inserted before the first real token is emitted.
  • Strings are inserted and deleted as the window slides, matching Okumura-style LZSS parsing.
  • When equal-length matches are found, Game Guru keeps the smaller encoded distance.
  • Matches of length 1 or 2 are emitted as literals because THRESHOLD is 2.

The exact tie-break is important: a compressor can produce a valid stream with a different match choice, but it will not be byte-identical to Game Guru output.

Bit Packing

Huffman symbols and position codes are written most-significant-bit first into an 8-bit output accumulator. Each full byte is XORed with 0xff before being appended to the .COMP payload. If the final byte is partial, the remaining low bits are padded with zero bits before the byte is inverted and written.

There is no explicit end-of-stream token in the GGC payload. Decompression stops after producing the decompressed byte count stored in header bytes 0..1.

Byte-Exact Verification

The ggc-compress and ggc-decompress commands are tested against the included samples:

build/3ct ggc-compress nvram.0.srm.unpacked/CasperSave1 /tmp/CasperSave1A.COMP
cmp /tmp/CasperSave1A.COMP nvram.0.srm.unpacked/CasperSave1A.COMP

build/3ct ggc-compress --file-type "    " nvram.0.srm.unpacked/POed.A /tmp/POed.AA.COMP
cmp /tmp/POed.AA.COMP nvram.0.srm.unpacked/POed.AA.COMP

CasperSave1 can infer NVRT from its first four bytes. POed.A needs --file-type " " because its original 3DO filesystem type is four spaces, while its first four file bytes are binary zeroes.

Streaming API

The compressor and decompressor support incremental (streaming) operation. Data is fed in chunks via FeedCompressor or FeedDecompressor; internal state (window position, BST, partial bit-buffer) is preserved between calls. Convienece wrappers handle one-shot operation.

Documentation

Donations / Sponsorship

If you find 3ct useful please consider supporting its ongoing development.

https://github.com/trapexit/support

About

3DO Compression Folio compatible compressor/decompressor

Topics

Resources

License

Stars

Watchers

Forks

Contributors

Languages