ox is a modular command-line toolkit written in C for compressing, encoding, analyzing, and profiling symbolic sequences.
It includes bit-packing, entropy metrics, histogramming, finite-context modeling, and CRC-based hashing.
git clone https://github.com/cobilab/ox
cd ox/src/
make
./ox <command> [options]
Calculate basic statistics from a file.
./ox stats [-b 8|16] <filename>
-b: data type width in bits (8 or 16)
Generate random symbolic sequences.
./ox generate [-s <size>] [-c <cardinality>] [-e <seed>] <filename>
-s: sequence size
-c: alphabet cardinality (0-255)
-e: random seed
Bit-pack sequences using 2-bit (ABCD) or 4-bit (A-P) encodings.
./ox pack2 pack <input> <output>
./ox pack2 unpack <input> <output>
./ox pack4 pack <input> <output>
./ox pack4 unpack <input> <output>
Encode/decode sequences with a custom XRC-256 codec: an order-0 model followed by a range coder.
./ox xrc-256 encode <input> <output>
./ox xrc-256 decode <input> <output>
Compute Shannon entropy of binary input.
./ox entropy [-v] <filename>
-v: verbose output (byte frequencies and count)
Analyze distribution of values in a file (supports 8 and 16 bits).
./ox histogram [-h] [-t 8|16] [-w <width>] [-p] <filename>
-t: data type (8 or 16 bits)
-w: histogram width
-p: plot instead of raw values
-h: hide zero-count bins
Measure pattern distances in a sequence.
./ox distance -t <pattern> <filename>
-t: pattern (e.g., RRR, EXFGGHH)
Compute CRC32 checksum.
./ox crc32-hash <filename>
Estimate local complexity using a finite-context model.
./ox profile [-k <ctx>] [-a <alphaDen>] [-w <window>] <filename>
-k: model context order
-a: smoothing parameter (1/a)
-w: sliding window size
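The idea behind a finite-context model can be sketched in a few lines of C. This is an illustration under stated assumptions, not ox's code: the name `fcm_profile_sketch` is hypothetical, the alphabet is restricted to 'A'..'D', the initial context is taken to be all 'A', and the estimator is the common additive-smoothing form (count + alpha) / (total + alpha * alphabet size) with alpha = 1/alpha_den. The sliding window from -w is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define FCM_ALPHABET 4 /* assumption: symbols 'A'..'D' only */

/* Illustrative order-k finite-context model, not ox's implementation.
 * For each position i, prob[i] receives the probability the model assigned
 * to seq[i] given the previous k symbols, before counting seq[i] itself.
 * Returns 0 on success, -1 on allocation failure or an invalid symbol. */
int fcm_profile_sketch(const char *seq, size_t n, unsigned k,
                       unsigned alpha_den, double *prob)
{
    assert(seq != NULL && prob != NULL);
    assert(k >= 1 && k <= 12 && alpha_den >= 1);

    /* Number of distinct contexts: FCM_ALPHABET^k. */
    size_t nctx = 1;
    for (unsigned i = 0; i < k; i++)
        nctx *= FCM_ALPHABET;

    unsigned *counts = (unsigned *)calloc(nctx * FCM_ALPHABET, sizeof *counts);
    if (counts == NULL)
        return -1;

    double alpha = 1.0 / (double)alpha_den;
    size_t ctx = 0; /* assumption: context starts as k copies of 'A' */

    for (size_t i = 0; i < n; i++) {
        if (seq[i] < 'A' || seq[i] > 'D') {
            free(counts);
            return -1;
        }
        unsigned s = (unsigned)(seq[i] - 'A');
        unsigned *row = counts + ctx * FCM_ALPHABET;
        unsigned total = row[0] + row[1] + row[2] + row[3];

        /* Smoothed estimate, then update the counts for this context. */
        prob[i] = (row[s] + alpha) / (total + FCM_ALPHABET * alpha);
        row[s]++;

        /* Slide the context: drop the oldest symbol, append the newest. */
        ctx = (ctx * FCM_ALPHABET + s) % nctx;
    }
    free(counts);
    return 0;
}
```

Taking -log2 of each probability gives a per-symbol cost in bits, which is one common way to read "local complexity": repetitive regions become cheap as the counts build up, and novel regions stay expensive.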
Print predefined analysis pipelines.
./ox pipelines
Example pipeline for DNA compression and decompression:
#!/bin/bash
grep -v '>' DNA.fa | tr -d -c 'ACGT' | tr 'ACGT' 'ABCD' > A.seq
./ox pack2 pack A.seq A.packed
./ox xrc-256 encode A.packed A.encoded
./ox xrc-256 decode A.encoded A.decoded
./ox pack2 unpack A.decoded A.unpacked
cmp A.unpacked A.seq
Print program version.
./ox version
pack2: expects a sequence containing only 'A', 'B', 'C', 'D'
pack4: expects symbols from 'A' to 'P'
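To make the 2-bit input constraint concrete, here is a small C sketch of how four symbols from 'A'..'D' can share one byte. It is not ox's on-disk format: the function names are hypothetical, the bit order (first symbol in the two most significant bits) is an assumption, and the sequence length is assumed to be tracked by the caller rather than stored in the output.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative 2-bit packing, not ox's actual file format.
 * Maps 'A'->0, 'B'->1, 'C'->2, 'D'->3 and stores four codes per byte,
 * first symbol in the highest two bits. out must hold (n + 3) / 4 bytes.
 * Returns the number of bytes written; returns 0 if n is 0 or if a symbol
 * outside 'A'..'D' is found. */
size_t pack2_sketch(const char *seq, size_t n, unsigned char *out)
{
    assert(seq != NULL && out != NULL);
    size_t nbytes = (n + 3) / 4;

    for (size_t i = 0; i < nbytes; i++)
        out[i] = 0;

    for (size_t i = 0; i < n; i++) {
        if (seq[i] < 'A' || seq[i] > 'D')
            return 0; /* input violates the 2-bit alphabet */
        unsigned code = (unsigned)(seq[i] - 'A');
        unsigned shift = 6u - 2u * (unsigned)(i % 4);
        out[i / 4] |= (unsigned char)(code << shift);
    }
    return nbytes;
}

/* Inverse of pack2_sketch: recovers n symbols from the packed bytes.
 * seq must hold at least n chars; no terminator is written. */
void unpack2_sketch(const unsigned char *in, size_t n, char *seq)
{
    assert(in != NULL && seq != NULL);
    for (size_t i = 0; i < n; i++) {
        unsigned shift = 6u - 2u * (unsigned)(i % 4);
        seq[i] = (char)('A' + ((in[i / 4] >> shift) & 3u));
    }
}
```

Under these assumptions "ABCD" packs to the single byte 0x1B, and a sequence whose length is not a multiple of four leaves the unused low bits of the last byte at zero. That is why a real format has to record the original length somewhere.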
# Prepare sequence
grep -v '>' input.fa | tr -d -c 'ACGT' | tr 'ACGT' 'ABCD' > seq.txt
# Pack using 2-bit
./ox pack2 pack seq.txt packed.bin
# Encode with custom codec
./ox xrc-256 encode packed.bin encoded.bin
# Decode
./ox xrc-256 decode encoded.bin decoded.bin
# Unpack to original
./ox pack2 unpack decoded.bin unpacked.txt
# Validate
cmp seq.txt unpacked.txt
GPLv3 License