
DotFox/transit.c


Transit.C

A fast, zero-copy Transit reader and writer written in C11 with SIMD acceleration. One codec-agnostic engine speaks all three Transit wire formats — JSON, JSON-Verbose, and MessagePack — and decodes straight into a single arena with borrowed (never copied) string payloads.


TL;DR - What is Transit?

Transit is a format and a set of libraries for conveying values between applications written in different languages. It is layered on top of JSON and MessagePack, so you get their tooling and speed, but with a much richer type system and built-in payload compression. Think of it as "JSON that round-trips real types":

  • Ground types from the host format: maps, arrays, strings, numbers, booleans, null
  • Extension types JSON lacks: keywords :foo, symbols, instants (timestamps), UUIDs, URIs, big integers/decimals, characters, byte arrays, sets, and lists
  • Built-in compression (caching): repeated map keys, keywords, symbols, and tags are written once and then referenced by a short ^N code, so verbose, key-heavy payloads shrink dramatically
  • Extensible via tagged values: ship your own types over the wire with a tag plus a representation, and decode them with a custom handler — no more {"__type": "Date", "value": "..."} hacks
  • Language-agnostic & self-describing: originally from Cognitect/Clojure, with implementations across many languages

Why Transit over plain JSON? A real type system (a keyword stays a keyword, an instant stays an instant), smaller payloads thanks to key caching, and first-class extensibility through tags — all while staying on top of the JSON/MessagePack infrastructure you already have.

Learn more: Official Transit specification

Features

  • 🚀 Fast: SIMD-accelerated string scanning (SSE2 on x86-64, NEON on arm64), Grisu2 shortest-double formatting, and a memset-free tokenizer with inline integer parsing
  • 💾 Zero-copy: transform-free strings, byte arrays, and keys are borrowed directly from the input buffer and threaded into the result tree — never copied
  • 🧩 One engine, three formats: a single codec-agnostic reader state machine and a single writer walk serve JSON, JSON-Verbose, and MessagePack; a format is just a token reader/writer behind one transit_codec_t
  • 📡 Streaming emitter: a push API (transit_emit_*) that produces Transit incrementally without building a value tree, byte-identical to transit_write
  • 🗜️ Transit caching: ^N cache codes for repeated keys/keywords/symbols/tags, with reader and writer kept in lock-step
  • 🏷️ Rich type system: keywords, symbols, instants, UUIDs, URIs, bigint/bigdec, characters, byte arrays, sets, lists, and arbitrary tagged values
  • 🔌 Custom handlers: decode your own composite tags into rich values at read time
  • 🧹 Memory-safe: a single bulk arena backs each result tree — one transit_result_free() frees everything
  • 📏 No recursion: the reader drives an explicit container stack, so there is no C-stack depth limit (tested to 60,000 levels of nesting)
  • 🔧 Zero dependencies: pure C11 and the standard library only
  • ✅ Conformance-tested: runs against the official cognitect/transit-format cross-implementation exemplar corpus
  • 📦 Portable: SSE2/NEON SIMD with a strict-portability scalar fallback (NO_SIMD=1); builds as a static or shared library on Linux, macOS, and Windows


Installation

Requirements

  • C11-compatible compiler (GCC 4.9+, Clang 3.1+, MSVC 2015+)
  • Make (Unix/macOS) or CMake (Windows/cross-platform)
  • Supported platforms:
    • macOS (Apple Silicon, Intel) — NEON/SSE2 SIMD
    • Linux (arm64, x86-64) — NEON/SSE2 SIMD
    • Windows (x86-64, arm64) — via MSVC/MinGW/Clang

Build Library

Unix/macOS/Linux:

# Clone the repository (with the exemplar corpus submodule)
git clone --recurse-submodules https://github.com/DotFox/transit.c.git
cd transit.c

# Build the static library (build/libtransit.a)
make

# Run tests to verify the build
make test

Windows:

# Clone the repository
git clone https://github.com/DotFox/transit.c.git
cd transit.c

# Build with CMake (works with MSVC, MinGW, Clang)
.\build.bat

# Or use the PowerShell script
.\build.ps1 -Test

Integrate Into Your Project

Option 1: Link the static library

# Compile your code against the public header and archive
cc -o myapp myapp.c -I/path/to/transit.c/include -L/path/to/transit.c/build -ltransit -lm

# Or add to your Makefile
CFLAGS  += -I/path/to/transit.c/include
LDFLAGS += -L/path/to/transit.c/build -ltransit -lm

Option 2: Include the source directly

Copy include/transit.h and every file under src/ into your project and compile them together. Only include/transit.h is public; the src/ headers are internal.

Quick Start

#include "transit.h"
#include <stdio.h>
#include <string.h>

int main(void) {
    /* Transit-JSON for the map {:name "Alice" :age 30 :langs [:clojure :rust]}.
       In compact Transit-JSON a map is an array led by the "^ " marker and
       keywords are written as "~:name". */
    const char *input =
        "[\"^ \",\"~:name\",\"Alice\",\"~:age\",30,\"~:langs\",[\"~:clojure\",\"~:rust\"]]";

    /* Read Transit (zero-copy: payloads borrow from `input`). */
    transit_result_t r = transit_read(transit_codec_json(),
                                    (const uint8_t *)input, strlen(input));

    if (r.error != TRANSIT_OK) {
        fprintf(stderr, "read error at byte %zu: %s\n", r.position, r.message);
        return 1;
    }

    printf("decoded a map with %zu entries\n", transit_count(r.value));

    /* Look up the first value; the span borrows from `input`. */
    transit_span_t name = transit_as_span(transit_map_val(r.value, 0));
    printf("name: %.*s\n", (int)name.len, (const char *)name.ptr);

    /* One call frees the whole tree (a single arena). */
    transit_result_free(&r);
    return 0;
}

Output:

decoded a map with 3 entries
name: Alice

Wire Formats

The same value model and the same read/write algorithms drive all three formats; you pick one by passing the matching codec.

| Codec | Selector | Encoding | Caching | Notes |
|---|---|---|---|---|
| JSON | transit_codec_json() | text | yes | Compact Transit-over-JSON; the default for interchange |
| JSON-Verbose | transit_codec_json_verbose() | text | no | Human-readable: native JSON objects for maps and RFC3339 instants |
| MessagePack | transit_codec_msgpack() | binary | yes | Compact binary Transit-over-MessagePack |

Adding a new wire format means implementing a token reader + writer behind a transit_codec_t descriptor; the semantic layer never branches on format.

API Reference

The entire public surface lives in a single header:

#include "transit.h"

Core Functions

transit_read()

Read a value from a buffer using the given codec.

transit_result_t transit_read(const transit_codec_t *codec, const uint8_t *input, size_t len);

Parameters:

  • codec: one of transit_codec_json(), transit_codec_json_verbose(), transit_codec_msgpack()
  • input: the encoded bytes (must remain valid and unmodified for zero-copy reads)
  • len: length of input in bytes

Returns: a transit_result_t (see Errors). On success, .error == TRANSIT_OK and .value holds the decoded tree.

Important: free the result with transit_result_free(). String/bytes payloads in the tree may point directly into input, so the buffer must outlive the result.

transit_read_opts()

Like transit_read(), but with explicit options (verbose semantics, cache toggle, custom handlers).

transit_result_t transit_read_opts(const transit_codec_t *codec, const uint8_t *input,
                                   size_t len, const transit_read_options_t *opts);

transit_write()

Encode a value with the given codec, appending to a growable output buffer.

int transit_write(const transit_codec_t *codec, transit_value_t v, transit_outbuf_t *out);

Parameters:

  • codec: the target wire format
  • v: the value to encode
  • out: an initialized transit_outbuf_t; the encoded bytes are appended to out->data, and out->len grows by the number of bytes written

Returns: 0 on success, or a non-zero transit_error_t on failure.

transit_write_opts()

Like transit_write(), but with explicit options (verbose output, cache toggle).

int transit_write_opts(const transit_codec_t *codec, transit_value_t v, transit_outbuf_t *out,
                       const transit_write_options_t *opts);

transit_result_free()

Free a result and the entire value tree it owns.

void transit_result_free(transit_result_t *r);

Note: this frees the backing arena in one shot. Do not free individual values, and do not use any borrowed spans afterwards.

Codecs

const transit_codec_t *transit_codec_json(void);         /* compact JSON */
const transit_codec_t *transit_codec_json_verbose(void); /* verbose JSON */
const transit_codec_t *transit_codec_msgpack(void);      /* MessagePack  */

Each returns an immutable, process-global descriptor that is safe to share across threads.

Type System

transit_value_t is an exposed, by-value tagged union — you can inspect it directly or via the accessor functions. Its kind is one of:

| transit_kind_t | Payload | Notes |
|---|---|---|
| TRANSIT_NULL | (none) | |
| TRANSIT_BOOL | bool | |
| TRANSIT_INT | int64_t | signed 64-bit |
| TRANSIT_DOUBLE | double | |
| TRANSIT_STRING | transit_span_t | UTF-8 bytes |
| TRANSIT_BYTES | transit_span_t | raw bytes (base64 on the JSON wire) |
| TRANSIT_KEYWORD | transit_span_t | interned name, e.g. :foo |
| TRANSIT_SYMBOL | transit_span_t | |
| TRANSIT_URI | transit_span_t | |
| TRANSIT_UUID | uint8_t[16] | parsed 128-bit value |
| TRANSIT_INSTANT | int64_t | milliseconds since the Unix epoch |
| TRANSIT_CHAR | int32_t | Unicode codepoint |
| TRANSIT_BIGINT | transit_span_t | textual representation |
| TRANSIT_BIGDEC | transit_span_t | textual representation |
| TRANSIT_ARRAY | items | vector |
| TRANSIT_LIST | items | |
| TRANSIT_SET | items | |
| TRANSIT_MAP | keys + vals | preserves entry order |
| TRANSIT_TAGGED | tag + representation | extension point |

A transit_span_t is a (pointer, length) view over bytes — not necessarily NUL-terminated:

typedef struct { const uint8_t *ptr; size_t len; } transit_span_t;
transit_kind_t transit_kind_of(transit_value_t v);              /* the value's kind */
bool           transit_is(transit_value_t v, transit_kind_t k); /* kind == k        */

Scalar Accessors

bool           transit_as_bool(transit_value_t v);
int64_t        transit_as_int(transit_value_t v);
double         transit_as_double(transit_value_t v);
int32_t        transit_as_char(transit_value_t v);    /* Unicode codepoint     */
int64_t        transit_as_instant(transit_value_t v); /* millis since epoch    */
transit_span_t transit_as_span(transit_value_t v);    /* string-family + bytes */

transit_as_span() covers every span-backed kind: STRING, BYTES, KEYWORD, SYMBOL, URI, BIGINT, and BIGDEC.

Collections

size_t          transit_count(transit_value_t v);               /* element/entry count */
transit_value_t transit_array_get(transit_value_t v, size_t i); /* ARRAY/LIST/SET      */
transit_value_t transit_map_key(transit_value_t v, size_t i);   /* i-th MAP key        */
transit_value_t transit_map_val(transit_value_t v, size_t i);   /* i-th MAP value      */

Maps are iterated by parallel index — transit_map_key(m, i) and transit_map_val(m, i) give the i-th entry, in document order.

Example: walk a decoded map.

for (size_t i = 0; i < transit_count(map); ++i) {
    /* Assumes span-backed keys (keywords/strings); check transit_kind_of first otherwise. */
    transit_span_t k = transit_as_span(transit_map_key(map, i));
    transit_value_t v = transit_map_val(map, i);
    printf("%.*s -> kind %d\n", (int)k.len, (const char *)k.ptr, (int)transit_kind_of(v));
}

Constructors

Scalar and string-family constructors do not allocate — string spans are borrowed from the caller, so the bytes must outlive any write that uses them.

transit_value_t transit_null(void);
transit_value_t transit_bool(bool b);
transit_value_t transit_int(int64_t i);
transit_value_t transit_double(double d);
transit_value_t transit_char(int32_t codepoint);
transit_value_t transit_instant(int64_t millis);
transit_value_t transit_string(const char *s, size_t n);
transit_value_t transit_string_z(const char *s); /* NUL-terminated */
transit_value_t transit_bytes(const uint8_t *p, size_t n);
transit_value_t transit_keyword(const char *s, size_t n);
transit_value_t transit_symbol(const char *s, size_t n);
transit_value_t transit_uri(const char *s, size_t n);
transit_value_t transit_bigint(const char *s, size_t n);
transit_value_t transit_bigdec(const char *s, size_t n);
transit_value_t transit_uuid(const uint8_t bytes[16]);
transit_value_t transit_tagged(transit_span_t tag, transit_value_t *rep);

Containers are built into an arena (see below):

transit_value_t transit_array(transit_arena_t *a);
transit_value_t transit_list(transit_arena_t *a);
transit_value_t transit_set(transit_arena_t *a);
transit_value_t transit_map(transit_arena_t *a);
void            transit_array_push(transit_arena_t *a, transit_value_t *arr, transit_value_t item);
void            transit_map_put(transit_arena_t *a, transit_value_t *m, transit_value_t key, transit_value_t val);

Arena & Output Buffer

An arena is a bump allocator: allocate as you build, free everything at once.

transit_arena_t *transit_arena_create(size_t first_block);  /* 0 = sensible default  */
void             transit_arena_destroy(transit_arena_t *a);
size_t           transit_arena_bytes_allocated(const transit_arena_t *a);

The output buffer is a growable, heap-backed byte buffer for the write path.

typedef struct { uint8_t *data; size_t len; size_t cap; } transit_outbuf_t;

void transit_outbuf_init(transit_outbuf_t *o);
void transit_outbuf_free(transit_outbuf_t *o);

Custom Read Handlers

A handler turns a (tag, representation) pair into a rich value at read time. Any allocation must come from the supplied arena.

typedef transit_value_t (*transit_read_handler_fn)(transit_span_t tag, transit_value_t rep,
                                                   transit_arena_t *arena, void *user);

transit_handlers_t *transit_handlers_create(void);
void                transit_handlers_destroy(transit_handlers_t *h);
bool                transit_handlers_add(transit_handlers_t *h, const char *tag,
                                         transit_read_handler_fn fn, void *user);

transit_handlers_add() returns false if tag is one of the reserved built-in composite tags (the quote tag ', set, list, cmap, and map); those cannot be overridden. Pass the handler set to transit_read_opts() through transit_read_options_t.handlers.

Example: decode a point tag whose representation is [x, y] into the integer x + y.

static transit_value_t point_sum(transit_span_t tag, transit_value_t rep,
                                 transit_arena_t *arena, void *user) {
    (void)tag; (void)arena; (void)user;
    int64_t x = transit_as_int(transit_array_get(rep, 0));
    int64_t y = transit_as_int(transit_array_get(rep, 1));
    return transit_int(x + y);
}

transit_handlers_t *h = transit_handlers_create();
transit_handlers_add(h, "point", point_sum, NULL);

transit_read_options_t opts = transit_read_options_default();
opts.handlers = h;
transit_result_t r = transit_read_opts(transit_codec_json(), buf, len, &opts);
/* ... use r ... */
transit_result_free(&r);
transit_handlers_destroy(h);

Options

typedef struct {
    bool verbose;                 /* force verbose semantics on read (rare) */
    bool cache_enabled;           /* default true; ignored when verbose     */
    transit_handlers_t *handlers; /* optional custom read handlers          */
} transit_read_options_t;

typedef struct {
    bool verbose;                /* native maps + RFC3339 instants; disables cache */
    bool cache_enabled;          /* default true; forced false when verbose        */
} transit_write_options_t;

transit_read_options_t  transit_read_options_default(void);
transit_write_options_t transit_write_options_default(void);

Always start from the *_default() value and override only the fields you need, so that any options added in later versions keep their defaults.

Errors

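transit_read and transit_read_opts report failures through a transit_result_t; transit_write, transit_write_opts, and the emitter functions return an int holding a transit_error_t code (TRANSIT_OK on success):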

typedef struct {
    transit_value_t  value;    /* decoded tree (valid only when error == TRANSIT_OK) */
    int              error;    /* transit_error_t                                    */
    const char      *message;  /* static or arena-owned description                  */
    size_t           position; /* byte offset of the error, where known              */
    transit_arena_t *arena;    /* owns the tree; freed by transit_result_free        */
} transit_result_t;

| Code | Meaning |
|---|---|
| TRANSIT_OK | success |
| TRANSIT_ERR_EOF | unexpected end of input |
| TRANSIT_ERR_SYNTAX | malformed wire form |
| TRANSIT_ERR_TYPE | type or shape violation |
| TRANSIT_ERR_CACHE | invalid cache reference |
| TRANSIT_ERR_OOM | allocation failure |
| TRANSIT_ERR_OVERFLOW | numeric overflow |
| TRANSIT_ERR_UTF8 | invalid UTF-8 |
| TRANSIT_ERR_WRITE | output/encoding failure |
| TRANSIT_ERR_STATE | streaming emitter used out of order |

Streaming Emitter

Produce Transit incrementally, without first building a value tree — push one value or container event at a time. The emitter applies the same Transit transforms as transit_write (tag rewriting, caching, escaping), so its output is byte-identical to transit_write of the equivalent value.

typedef struct transit_emitter transit_emitter_t;
transit_emitter_t* transit_emitter_create(const transit_codec_t* codec, transit_outbuf_t* out,
                                          const transit_write_options_t* opts);
int  transit_emitter_finish(transit_emitter_t*);   /* exactly one top-level value */
void transit_emitter_destroy(transit_emitter_t*);  /* NULL-safe */

/* scalars: null, bool, int, double, char, instant, uuid, string, bytes,
   keyword, symbol, uri, bigint, bigdec */
int transit_emit_int(transit_emitter_t*, int64_t);
int transit_emit_keyword(transit_emitter_t*, const char*, size_t);
/* ... and the rest ... */

int transit_emit_value(transit_emitter_t*, transit_value_t v);          /* whole subtree */
int transit_emit_tag(transit_emitter_t*, const char* tag, size_t n);    /* next value -> ["~#tag", rep] */

/* collections: *_begin (delimiter codecs), *_counted (all codecs), *_end */
int transit_emit_array_begin(transit_emitter_t*);
int transit_emit_array_counted(transit_emitter_t*, size_t n);
int transit_emit_array_end(transit_emitter_t*);
/* ... list / set / map likewise ... */

Counted vs. non-counted collections. The *_begin forms stream without a known length and work for the delimiter codecs (JSON / JSON-Verbose). MessagePack is length-prefixed, so it requires the *_counted forms (count = number of elements; for maps, the number of key/value entries). The *_counted forms work for every codec, and the emitter validates that you push exactly the declared number of elements. Misuse — a *_begin on MessagePack, a count mismatch, a dangling map key, a second top-level value, or finishing with an unclosed container — returns TRANSIT_ERR_STATE and leaves the emitter unusable (still destroy it).

/* Stream {:name "Alice" :age 30} as compact Transit-JSON. */
transit_outbuf_t out;
transit_outbuf_init(&out);

transit_emitter_t* em = transit_emitter_create(transit_codec_json(), &out, NULL);
transit_emit_map_counted(em, 2); /* 2 entries (or transit_emit_map_begin for JSON) */
transit_emit_keyword(em, "name", 4);
transit_emit_string(em, "Alice", 5);
transit_emit_keyword(em, "age", 3);
transit_emit_int(em, 30);
transit_emit_map_end(em);

if (transit_emitter_finish(em) == TRANSIT_OK)
    printf("%.*s\n", (int) out.len, (char*) out.data);  /* ["^ ","~:name","Alice","~:age",30] */

transit_emitter_destroy(em);
transit_outbuf_free(&out);

Examples

Writing a value

#include "transit.h"
#include <stdio.h>

int main(void) {
    transit_arena_t *a = transit_arena_create(0);

    transit_value_t m = transit_map(a);
    transit_map_put(a, &m, transit_keyword("name", 4), transit_string_z("Alice"));
    transit_map_put(a, &m, transit_keyword("age", 3),  transit_int(30));

    transit_outbuf_t out;
    transit_outbuf_init(&out);

    if (transit_write(transit_codec_json(), m, &out) == 0)
        printf("%.*s\n", (int)out.len, (char *)out.data);

    transit_outbuf_free(&out);
    transit_arena_destroy(a);
    return 0;
}

Output:

["^ ","~:name","Alice","~:age",30]

Round-tripping across formats

Because every codec shares one value model, a tree decoded from one format re-encodes cleanly into another — read MessagePack, write JSON-Verbose, and so on:

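transit_result_t r = transit_read(transit_codec_msgpack(), mp, mp_len);
if (r.error == TRANSIT_OK) {
    transit_outbuf_t out;
    transit_outbuf_init(&out);
    /* Re-encode the same tree as human-readable JSON-Verbose. */
    if (transit_write(transit_codec_json_verbose(), r.value, &out) == TRANSIT_OK)
        printf("%.*s\n", (int)out.len, (char *)out.data);
    transit_outbuf_free(&out);
}
transit_result_free(&r);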

A complete, self-contained program lives in examples/example.c — build it with make examples.

Building

Standard Build (Unix/macOS/Linux)

# Build the static library (build/libtransit.a)
make

# Build and run all auto-discovered test suites
make test

# Build and run a single suite
make single-test T=test_reader

# Build and run tests under AddressSanitizer + UBSan
make asan

# Build and run benchmarks (with the zero-copy regression gate)
make bench

# Build the example programs (against the public header only)
make examples

# Remove build artifacts
make clean

Conformance Corpus

The test suite includes a harness that runs the official cross-implementation corpus from cognitect/transit-format: 67 small examples (.json / .verbose.json / .mp) plus two large real-world datasets. It is a git submodule at third_party/transit-format (referenced, never vendored — the corpus is CC BY-SA 4.0).

git clone --recurse-submodules https://github.com/DotFox/transit.c.git  # fetches it
# or, in an existing checkout:
make corpus   # = git submodule update --init (clones as a fallback if needed)
make test     # test_exemplars runs automatically when the corpus is present

When the corpus is absent, the exemplar suite skips cleanly so the rest of the tests still pass.

Windows Build

# CMake wrapper scripts
.\build.bat
.\build.ps1 -Test

# Or drive CMake directly
mkdir build; cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
cmake --build . --config Release
ctest -C Release --output-on-failure

CMake (cross-platform)

cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build
ctest --test-dir build --output-on-failure

# Sanitizer build
cmake -B build-san -DTRANSIT_SANITIZE=ON && cmake --build build-san

Build Options

  • make LTO=1: -O3 plus link-time optimization (cross-module inlining). The resulting archive is LTO bitcode, so consumers must also link with an LTO-capable toolchain; the default build stays a portable object archive.
  • make NO_SIMD=1 (or cmake -DTRANSIT_NO_SIMD=ON) — force the strict-portability scalar scan path, with no -m flags and no platform intrinsics.

By default the read path uses platform SIMD (SSE2 on x86-64, NEON on arm64 — both baseline 64-bit ABIs, so no -m flags are needed) with a portable scalar fallback everywhere else.

Code Formatting

make format        # auto-format all sources with clang-format (run before committing!)
make format-check  # verify formatting without modifying files (CI gate)

LSP Support

# Generate compile_commands.json + .clangd for clangd (requires bear or compiledb)
make compile-commands

# Regenerate just the .clangd configuration
make clangd

Fuzzing

The reader is fuzz-tested through a libFuzzer harness (fuzz/fuzz_read.c) that decodes arbitrary bytes with every codec, walks the result through the public accessors, and round-trips it through the writer — all under ASan/UBSan.

# Coverage-guided fuzzing (needs a libFuzzer-capable clang: full Xcode or
# `brew install llvm`, not Apple Command Line Tools)
make fuzz
build/fuzz/fuzz_read -max_total_time=60 fuzz/corpus

Performance

Transit.C is designed for high throughput, with several optimizations:

  • SIMD acceleration: vectorized JSON string scanning (closing-quote search and whitespace/separator skipping) on SSE2/NEON
  • Zero-copy strings: transform-free payloads point directly into the input buffer; a copy happens only when a transformation is unavoidable (JSON unescape, base64 decode, numeric/temporal parse)
  • Single bulk arena: the whole result tree is bump-allocated and freed in one shot — no per-node malloc/free
  • Grisu2 doubles: shortest round-tripping decimals with no snprintf/strtod on the hot path
  • Inline cache helpers: the per-string cache logic is static inline, so it inlines into both the reader and the writer
  • memset-free tokenizer with inline integer parsing

Indicative single-thread JSON throughput on a 53 KB real-world dataset (examples/0.8/example.json, Apple M-series). Run make bench for your own machine; build/bench/bench_profile {decode,encode} <iters> profiles a single codec under a sampling profiler.

| Build | Decode | Encode |
|---|---|---|
| default (-O2) | ~580 MB/s | ~330 MB/s |
| make LTO=1 (-O3 + LTO) | ~635 MB/s | ~370 MB/s |

make bench also asserts the zero-copy property as a hard gate: read-path arena usage is independent of string payload size, so a regression that introduces a redundant copy fails the build.

Project Status

Complete features:

  • All three wire formats (JSON, JSON-Verbose, MessagePack), read and write, through one codec-agnostic engine
  • The full Transit type system: scalars, strings, keywords, symbols, URIs, UUIDs, instants, characters, bigint/bigdec, byte arrays, arrays, lists, sets, maps, and tagged values
  • Zero-copy reads with borrowed spans and a single bulk arena per result
  • ^N caching for repeated keys/keywords/symbols/tags, synchronized between reader and writer
  • Custom read handlers for composite tags
  • A streaming emitter (transit_emit_*) that writes incrementally without a value tree, byte-identical to transit_write
  • Grisu2 shortest-double formatting
  • Platform SIMD scanning (SSE2/NEON) with a portable scalar fallback (NO_SIMD=1)
  • Cross-platform builds (Linux, macOS, Windows) via Make and CMake

Testing:

  • Unit, cross-codec, and semantic suites, plus the official cognitect/transit-format exemplar corpus
  • Memory safety verified under AddressSanitizer + UBSan
  • A libFuzzer harness over the reader (make fuzz), with size-overflow hardening in the arena and growth paths
  • A zero-copy regression gate wired into make bench
  • Edge-case coverage: deep nesting (60,000 levels), Unicode, cache wraparound, cross-representation conformance, and round-trip stability

📋 Roadmap:

  • Additional SIMD targets (32-bit x86/ARM, RISC-V Vector Extension)
  • Broader write-side options

Notes

Thread safety

  • transit_value_t trees, transit_arena_t instances, and transit_outbuf_t instances are not thread-safe. Use one per thread, or guard with external synchronization.
  • The codec descriptors returned by transit_codec_json(), transit_codec_json_verbose(), and transit_codec_msgpack() are immutable process-global singletons and are safe to share across threads.
  • A transit_handlers_t set is mutable while you populate it. Build it fully before any read, and do not mutate it concurrently with transit_read_opts.
  • The pure accessors (transit_kind_of, transit_as_*, transit_count, transit_map_key/transit_map_val, …) are safe to call from multiple threads against the same tree, provided no thread is freeing it.

Symbol visibility (TRANSIT_API)

Public functions are decorated with TRANSIT_API. By default it expands to a plain declaration suitable for static-library builds. For shared-library builds, define one of these at compile time:

  • -DTRANSIT_BUILD_SHARED when compiling the library itself (emits the exported visibility attribute on Unix-likes and __declspec(dllexport) on Windows).
  • -DTRANSIT_USE_SHARED when compiling a consumer that links against a Windows DLL (emits __declspec(dllimport)).

On Unix-likes the import side is a no-op, so consumers don't need to define anything.

Lifetime contract (zero-copy)

The buffer passed to transit_read must remain valid and unmodified until the returned transit_result_t is freed: string and bytes payloads in the tree may point directly into it. If you need a value to outlive its input, copy the bytes out before freeing the result.

Contributing

Contributions are welcome! Please:

  1. Run make format before committing (auto-formats with clang-format)
  2. Ensure all tests pass with make test (and, ideally, make asan and make test NO_SIMD=1)
  3. Add tests for new features
  4. Keep the semantic layer codec-agnostic — a new wire format is a new transit_codec_t, not a branch in reader.c / writer.c
  5. Follow the existing code style (K&R, 4 spaces, C11; see .clang-format)

License

MIT License

Copyright (c) 2026 [Kirill Chernyshov]

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Acknowledgments

  • Transit specification: https://github.com/cognitect/transit-format
  • Exemplar corpus from cognitect/transit-format (CC BY-SA 4.0)
  • Grisu2 shortest double formatting (Florian Loitsch, 2010) — "Printing Floating-Point Numbers Quickly and Accurately with Integers"
  • SIMD optimization patterns from high-performance JSON parsers (simdjson, yyjson)

About

A data interchange format and set of libraries for conveying values between applications written in different programming languages.
