Tags: twmb/avro

v1.7.2

v1.7.2: never panic on hostile input; bound recursion across all paths

v1.7.1

v1.7.1: audit fixes

Correctness
- timestamp-nanos encode errors on overflow instead of silent wrap
- json.Number exponent notation (1.5e3) accepted for int/long
- EncodeJSON float32 overflow errors instead of invalid '+Inf' literal
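
The timestamp-nanos overflow check can be sketched with the standard library alone. This is an illustration of the idea, not the library's code: nanosFromUnix is a hypothetical helper that takes the seconds and nanoseconds a time.Time reports and refuses values that do not fit in an int64 nanosecond count. It is slightly conservative at the extreme negative end of the range.

```go
package main

import (
	"errors"
	"fmt"
	"math"
	"time"
)

var errNanosRange = errors.New("timestamp-nanos: value does not fit in int64")

// nanosFromUnix is a hypothetical helper (not part of twmb/avro).
// sec and nsec are what time.Time.Unix and time.Time.Nanosecond return,
// so nsec is expected to be in [0, 999999999].
func nanosFromUnix(sec, nsec int64) (int64, error) {
	const nanosPerSec = int64(time.Second)
	// Reject seconds whose product with 1e9 would leave the int64 range.
	if sec > math.MaxInt64/nanosPerSec || sec < math.MinInt64/nanosPerSec {
		return 0, errNanosRange
	}
	n := sec * nanosPerSec
	// Adding a non-negative nsec can only overflow at the top of the range.
	if n > math.MaxInt64-nsec {
		return 0, errNanosRange
	}
	return n + nsec, nil
}

func main() {
	ok := time.Unix(1, 5)
	fmt.Println(nanosFromUnix(ok.Unix(), int64(ok.Nanosecond())))

	far := time.Date(2300, 1, 1, 0, 0, 0, 0, time.UTC)
	fmt.Println(nanosFromUnix(far.Unix(), int64(far.Nanosecond())))
}
```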

Symmetry
- decimal decode into float32/float64/string matches encode targets
- json.Number in arrays/maps handled like scalars

SchemaFor
- recursive Go types (linked lists, trees, mutual recursion)

Performance
- map[string]any record encode: zero-alloc fast path

v1.7.0

v1.7.0: SchemaFor: type-alias tag, auto null default, fixed type naming

Add type-alias struct tag for named type aliases (record, enum, fixed)
during schema evolution. Add bracket syntax for multi-value alias and
type-alias tags (alias=[a,b]) — safe because brackets are not valid
in Avro names per the spec.

Pointer types (*T) now automatically get "default": null, making
nullable fields backward-compatible out of the box.

Fixed types ([N]byte) now use the Go type name when available
(e.g. type MD5 [16]byte → "name": "MD5"), falling back to "fixed_N"
for unnamed arrays.
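
A rough sketch of the naming rule using reflect. The helper names fixedNameSketch and fixedNameOf are hypothetical and this is not SchemaFor's implementation. It only shows how a named array type exposes its Go name while an unnamed one does not.

```go
package main

import (
	"fmt"
	"reflect"
)

// MD5 is a named fixed-size byte array, as in the example above.
type MD5 [16]byte

// fixedNameSketch is a hypothetical helper: it returns the Go type name
// for named byte arrays and "fixed_N" for unnamed ones.
func fixedNameSketch(t reflect.Type) (string, error) {
	if t == nil || t.Kind() != reflect.Array || t.Elem().Kind() != reflect.Uint8 {
		return "", fmt.Errorf("not a byte array: %v", t)
	}
	if name := t.Name(); name != "" {
		return name, nil
	}
	return fmt.Sprintf("fixed_%d", t.Len()), nil
}

// fixedNameOf is a convenience wrapper taking a value instead of a type.
func fixedNameOf(v any) (string, error) {
	return fixedNameSketch(reflect.TypeOf(v))
}

func main() {
	fmt.Println(fixedNameOf(MD5{}))
	fmt.Println(fixedNameOf([16]byte{}))
	fmt.Println(fixedNameOf("not an array"))
}
```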

Harden splitTag to error on unclosed delimiters, and validate that
type-alias targets a non-primitive named type.
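
As an illustration of bracket-aware tag splitting, the sketch below splits on commas that sit outside brackets and errors when a bracket is never closed. splitTagSketch is a hypothetical stand-in, not the library's splitTag, and the tag layout in main (name first, then options) is an assumption for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// splitTagSketch splits a tag on top-level commas. Commas inside
// [ ... ] belong to a multi-value option and are left alone.
func splitTagSketch(tag string) ([]string, error) {
	var parts []string
	depth, start := 0, 0
	for i, r := range tag {
		switch r {
		case '[':
			depth++
		case ']':
			if depth == 0 {
				return nil, fmt.Errorf("unexpected ']' at offset %d", i)
			}
			depth--
		case ',':
			if depth == 0 {
				parts = append(parts, tag[start:i])
				start = i + 1
			}
		}
	}
	if depth != 0 {
		return nil, errors.New("unclosed '[' in tag")
	}
	return append(parts, tag[start:]), nil
}

func main() {
	fmt.Println(splitTagSketch("name,alias=[a,b],omitempty"))
	fmt.Println(splitTagSketch("name,alias=[a,b"))
}
```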

v1.6.0

v1.6.0: ocf.WithReaderSchemaFunc, better null semantics, numeric overflow guards on decode

New ocf.WithReaderSchemaFunc takes a callback invoked after the OCF
header is parsed, so callers can inspect rd.Schema() and rd.Metadata()
before choosing a reader schema. Returning nil skips resolution. This
unblocks the alias-based schema-evolution pattern for formats like
Iceberg manifest lists where the reader schema depends on metadata or
writer-schema shape only available post-header.

WithReaderSchema's doc clarifies writer/reader terminology and notes the
two options are mutually exclusive.

Null branches now always decode to the target's Go zero value,
replacing any prior content across all Kinds — matching
encoding/json/v2.
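
The "null replaces prior content" rule can be pictured with reflect. setZero below is a hypothetical helper, not the decoder's code: it overwrites whatever the target pointer refers to with the zero value of its type, regardless of Kind.

```go
package main

import (
	"errors"
	"fmt"
	"reflect"
)

// setZero overwrites *target with the zero value of its type.
// It is an illustration only; the real decoder works on its own
// internal representation.
func setZero(target any) error {
	v := reflect.ValueOf(target)
	if v.Kind() != reflect.Pointer || v.IsNil() {
		return errors.New("target must be a non-nil pointer")
	}
	elem := v.Elem()
	elem.Set(reflect.Zero(elem.Type()))
	return nil
}

func main() {
	s := []int{1, 2, 3}
	m := map[string]int{"a": 1}
	n := 7
	fmt.Println(setZero(&s), setZero(&m), setZero(&n))
	fmt.Println(s == nil, m == nil, n)
}
```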

Numeric decode paths now range-check against the target Go type
instead of silently wrapping: Avro long(2^31) into Go int32 returns
an error instead of -2^31; Avro int(-1) into uint32 errors instead
of 2^32-1. Brings decode to parity with encode.
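
A minimal sketch of range-checked narrowing, to contrast with Go's plain integer conversion, which truncates silently. The function names toInt32 and toUint32 are hypothetical and are not the library's internals.

```go
package main

import (
	"fmt"
	"math"
)

// toInt32 narrows v, returning an error instead of wrapping.
func toInt32(v int64) (int32, error) {
	if v < math.MinInt32 || v > math.MaxInt32 {
		return 0, fmt.Errorf("value %d overflows int32", v)
	}
	return int32(v), nil
}

// toUint32 narrows v, rejecting negatives and values above MaxUint32.
func toUint32(v int64) (uint32, error) {
	if v < 0 || v > math.MaxUint32 {
		return 0, fmt.Errorf("value %d overflows uint32", v)
	}
	return uint32(v), nil
}

func main() {
	v := int64(1) << 31
	fmt.Println(int32(v)) // a plain conversion wraps to -2147483648

	_, err := toInt32(v)
	fmt.Println(err)

	_, err = toUint32(-1)
	fmt.Println(err)
}
```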

Float64 → float32 narrowing errors on overflow-to-Inf (binary decode,
JSON decode, encode). NaN and ±Inf pass through unchanged. In-range
precision-loss rounding stays silent, matching json/v2's "rounded or
clamped" rule.
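
The narrowing rule above can be approximated in a few lines. toFloat32 is a hypothetical helper written for this note, not the library's function: NaN and infinities pass through, a finite input that becomes infinite after conversion is an error, and ordinary rounding is accepted silently.

```go
package main

import (
	"fmt"
	"math"
)

// toFloat32 narrows f to float32 and errors only when a finite input
// overflows to infinity. Common Go toolchains produce ±Inf for an
// out-of-range conversion, although the language spec leaves the exact
// result implementation-dependent.
func toFloat32(f float64) (float32, error) {
	if math.IsNaN(f) || math.IsInf(f, 0) {
		return float32(f), nil // non-finite values pass through unchanged
	}
	n := float32(f)
	if math.IsInf(float64(n), 0) {
		return 0, fmt.Errorf("value %g overflows float32", f)
	}
	return n, nil // in-range precision loss is not an error
}

func main() {
	fmt.Println(toFloat32(0.1))
	fmt.Println(toFloat32(1e39))
	fmt.Println(toFloat32(math.Inf(-1)))
}
```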

v1.5.0

1.5.0: Decode decimal to *big.Rat, export RatFromBytes

Decode and DecodeJSON now return *big.Rat instead of json.Number when
decoding decimal logical types into *any targets, matching the behavior
of hamba/avro and linkedin/goavro. This is a breaking change for code
that type-asserts json.Number from decoded decimal values — in practice
only redpanda-data/connect relied on this, and it is being updated to
use a CustomType callback instead. No further breakage is expected.
json.Number is still supported as a typed struct target.

RatFromBytes is now exported to support CustomType Decode callbacks
that override decimal handling — without it, users would need to
reimplement two's complement byte decoding. DurationFromBytes serves
the same role for the duration logical type and is now documented
accordingly. SchemaNode.Scale and SchemaNode.Precision are now
correctly populated when a CustomType matches a primitive decimal
schema.
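
For readers unfamiliar with the byte layout, the sketch below shows the kind of work involved: Avro decimals store the unscaled value as big-endian two's complement bytes, and the schema's scale gives the power-of-ten divisor. ratFromTwosComplement is a hypothetical helper with an assumed signature; it is not the exported RatFromBytes and makes no claim about that function's parameters.

```go
package main

import (
	"fmt"
	"math/big"
)

// ratFromTwosComplement interprets b as a big-endian two's complement
// integer and divides it by 10^scale. scale is assumed to be >= 0.
func ratFromTwosComplement(b []byte, scale int) *big.Rat {
	unscaled := new(big.Int).SetBytes(b) // unsigned interpretation first
	if len(b) > 0 && b[0]&0x80 != 0 {
		// Sign bit set: subtract 2^(8*len(b)) to get the negative value.
		mod := new(big.Int).Lsh(big.NewInt(1), uint(8*len(b)))
		unscaled.Sub(unscaled, mod)
	}
	denom := new(big.Int).Exp(big.NewInt(10), big.NewInt(int64(scale)), nil)
	return new(big.Rat).SetFrac(unscaled, denom)
}

func main() {
	pos := ratFromTwosComplement([]byte{0x30, 0x39}, 2) // unscaled 12345
	neg := ratFromTwosComplement([]byte{0xFF, 0x85}, 2) // unscaled -123
	fmt.Println(pos.FloatString(2), neg.FloatString(2))
}
```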

EncodeJSON fixes: json.Number values with decimal points or scientific
notation (e.g. "42.0", "1e2") are now correctly accepted for int/long
schemas. Float precision overflow error messages now reference the
correct type (float32/float64).
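
One way to accept "42.0" or "1e2" for an integer schema is to parse the text exactly and then check that the value is integral and in range. The sketch below does this with math/big. integralInt64 is a hypothetical helper, not the library's code, and it assumes the input is already a syntactically valid JSON number (json.Number is a string type, so its text can be passed directly).

```go
package main

import (
	"fmt"
	"math/big"
)

// integralInt64 parses s exactly and returns it as an int64 when the
// value has no fractional part and fits in 64 bits.
func integralInt64(s string) (int64, error) {
	r, ok := new(big.Rat).SetString(s)
	if !ok {
		return 0, fmt.Errorf("invalid number %q", s)
	}
	if !r.IsInt() {
		return 0, fmt.Errorf("number %s has a fractional part", s)
	}
	if !r.Num().IsInt64() {
		return 0, fmt.Errorf("number %s overflows int64", s)
	}
	return r.Num().Int64(), nil
}

func main() {
	for _, s := range []string{"42.0", "1e2", "1.5e3", "1.5", "1e400"} {
		v, err := integralInt64(s)
		fmt.Println(s, v, err)
	}
}
```

Production code would also want to bound the exponent before parsing, since an exact parse of a very large exponent is expensive.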

v1.4.1

v1.4.1: Spec compliance fixes for canonical form.

- aobject.MarshalJSON always emits "fields":[] for record/error and
  "symbols":[] for enum, per Avro spec Complex Types > Records/Enums.
  The old struct-tag path dropped them via omitempty, producing JSON
  that strict readers (Java Avro) reject. Found via Spark reading an
  iceberg-go unpartitioned-table manifest.

- MarshalJSON now honors PCF [ORDER]: name, type, fields, symbols,
  items, values, size, then non-PCF attributes.

Fingerprint note: Rabin fingerprint changes only for schemas that
were previously emitted without a "fields" key — those were invalid
Avro and no conforming reader accepted them.
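
For context on why the fingerprint moves when the canonical text changes: the Avro specification defines the 64-bit Rabin fingerprint (CRC-64-AVRO) over the bytes of the canonical form, so any added key changes the result. The sketch below follows the algorithm as given in the spec. The names empty64, fpTable and fingerprint64 are chosen for this example and are not the library's API.

```go
package main

import "fmt"

// empty64 is the fingerprint of the empty input, as defined by the spec.
const empty64 uint64 = 0xc15d213aa4d7a795

// fpTable is the 256-entry lookup table derived from empty64.
var fpTable = func() (t [256]uint64) {
	for i := range t {
		fp := uint64(i)
		for j := 0; j < 8; j++ {
			// Shift right by one; XOR in empty64 when the low bit was set.
			fp = (fp >> 1) ^ (empty64 & -(fp & 1))
		}
		t[i] = fp
	}
	return t
}()

// fingerprint64 computes CRC-64-AVRO over buf.
func fingerprint64(buf []byte) uint64 {
	fp := empty64
	for _, b := range buf {
		fp = (fp >> 8) ^ fpTable[byte(fp)^b]
	}
	return fp
}

func main() {
	withFields := []byte(`{"name":"r","type":"record","fields":[]}`)
	without := []byte(`{"name":"r","type":"record"}`)
	fmt.Printf("%016x\n%016x\n", fingerprint64(withFields), fingerprint64(without))
}
```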

v1.4.0

v1.4.0: Avro spec fixes, fixed(16) UUID support, atype pkg, ocf schema opts

Improvements from an iceberg-go migration off hamba/avro.

- New github.com/twmb/avro/atype subpackage exporting untyped string
  constants for Avro primitive types, complex types, logical types, and
  field sort orders. Catches typos at compile time; groups all spec
  names under one import; Duration logical type constant doesn't
  collide with the top-level Duration struct.

- New ocf.WithSchemaOpts ReaderOpt forwarding avro.SchemaOpt values to
  the file-header Parse call. CustomType registration now works for
  OCF reads.

A schema that reuses a named type (e.g. a fixed UUID in two fields)
used to fail with "duplicate named type". Schema() and SchemaFor now
emit the definition once and a name reference thereafter.

- Identical redefinitions: silently deduped.
- Conflicting redefinitions (same name, different content): detected
  via JSON comparison, error "conflicting definitions for named type".
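
The dedupe-or-conflict decision can be sketched as a small registry keyed by type name. This is an assumption-laden illustration, not the library's deduper: the registry type and its define method are hypothetical, and definitions are compared after a round trip through encoding/json so that key order does not matter.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// registry maps a named type to its normalized JSON definition.
type registry map[string]string

// normalize re-marshals def so that object keys come out sorted.
func normalize(def string) (string, error) {
	var v any
	if err := json.Unmarshal([]byte(def), &v); err != nil {
		return "", err
	}
	out, err := json.Marshal(v)
	return string(out), err
}

// define reports whether the full definition should be emitted (first
// sighting) or replaced by a name reference (identical redefinition).
// A different definition under the same name is an error.
func (r registry) define(name, def string) (emit bool, err error) {
	norm, err := normalize(def)
	if err != nil {
		return false, err
	}
	prev, seen := r[name]
	switch {
	case !seen:
		r[name] = norm
		return true, nil
	case prev == norm:
		return false, nil
	default:
		return false, fmt.Errorf("conflicting definitions for named type %q", name)
	}
}

func main() {
	r := registry{}
	fmt.Println(r.define("uuid", `{"type":"fixed","name":"uuid","size":16}`))
	fmt.Println(r.define("uuid", `{"name":"uuid","size":16,"type":"fixed"}`))
	fmt.Println(r.define("uuid", `{"type":"fixed","name":"uuid","size":8}`))
}
```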

Per the spec, a union with "null" as the first branch defaults to null.

- Before: encoding a map[string]any with missing keys for nullable
  fields errored with "missing key".
- After: missing key → null branch.

fixed(16) with logicalType:"uuid" now respects the annotation.

- Decode into any: [16]byte (was raw bytes, logical type ignored).
- Decode into string: formatted hex-dash UUID.
- Encode accepts [16]byte, []byte of the right length, or a formatted
  UUID string.
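
The "formatted hex-dash UUID" form is the usual 8-4-4-4-12 grouping of the 16 bytes. A minimal sketch follows; formatUUID is a hypothetical helper and not the library's function.

```go
package main

import "fmt"

// formatUUID renders 16 bytes in the canonical 8-4-4-4-12 lowercase hex
// layout. %x on a byte slice prints two hex digits per byte.
func formatUUID(b [16]byte) string {
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16])
}

func main() {
	var id [16]byte
	for i := range id {
		id[i] = byte(i)
	}
	fmt.Println(formatUUID(id)) // 00010203-0405-0607-0809-0a0b0c0d0e0f
}
```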

typeFieldMapping resolved same-named fields by "first-seen wins".
Depth-first traversal visits embedded struct fields before the
enclosing struct's direct fields, so a shadowing direct field
incorrectly lost to the deeper embedded one.

- Fix: shallower wins regardless of declaration order, matching
  encoding/json and the library's own godoc.
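
To see why first-seen-wins goes wrong under depth-first traversal, and how tracking depth fixes it, consider the sketch below. It is not typeFieldMapping; collect and resolvedKind are hypothetical helpers that key fields by Go name, and ties at equal depth simply keep the first field seen, which is a simplification.

```go
package main

import (
	"fmt"
	"reflect"
)

// Base is embedded in Event; both declare a field named ID.
type Base struct{ ID int64 }

type Event struct {
	Base
	ID string // shadows Base.ID because it is shallower
}

type fieldAt struct {
	depth int
	typ   reflect.Type
}

// collect walks t depth-first. A field replaces an earlier same-named
// field only when it sits at a strictly shallower depth.
func collect(t reflect.Type, depth int, out map[string]fieldAt) {
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		if f.Anonymous && f.Type.Kind() == reflect.Struct {
			collect(f.Type, depth+1, out)
			continue
		}
		if prev, ok := out[f.Name]; ok && prev.depth <= depth {
			continue
		}
		out[f.Name] = fieldAt{depth, f.Type}
	}
}

// resolvedKind returns the kind of the winning field, or "" if absent.
// v must be a struct value.
func resolvedKind(v any, name string) string {
	out := map[string]fieldAt{}
	collect(reflect.TypeOf(v), 0, out)
	f, ok := out[name]
	if !ok {
		return ""
	}
	return f.typ.Kind().String()
}

func main() {
	fmt.Println(resolvedKind(Event{}, "ID")) // string
}
```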

The library itself never used a Go 1.26 feature. The test-only
new(literal) syntax is replaced with a ptr[T](v T) *T helper.

New FuzzSchemaNode and FuzzEncodeMapMissingKeys fuzzers, plus
recursive linked-list and 3-level-nested record schemas added to the
existing fuzz corpus. Fuzzing uncovered three pre-existing issues,
all fixed:

- Cyclic SchemaNode via *SchemaNode Items/Values now errors cleanly
  instead of stack-overflowing (deduper tracks visited pointers).
- Root().Schema() round-trip now succeeds for schemas with
  non-canonical key casing like "tYpe". Parse was already lenient
  via encoding/json's default case-insensitive struct matching;
  nodeFromJSONObject now uses the same leniency.
- Parse and Root now agree on which duplicate JSON key wins. Both
  UnmarshalJSON methods normalize through map[string]RawMessage
  before struct decode so the dedup is consistent.
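
The duplicate-key normalization in the last item relies on a property of encoding/json: when decoding an object into a map, a later duplicate key overwrites the earlier one, and keys that differ only in case remain distinct map entries. A small sketch of that step follows; lastWins is a hypothetical helper and not the library's UnmarshalJSON.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// lastWins decodes data into map[string]json.RawMessage and returns the
// raw value stored under key. With duplicate keys in the input, the map
// holds the value of the last occurrence.
func lastWins(data, key string) (string, error) {
	var raw map[string]json.RawMessage
	if err := json.Unmarshal([]byte(data), &raw); err != nil {
		return "", err
	}
	v, ok := raw[key]
	if !ok {
		return "", fmt.Errorf("key %q not present", key)
	}
	return string(v), nil
}

func main() {
	fmt.Println(lastWins(`{"type":"int","type":"long"}`, "type"))
	fmt.Println(lastWins(`{"tYpe":"int"}`, "type"))
}
```

Going through the map first gives every decode path the same view of which value survived, before any case-insensitive struct matching is applied.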

v1.3.4

v1.3.4: more ecosystem parity / enhancements / more internal consistency

EncodeJSON type parity with Encode: json.Number, int→float, []byte→string,
TextMarshaler, time.Time for time-millis, RFC3339 for timestamps, date
strings, enum by index, *big.Rat/float64 for decimal, tagged union maps.

Validation: reject fractional floats for int/long, int32/int64 overflow,
float precision limits, fixed size mismatch, enum symbol validation.

DecodeJSON: accept JSON numbers for decimal round-trip, validate enum
symbols, reject leading-zero numbers, accept 1e999 as ±Infinity.

Encode: json.Number uses Int64() for full precision, time.Time for
time-millis, nil no longer panics.

Schema: canonical form no longer emits "name":"" for arrays/maps.

JSON: named escape sequences (\b, \t, \n, \f, \r) matching encoding/json.

v1.3.3

- Streaming JSON decoder: DecodeJSON rewritten with a single-pass byte
  scanner, eliminating the json.Unmarshal → Encode → Decode round-trip.
  Zero allocations for struct targets; 9 allocs for *any targets (down
  from 64). ~5x faster for structs, ~3x faster for *any.
- Logical type conversions centralized in logical.go; fixed missing
  .UTC() on unsafe fast path timestamp decode.
- Pre-computed fieldIdx, nameVal, defaultJSON on schema nodes.
- Slab string allocator and zero-copy field name lookups for JSON decode.
- Deleted fromAvroJSON dead code (178 lines).
- 4 new fuzz targets; 99% test coverage.

v1.3.2

- Fix nullable union encoding with nested pointers (#26)
- Encode accepts tagged union maps, enabling Decode(TaggedUnions) →
  Encode round-trips (#27)
- CustomType nil Decode bypasses built-in logical type handler with
  zero overhead, documented as stable contract (#27)
- CustomType nil Encode no longer suppresses logical type serializer (#27)