
iskorotkov/avro

 
 


A fast Go avro codec


Why this fork?

hamba/avro was archived in January 2026 and will receive no further updates or bug fixes. There is no other actively maintained Go Avro library that matches its performance. This fork exists to keep the library maintained, fix bugs, and improve it further.

Unresolved issues from hamba/avro are tracked in ISSUES.md.

Changes since fork

  • Bug fix: registered missing logical types for local timestamps (local-timestamp-millis, local-timestamp-micros), which caused decoding errors
  • Bug fix: fixed enum type duplication in Go code generation (avrogen)
  • Go 1.24+ modernization: updated codebase to use latest Go idioms
  • CI updated: Go 1.26; tools (golangci-lint, gotestsum) pinned via go.mod tool directive
  • 19.5% decode speedup on internally used benchmark: optimized inlining and bounds checks in number parsing; iskorotkov/avro now decodes at 118.6 ns/op vs hamba/avro's 147.3 ns/op (Apple M4 Pro, zero allocations)

How to migrate from hamba/avro?

You have 2 options:

  1. Use a replace directive in go.mod to substitute this library for hamba/avro. Only one file needs to change.
  2. Replace hamba/avro with iskorotkov/avro in go.mod, then rewrite the imports in all Go files from hamba/avro to iskorotkov/avro. This touches many files, but it is the cleaner approach and can be automated with a shell script or an LLM.
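
The import rewrite in option 2 can also be scripted in Go itself. The following is a minimal sketch using only the standard library, not a tool shipped with this repository. It assumes the old and new module paths are github.com/hamba/avro/v2 and github.com/iskorotkov/avro/v2; the names rewritePath and rewriteImports are made up for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"go/format"
	"go/parser"
	"go/token"
	"strconv"
	"strings"
)

// Assumed module paths; adjust them if your go.mod uses a different major version.
const (
	oldModule = "github.com/hamba/avro/v2"
	newModule = "github.com/iskorotkov/avro/v2"
)

// rewritePath maps an import path under the old module to the new module.
// It reports false when the path does not belong to the old module.
func rewritePath(path string) (string, bool) {
	if path == oldModule || strings.HasPrefix(path, oldModule+"/") {
		return newModule + strings.TrimPrefix(path, oldModule), true
	}
	return path, false
}

// rewriteImports parses one Go source file, rewrites matching imports,
// and returns the gofmt-formatted result.
func rewriteImports(src []byte) ([]byte, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "", src, parser.ParseComments)
	if err != nil {
		return nil, err
	}
	for _, imp := range file.Imports {
		path, err := strconv.Unquote(imp.Path.Value)
		if err != nil {
			return nil, err
		}
		if newPath, ok := rewritePath(path); ok {
			imp.EndPos = imp.End() // pin the end position before changing the literal
			imp.Path.Value = strconv.Quote(newPath)
		}
	}
	var buf bytes.Buffer
	if err := format.Node(&buf, fset, file); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	src := []byte("package demo\n\nimport \"github.com/hamba/avro/v2\"\n\nvar _ = avro.Parse\n")
	out, err := rewriteImports(src)
	if err != nil {
		panic(err)
	}
	fmt.Print(string(out))
}
```

To migrate a whole repository you would walk the source tree and write each result back to disk; that part is left out to keep the sketch short.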

Overview

Install with:

go get github.com/iskorotkov/avro/v2

Note: This project has renamed the default branch from master to main. You will need to update your local environment.

Usage

type SimpleRecord struct {
	A int64  `avro:"a"`
	B string `avro:"b"`
}

schema, err := avro.Parse(`{
    "type": "record",
    "name": "simple",
    "namespace": "org.iskorotkov.avro",
    "fields" : [
        {"name": "a", "type": "long"},
        {"name": "b", "type": "string"}
    ]
}`)
if err != nil {
	log.Fatal(err)
}

in := SimpleRecord{A: 27, B: "foo"}

data, err := avro.Marshal(schema, in)
if err != nil {
	log.Fatal(err)
}

fmt.Println(data)
// Outputs: [54 6 102 111 111]

out := SimpleRecord{}
err = avro.Unmarshal(schema, data, &out)
if err != nil {
	log.Fatal(err)
}

fmt.Println(out)
// Outputs: {27 foo}

More examples in the godoc.
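
To see where the bytes [54 6 102 111 111] come from, the record can be encoded by hand. This is a sketch with the standard library only, independent of this library's encoder. It relies on the Avro binary encoding writing long values as zig-zag variable-length integers and strings as a long length followed by the UTF-8 bytes; the helper names are illustrative.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// appendLong appends v as a zig-zag variable-length integer.
// encoding/binary's signed varint uses the same zig-zag scheme as Avro.
func appendLong(buf []byte, v int64) []byte {
	return binary.AppendVarint(buf, v)
}

// appendString appends the byte length as a long, followed by the UTF-8 bytes.
func appendString(buf []byte, s string) []byte {
	buf = appendLong(buf, int64(len(s)))
	return append(buf, s...)
}

// encodeSimple hand-encodes the two fields of the "simple" record in schema order.
func encodeSimple(a int64, b string) []byte {
	buf := appendLong(nil, a)
	return appendString(buf, b)
}

func main() {
	fmt.Println(encodeSimple(27, "foo")) // [54 6 102 111 111]
}
```

The value 27 becomes 54 because zig-zag encoding doubles non-negative numbers, and the string length 3 becomes 6 for the same reason. A record adds no framing of its own: its fields are simply concatenated.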

Types Conversions

| Avro | Go Struct | Go Interface |
| --- | --- | --- |
| null | nil | nil |
| boolean | bool | bool |
| bytes | []byte | []byte |
| float | float32 | float32 |
| double | float64 | float64 |
| long | int*, int64, uint32** | int, int64, uint32 |
| int | int*, int32, int16, int8, uint8**, uint16** | int, uint8, uint16 |
| fixed | uint64 | uint64 |
| string | string | string |
| array | []T | []any |
| enum | string | string |
| fixed | [n]byte | [n]byte |
| map | map[string]T | map[string]any |
| record | struct | map[string]any |
| union | see below | see below |
| int.date | time.Time | time.Time |
| int.time-millis | time.Duration | time.Duration |
| long.time-micros | time.Duration | time.Duration |
| long.timestamp-millis | time.Time | time.Time |
| long.timestamp-micros | time.Time | time.Time |
| long.local-timestamp-millis | time.Time | time.Time |
| long.local-timestamp-micros | time.Time | time.Time |
| bytes.decimal | *big.Rat | *big.Rat |
| fixed.decimal | *big.Rat | *big.Rat |
| string.uuid | string | string |

* Please note that the size of the Go type int is platform dependent. Decoding an Avro long into a Go int is only allowed on 64-bit platforms and will result in an error on 32-bit platforms. Similarly, be careful when encoding a Go int using Avro int on a 64-bit platform, as that can result in an integer overflow causing misinterpretation of the data.

** When the Go type is an unsigned integer, take care that no information is lost when converting between the Avro type and the Go type. For example, an Avro int of -100 decoded into a uint16 is read as 65,436. Likewise, an Avro int of 256 does not fit in a uint8 and is read as 0.
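
The wrap-around described in the second note is ordinary Go integer conversion and can be reproduced without the library. The sketch below also shows one way to guard against it; toUint16 is a hypothetical helper, not part of this library's API.

```go
package main

import (
	"fmt"
	"math"
)

// toUint16 converts a decoded Avro int to uint16, rejecting values that do not fit.
func toUint16(v int32) (uint16, error) {
	if v < 0 || v > math.MaxUint16 {
		return 0, fmt.Errorf("value %d does not fit in uint16", v)
	}
	return uint16(v), nil
}

func main() {
	var negative int32 = -100
	var tooLarge int32 = 256

	// Plain conversions silently wrap around.
	fmt.Println(uint16(negative)) // 65436
	fmt.Println(uint8(tooLarge))  // 0

	// A checked conversion surfaces the problem instead.
	_, err := toUint16(negative)
	fmt.Println(err) // value -100 does not fit in uint16
}
```

If a field can legitimately hold negative or large values, decoding into a signed type of the same width as the Avro type avoids the issue entirely.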

Unions

The following union types are accepted: map[string]any, *T, a pointer to a struct implementing UnionConverter, and any.

  • map[string]any: If the union value is nil, a nil map will be en/decoded. When a non-nil union value is encountered, a single key is en/decoded. The key is the avro type name, or schema full name in the case of a named schema (enum, fixed or record).
  • *T: This is allowed in a "nullable" union, defined as a two-schema union in which one of the types is null (i.e. ["null", "string"] or ["string", "null"]). In this case a *T is allowed, with T matching the conversion table above. In the case of a slice, the slice can be used directly.
  • *struct{}: implementing the UnionConverter interface:
// UnionConverter handles Avro unions in a type-safe way.
type UnionConverter interface {
    // FromAny populates the receiver from a decoded union payload,
    // which holds one of the types listed in the union.
    FromAny(payload any) error
    // ToAny returns the value to encode for the union.
    ToAny() (any, error)
}

// for example:
const Schema = `{"name": "Payload", "type": "record", "fields": [{"name": "union", "type": ["int", {"type": "record", "name": "test", "fields" : [{"name": "a", "type": "long"}, {"name": "b", "type": "string"}]}]}]}`

type Payload struct {
    Union *UnionRecord `avro:"union"`
}

type UnionRecord struct {
    Int  *int
    Test *TestRecord
}

func (u *UnionRecord) ToAny() (any, error) {
    if u.Int != nil {
        return u.Int, nil
    } else if u.Test != nil {
        return u.Test, nil
    }

    return nil, errors.New("no value to encode")
}

func (u *UnionRecord) FromAny(payload any) error {
    switch t := payload.(type) {
    case int:
        u.Int = &t
    case TestRecord:
        u.Test = &t
    default:
        return errors.New("unknown type during decode of union")
    }

    return nil
}

type TestRecord struct {
    A int64  `avro:"a"`
    B string `avro:"b"`
}

Note: because of the way Go checks whether a type implements an interface, the type used must be a pointer, since the interface methods must be implemented with pointer receivers.

  • any: An interface can be provided and the type or name resolved. Primitive types are pre-registered, but named types, maps and slices need to be registered with the Register function. For arrays and maps, the enclosed schema type or name is appended to the type with a : separator, e.g. "map:string". Behavior when a type cannot be resolved depends on your chosen configuration options:
    • !Config.UnionResolutionError && !Config.PartialUnionTypeResolution: the map type above is used
    • Config.UnionResolutionError && !Config.PartialUnionTypeResolution: an error is returned
    • !Config.UnionResolutionError && Config.PartialUnionTypeResolution: any registered type is resolved, while any unregistered type falls back to the map type above.
    • Config.UnionResolutionError && Config.PartialUnionTypeResolution: any registered type is resolved, while any unregistered type returns an error.
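
When a union is decoded into map[string]any as described in the first bullet, the caller usually wants the single branch name and its value. A minimal sketch of such a helper follows; unwrapUnion is a hypothetical name, and the code only models the shape described above rather than calling the library.

```go
package main

import (
	"errors"
	"fmt"
)

// unwrapUnion returns the branch name and value held by a union that was
// decoded into map[string]any. A nil map stands for the null branch.
func unwrapUnion(u map[string]any) (branch string, value any, err error) {
	if u == nil {
		return "null", nil, nil
	}
	if len(u) != 1 {
		return "", nil, errors.New("union map must hold exactly one key")
	}
	for k, v := range u {
		branch, value = k, v
	}
	return branch, value, nil
}

func main() {
	// Primitive branch: the key is the Avro type name.
	fmt.Println(unwrapUnion(map[string]any{"string": "foo"})) // string foo <nil>

	// Named branch: the key is the schema full name, here the record "test".
	fmt.Println(unwrapUnion(map[string]any{"test": map[string]any{"a": int64(27)}}))

	// Null branch: a nil map.
	fmt.Println(unwrapUnion(nil)) // null <nil> <nil>
}
```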

TextMarshaler and TextUnmarshaler

The interfaces TextMarshaler and TextUnmarshaler are supported for a string schema type. For a string schema, a value is first checked for these interfaces before regular encoding and decoding are attempted.

Enums may also implement TextMarshaler and TextUnmarshaler, and must resolve to valid symbols in the given enum schema.
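
As an illustration of the enum case, here is a sketch of a Go type that keeps the enum as an integer and converts to and from its symbols. It assumes the interfaces meant above are encoding.TextMarshaler and encoding.TextUnmarshaler from the standard library; the Suit type and its symbols are hypothetical.

```go
package main

import (
	"encoding"
	"fmt"
)

// Suit is a hypothetical enum kept as an integer in Go.
type Suit int

const (
	Spades Suit = iota
	Hearts
)

// suitSymbols lists the symbols in the same order as the constants above.
var suitSymbols = [...]string{"SPADES", "HEARTS"}

// MarshalText returns the enum symbol for s.
func (s Suit) MarshalText() ([]byte, error) {
	if s < 0 || int(s) >= len(suitSymbols) {
		return nil, fmt.Errorf("unknown suit %d", int(s))
	}
	return []byte(suitSymbols[s]), nil
}

// UnmarshalText sets s from an enum symbol.
func (s *Suit) UnmarshalText(text []byte) error {
	for i, sym := range suitSymbols {
		if sym == string(text) {
			*s = Suit(i)
			return nil
		}
	}
	return fmt.Errorf("unknown suit symbol %q", text)
}

// Compile-time checks that Suit satisfies the standard interfaces.
var (
	_ encoding.TextMarshaler   = Suit(0)
	_ encoding.TextUnmarshaler = (*Suit)(nil)
)

func main() {
	text, _ := Hearts.MarshalText()
	fmt.Println(string(text)) // HEARTS

	var s Suit
	if err := s.UnmarshalText([]byte("SPADES")); err != nil {
		panic(err)
	}
	fmt.Println(s == Spades) // true
}
```

The symbols returned by MarshalText must match the symbols declared in the enum schema, otherwise encoding fails as described above.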

Identical Underlying Types

A type is convertible to another type (ConvertibleTo) if the two have identical underlying types. A non-native type may be used if it is convertible to time.Time, big.Rat or avro.LogicalDuration for the corresponding logical type.

Ex.: type Timestamp time.Time
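
The convertibility rule can be checked with the standard reflect package. The sketch below assumes nothing about how the library performs the check internally; convertibleToTime is a hypothetical helper.

```go
package main

import (
	"fmt"
	"reflect"
	"time"
)

// Timestamp is a defined type whose underlying type is identical to time.Time's.
type Timestamp time.Time

// convertibleToTime reports whether values of v's type can be converted to time.Time.
func convertibleToTime(v any) bool {
	t := reflect.TypeOf(v)
	return t != nil && t.ConvertibleTo(reflect.TypeOf(time.Time{}))
}

func main() {
	fmt.Println(convertibleToTime(Timestamp{}))  // true
	fmt.Println(convertibleToTime("2026-01-01")) // false

	// The conversion itself is a plain Go type conversion.
	ts := Timestamp(time.Date(2026, 1, 1, 0, 0, 0, 0, time.UTC))
	fmt.Println(time.Time(ts).Year()) // 2026
}
```

Note that a defined type such as Timestamp does not inherit the methods of time.Time, so convert back with time.Time(ts) before calling them.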

Custom Type Conversion

In case of incompatible types, custom type conversion functions can be registered with the RegisterTypeConverters function. This requires the use of map[string]any or []any. The type conversion for encoding will receive the original value that is to be encoded, and must return a data type that is compatible with the schema, as specified in the table above. The type conversion for decoding will receive the decoded value with a data type that is compatible with the schema, and its return value will be used as the final decoded value.

Untrusted Input

When processing untrusted Avro data, it is important to configure protection against Denial of Service (DoS) attacks. The library provides three configuration options to limit resource allocation:

  • Config.MaxByteSliceSize - Restricts the maximum size of bytes and string types (default: 1MiB)
  • Config.MaxSliceAllocSize - Limits the maximum size of slice allocations (default: unlimited)
  • Config.MaxMapAllocSize - Limits the maximum size of map allocations (default: unlimited)

For comprehensive security considerations and configuration requirements, please refer to SECURITY.md.
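
To illustrate why such limits matter, consider a decoder that trusts the length prefix of a bytes value. The sketch below is not this library's implementation; it only shows the general technique of checking a declared length against a limit before allocating. The names decodeBytes and errTooLarge are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"io"
)

var errTooLarge = errors.New("declared length exceeds limit")

// decodeBytes reads one length-prefixed byte string from input, refusing to
// allocate more than maxSize bytes regardless of what the input claims.
func decodeBytes(input []byte, maxSize int64) ([]byte, error) {
	r := bytes.NewReader(input)
	n, err := binary.ReadVarint(r) // zig-zag varint length prefix
	if err != nil {
		return nil, err
	}
	if n < 0 || n > maxSize {
		return nil, errTooLarge // reject before any allocation happens
	}
	buf := make([]byte, n)
	if _, err := io.ReadFull(r, buf); err != nil {
		return nil, err
	}
	return buf, nil
}

func main() {
	// A few bytes of input that claim to be followed by 2^62 bytes of data.
	hostile := binary.AppendVarint(nil, 1<<62)
	_, err := decodeBytes(hostile, 1<<20)
	fmt.Println(err) // declared length exceeds limit

	valid := append(binary.AppendVarint(nil, 3), "foo"...)
	data, err := decodeBytes(valid, 1<<20)
	fmt.Println(string(data), err) // foo <nil>
}
```

Without the check, the hostile input would make the decoder attempt a huge allocation from only a handful of bytes.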

Benchmark

Benchmark source code can be found at: https://github.com/iskorotkov/avro-benchmarks

goos: darwin
goarch: arm64
pkg: github.com/iskorotkov/avro-benchmarks
cpu: Apple M4 Pro
BenchmarkGoAvroDecode-12                 1248079               960.1 ns/op           418 B/op         27 allocs/op
BenchmarkGoAvroEncode-12                  941415              1310 ns/op             877 B/op         63 allocs/op
BenchmarkGoGenAvroDecode-12              2594593               466.0 ns/op           320 B/op         11 allocs/op
BenchmarkGoGenAvroEncode-12              5187818               229.2 ns/op           240 B/op          3 allocs/op
BenchmarkHambaDecode-12                  8118740               147.3 ns/op            47 B/op          0 allocs/op
BenchmarkHambaEncode-12                  8540484               139.7 ns/op           112 B/op          1 allocs/op
BenchmarkHeetchDecode-12                   96644             12441 ns/op           37773 B/op        385 allocs/op
BenchmarkHeetchEncode-12                 5392798               221.5 ns/op           384 B/op          5 allocs/op
BenchmarkIskorotkovDecode-12            10105106               118.6 ns/op            47 B/op          0 allocs/op
BenchmarkIskorotkovEncode-12             8573349               139.3 ns/op           112 B/op          1 allocs/op
BenchmarkLinkedinDecode-12               1882018               637.8 ns/op          1688 B/op         35 allocs/op
BenchmarkLinkedinEncode-12               4127374               288.0 ns/op           248 B/op          5 allocs/op

Always benchmark with your own workload. The result depends heavily on the data input.

Go structs generation

Go structs can be generated for you from the schema. The generated types follow the same logic as the type conversions above. You can use the avrogen command-line tool to generate the structs, or use the gen package as a library in your own commands.

Install the struct generator with:

go install github.com/iskorotkov/avro/v2/cmd/avrogen@<version>

Example usage assuming there's a valid schema in in.avsc:

avrogen -pkg avro -o bla.go -tags json:snake,yaml:upper-camel in.avsc

Tip: Omit -o FILE to dump the generated Go structs to stdout instead of a file.

Check the options and usage with -h:

avrogen -h

Custom logical type mapping with avrogen

You can register custom logical type mappings to be used during code generation.

The format of a custom logical type mapper is avroLogicalType,goType[,importPath]. For example, to map the logical type uuid to the Go type github.com/google/uuid.UUID, you would use:

avrogen -pkg avro -o bla.go -logical-type uuid,uuid.UUID,github.com/google/uuid in.avsc

If the type you are mapping to is a built-in Go type (e.g., string, int, etc.), you can omit the import path element in the mapping definition:

avrogen -pkg avro -o bla.go -logical-type date,int32 in.avsc

To use multiple custom logical type mappings, specify the flag once per mapping.
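
The mapping format avroLogicalType,goType[,importPath] is simple enough to parse by hand, which can be handy when generating avrogen invocations from your own tooling. The following is a sketch of such a parser, not avrogen's actual code; the type logicalTypeMapping and the function parseLogicalTypeMapping are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// logicalTypeMapping is a hypothetical holder for one mapping definition.
type logicalTypeMapping struct {
	LogicalType string // Avro logical type, e.g. "uuid"
	GoType      string // Go type to generate, e.g. "uuid.UUID"
	ImportPath  string // optional import path; empty for built-in types
}

// parseLogicalTypeMapping splits "avroLogicalType,goType[,importPath]".
func parseLogicalTypeMapping(s string) (logicalTypeMapping, error) {
	parts := strings.Split(s, ",")
	if len(parts) < 2 || len(parts) > 3 {
		return logicalTypeMapping{}, fmt.Errorf("want 2 or 3 comma-separated parts, got %d", len(parts))
	}
	for _, p := range parts {
		if strings.TrimSpace(p) == "" {
			return logicalTypeMapping{}, fmt.Errorf("empty element in %q", s)
		}
	}
	m := logicalTypeMapping{LogicalType: parts[0], GoType: parts[1]}
	if len(parts) == 3 {
		m.ImportPath = parts[2]
	}
	return m, nil
}

func main() {
	m, err := parseLogicalTypeMapping("uuid,uuid.UUID,github.com/google/uuid")
	fmt.Printf("%+v %v\n", m, err)

	m, err = parseLogicalTypeMapping("date,int32")
	fmt.Printf("%+v %v\n", m, err)
}
```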

Avro schema validation

avrosv

A small Avro schema validation command-line utility is also available. This simple tool leverages the schema parsing functionality of the library, showing validation errors or optionally dumping parsed schemas to the console. It can be used in CI/CD pipelines to validate schema changes in a repository.

Install the Avro schema validator with:

go install github.com/iskorotkov/avro/v2/cmd/avrosv@<version>

Example usage assuming there's a valid schema in in.avsc (exit status code is 0):

avrosv in.avsc

An invalid schema will result in a diagnostic output and a non-zero exit status code:

avrosv bad-default-schema.avsc; echo $?
Error: avro: invalid default for field someString. <nil> not a string
2

Schemas referencing other schemas can also be validated by providing all of them (schemas are parsed in order):

avrosv base-schema.avsc schema-withref.avsc

Check the options and usage with -h:

avrosv -h

Name Validation

Avro names are validated according to the Avro specification.

However, the official Java library does not validate names accordingly, so some files in the wild contain invalid names. For this reason, the library has a configuration option that allows these invalid names to be parsed.

avro.SkipNameValidation = true

Note that this variable is global, so ideally you should reset it once you are done with the invalid schema.
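
One way to make sure the flag is always reset is to wrap the parse in a helper that restores the previous value with defer. The sketch below uses a local stand-in variable so that it runs without the library; in real code you would pass &avro.SkipNameValidation instead. The helper name withFlag is hypothetical.

```go
package main

import "fmt"

// skipNameValidation is a local stand-in for the library's global
// avro.SkipNameValidation variable, so this sketch runs on its own.
var skipNameValidation bool

// withFlag sets *flag to true while fn runs and restores the previous
// value afterwards, even if fn panics.
func withFlag(flag *bool, fn func() error) error {
	prev := *flag
	*flag = true
	defer func() { *flag = prev }()
	return fn()
}

func main() {
	err := withFlag(&skipNameValidation, func() error {
		// Parse the schema that contains invalid names here.
		fmt.Println("during:", skipNameValidation) // during: true
		return nil
	})
	fmt.Println("after:", skipNameValidation, err) // after: false <nil>
}
```

Because the variable is global, this pattern is not safe when other goroutines parse schemas at the same time; in that case, guard the call with a mutex of your own.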

Go Version Support

This library supports the last two versions of Go. While the minimum Go version is not guaranteed to increase alongside Go, it may jump from time to time to support additional features. This will not be considered a breaking change.

Who uses this library?

Create a GitHub Issue if you use this library and want to be included in this list!
