v0.27.0: Graph functions; Improved Go backend (fusion ops); Quantization dtypes; more ML layers & fixes #367

janpfeifer · 2026-03-13T07:10:30Z

janpfeifer
Mar 13, 2026
Maintainer

Package backends: major refactoring to add support for functions/closures.
- Added backends.Function, which now holds all the "ops" methods.
- Added NewFunction, Closure and Call.
- Renamed backends.Op -> backends.Value.
- Added FusedOps, allowing backends to expose fused (more efficient) operations -- with proper/automatic
  fallback to decomposed operations when not supported or for gradients.
- Added ErrNotImplemented error and IsNotImplemented(err) function.
- Added Quantization struct, QuantizationScheme (Linear, NF4), and NF4LookupTable.
- Removed Dot() operation (redundant with DotGeneral).
- DotGeneral() now takes a DotGeneralConfig struct, with options for setting the accumulator and output dtypes.
Package simplego:
- Added Float16 support (thx @timkaye11)
- Added dedup of computation nodes (aka. "common subexpression elimination" CSE) (thx @timkaye11, @janpfeifer)
  - ~6% speedup for UCI-Adult demo training.
- DotGeneral: Pre-blocking of the blocked path, which may lead to deduplication of blocking nodes (thx @timkaye11).
- DotGeneral: Added smallMatMul execution path, optimized for small matrix multiplications (thx @timkaye11)
- Experimental packgemm support leveraging SIMD operations (@ajroetker, @janpfeifer)
- Functions/closures support (thx @ajroetker)
- Added Reverse operation.
- Added fused operations: FusedGelu, FusedDense, FusedSoftmax, FusedLayerNorm,
  FusedScaledDotProductAttention, FusedAttentionQKVProjection.
- Added FusedQuantizedDense: fused dequantization + matmul + bias + activation for Int4/Int8 weights
  with Linear and NF4 quantization schemes, block-wise scales, and optional zero points.
- FusedScaledDotProductAttention: added ScaledDotProductAttentionConfig options struct with
  QuantizedMatmuls flag for optional uint8 quantized Q@K/attn@V matmuls (awaiting go-highway
  release for actual acceleration).
- Bitcast refactored to pure bit reinterpretation; sub-byte unpacking moved to ConvertDType.
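
As a rough illustration of what block-wise Linear dequantization of packed Int4 weights involves, here is a minimal plain-Go sketch. It is not the simplego implementation: `unpackInt4` and `dequantizeLinear` are hypothetical helpers, and the nibble order (low nibble first) and the formula `(q - zeroPoint) * scale` per block are assumptions made for the example.

```go
package main

import "fmt"

// unpackInt4 splits each byte into two signed 4-bit values.
// Assumption: the low nibble comes first; the real packing order may differ.
func unpackInt4(packed []byte) []int8 {
	out := make([]int8, 0, 2*len(packed))
	for _, b := range packed {
		lo := int8(b<<4) >> 4 // move nibble to the top, then sign-extend
		hi := int8(b) >> 4    // arithmetic shift keeps the sign of the high nibble
		out = append(out, lo, hi)
	}
	return out
}

// dequantizeLinear maps quantized values back to floats, block by block:
// value = (q - zeroPoint[block]) * scale[block].
// It expects len(scales) == len(zeroPoints) == ceil(len(q)/blockSize).
func dequantizeLinear(q []int8, scales []float32, zeroPoints []int8, blockSize int) []float32 {
	out := make([]float32, len(q))
	for i, v := range q {
		block := i / blockSize
		out[i] = float32(int(v)-int(zeroPoints[block])) * scales[block]
	}
	return out
}

func main() {
	q := unpackInt4([]byte{0x7F, 0x81})
	fmt.Println(q)
	fmt.Println(dequantizeLinear(q, []float32{0.5, 2}, []int8{0, 1}, 2))
}
```

A fused kernel would do this unpacking and scaling inside the matmul loop rather than materializing the float weights first.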
New package bucketing:
- Tools to manage bucketing of tensors (or anything else) -- thx @ajroetker
Package dtypes:
- Added 'Uint2', 'Uint4', 'Int2', 'Int4'.
Package graph:
- Added Function concept (and support for closures) and the Function.Call operation.
- Control Flow: Added While and If operations.
- Order operations: Added Sort, SortFunc, TopK, BottomK.
- Fixed Bitcast for packed sub-byte types: Int4, Int2, Uint4 and Uint2, so they can be "bitcast"
  back and forth from/to uint8 (bytes), to ease quantization.
- Added Atan2 function.
- Added test helper functions to test various backends at once.
- Fixed Gather validation of indexVectorAxis to check against startIndices rank instead of operand rank.
- Exec graph compilation is now concurrent, avoiding redundant compilations for the same graph shape.
Package ml/layers/attention: Improved MultiHeadAttention; Added KVCache support.
- Added Grouped Query Attention (GQA) support.
- Added UseQKVProjection() option for fused Q/K/V Dense projection.
- FusedScaledDotProductAttention now supports boolean masks.
Package ml/layers/attention/pos: Added PositionalEncoder interface, and "RoPE" (Rotary Positional Encoding) implementation.
Package ml/models/transformers:
- Added a Transformer "model": a collection of transformer layers set up based on the given configuration.
Package ml/decode:
- Added a Decoder object to generate text given a sequential model.
Package ml/decode/sample:
- Added implementation of various sampling strategies (greedy, temperature, beam-search, top-k, top-p, etc.), used by the decode package.
Package ml/layers/activations:
- Added HardSwish.
Package examples:
- Moved into its own sub-module, to isolate its dependencies.
- Added gpt2: A simple GPT-2 implementation using the new transformers and decode packages. It downloads the model from HuggingFace.
- Added textgen: a minimal transformer text generation model that can be trained.
- Added gemma3: A simple Gemma 3 implementation using the onnx-gomlx package to convert the model, and go-huggingface to download the model and run the tokenizer.
- Added mxbai-rerank: A cross-encoder reranking example using the MixedBread Reranker v1. It uses the onnx-gomlx package to convert the model, and go-huggingface to download the model and run the tokenizer.
- Added BERT-base-NER: A BERT-base model fine-tuned for Named Entity Recognition.
Bumped GitHub Actions versions to the new "Node24" ones.

This discussion was created from the release v0.27.0: Graph functions; Improved Go backend (fusion ops); Quantization dtypes; more ML layers & fixes.
