v0.27.0: Graph functions; Improved Go backend (fusion ops); Quantization dtypes; more ML layers & fixes #367
janpfeifer announced in Announcements
- `backends`: major refactoring to add support for functions/closures.
  - `backends.Function`, which now holds all the "ops" methods.
  - `NewFunction`, `Closure` and `Call`.
  - `backends.Op` -> `backends.Value`.
  - `FusedOps`, allowing backends to expose fused (more efficient) operations, with proper/automatic fallback to decomposed operations when not supported or for gradients.
  - `ErrNotImplemented` error and `IsNotImplemented(err)` function.
  - `Quantization` struct, `QuantizationScheme` (Linear, NF4), and `NF4LookupTable`.
  - `Dot()` operation (redundant with `DotGeneral`).
  - `DotGeneral()` now takes a `DotGeneralConfig` struct, with options for setting the accumulator and output dtypes.
- `simplego`:
  - `Float16` support (thx @timkaye11)
  - `packgemm` support leveraging SIMD operations (@ajroetker, @janpfeifer)
  - `Reverse` operation.
  - `FusedGelu`, `FusedDense`, `FusedSoftmax`, `FusedLayerNorm`, `FusedScaledDotProductAttention`, `FusedAttentionQKVProjection`.
  - `FusedQuantizedDense`: fused dequantization + matmul + bias + activation for Int4/Int8 weights with Linear and NF4 quantization schemes, block-wise scales, and optional zero points.
  - `FusedScaledDotProductAttention`: added `ScaledDotProductAttentionConfig` options struct with a `QuantizedMatmuls` flag for optional uint8 quantized Q@K/attn@V matmuls (awaiting a `go-highway` release for actual acceleration).
  - `Bitcast` refactored to pure bit reinterpretation; sub-byte unpacking moved to `ConvertDType`.
- `bucketing`:
- `dtypes`:
- `graph`:
  - `Function` concept (and support for closures) and the `Function.Call` operation.
  - `While` and `If` operations.
  - `Sort`, `SortFunc`, `TopK`, `BottomK`.
  - `Bitcast` for packed sub-byte types: `Int4`, `Int2`, `Uint4` and `Uint2`, so they can be "bitcast" back and forth from/to `uint8` (bytes), to ease quantization.
  - `Atan2` function.
  - `Gather` validation of `indexVectorAxis` to check against `startIndices` rank instead of `operand` rank.
  - `Exec` graph compilation is now concurrent, avoiding redundant compilations for the same graph shape.
- `ml/layers/attention`: Improved `MultiHeadAttention`; Added `KVCache` support.
  - `UseQKVProjection()` option for fused Q/K/V Dense projection.
  - `FusedScaledDotProductAttention` now supports boolean masks.
- `ml/layers/attention/pos`: Added `PositionalEncoder` interface, and "RoPE" (Rotary Positional Encoding) implementation.
- `ml/models/transformers`:
  - `Transformer` "model": a collection of transformer layers is set up based on the given configuration.
- `ml/decode`:
  - Added a `Decoder` object to generate text given a sequential model.
- `ml/decode/sample`:
  - `decode` package.
- `ml/layers/activations`:
  - `HardSwish`.
- `examples`:
  - `gpt2`: A simple GPT-2 implementation using the new transformers and decode packages. It downloads the model from HuggingFace.
  - `textgen`: A minimal transformer text generation model that can be trained.
  - `gemma3`: A simple Gemma 3 implementation using the `onnx-gomlx` package to convert the model, and `go-huggingface` to download the model and run the tokenizer.
  - `mxbai-rerank`: A cross-encoder reranking example using the MixedBread Reranker v1. It uses the `onnx-gomlx` package to convert the model, and `go-huggingface` to download the model and run the tokenizer.
  - `BERT-base-NER`: A BERT-base model fine-tuned for Named Entity Recognition.

This discussion was created from the release v0.27.0: Graph functions; Improved Go backend (fusion ops); Quantization dtypes; more ML layers & fixes.
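The notes above describe fused operations that fall back to decomposed operations when a backend does not support them, together with an `ErrNotImplemented` error and an `IsNotImplemented(err)` helper. The following is a minimal, self-contained sketch of that pattern in plain Go. It does not use the GoMLX API: `ErrNotImplemented` and `IsNotImplemented` are redefined locally as stand-ins, and `fusedSoftmax`, `decomposedSoftmax` and `Softmax` are hypothetical names chosen for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// ErrNotImplemented is a local stand-in for the sentinel error named in the notes.
var ErrNotImplemented = errors.New("not implemented")

// IsNotImplemented reports whether err is, or wraps, ErrNotImplemented.
func IsNotImplemented(err error) bool {
	return errors.Is(err, ErrNotImplemented)
}

// fusedSoftmax is a hypothetical fused kernel that this toy "backend" lacks.
func fusedSoftmax(x []float64) ([]float64, error) {
	return nil, fmt.Errorf("fusedSoftmax: %w", ErrNotImplemented)
}

// decomposedSoftmax builds softmax from primitive steps: max, exp, sum, divide.
func decomposedSoftmax(x []float64) []float64 {
	maxV := math.Inf(-1)
	for _, v := range x {
		maxV = math.Max(maxV, v)
	}
	out := make([]float64, len(x))
	sum := 0.0
	for i, v := range x {
		out[i] = math.Exp(v - maxV) // subtract the max for numerical stability
		sum += out[i]
	}
	for i := range out {
		out[i] /= sum
	}
	return out
}

// Softmax tries the fused path first and falls back to the decomposed
// version only when the fused path reports "not implemented".
func Softmax(x []float64) ([]float64, error) {
	out, err := fusedSoftmax(x)
	if err == nil {
		return out, nil
	}
	if IsNotImplemented(err) {
		return decomposedSoftmax(x), nil
	}
	return nil, err // any other error is a real failure
}

func main() {
	out, err := Softmax([]float64{1, 2, 3})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Wrapping the sentinel with `%w` keeps the fallback check independent of whatever context the failing layer adds to the message.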
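The quantization items above mention packed sub-byte types that can be bitcast to and from `uint8`, and linear quantization with block-wise scales and optional zero points. The sketch below illustrates the underlying arithmetic in plain Go, without the GoMLX API. It assumes that two signed 4-bit values share one byte with the low nibble first, and that dequantization is `scale * (q - zeroPoint)` per block; the layout GoMLX actually uses may differ. All function names are hypothetical.

```go
package main

import "fmt"

// packInt4 stores two signed 4-bit values per byte.
// Assumption: the even-indexed value goes in the low nibble.
func packInt4(vals []int8) []uint8 {
	out := make([]uint8, (len(vals)+1)/2)
	for i, v := range vals {
		nib := uint8(v) & 0x0F
		if i%2 == 1 {
			nib <<= 4
		}
		out[i/2] |= nib
	}
	return out
}

// unpackInt4 reverses packInt4, sign-extending each nibble to int8.
func unpackInt4(packed []uint8) []int8 {
	out := make([]int8, 0, 2*len(packed))
	for _, b := range packed {
		lo := int8(b<<4) >> 4 // move low nibble up, then arithmetic shift back
		hi := int8(b) >> 4    // arithmetic shift sign-extends the high nibble
		out = append(out, lo, hi)
	}
	return out
}

// dequantizeLinear applies scale*(q-zeroPoint) with one scale (and optional
// zero point) per block of blockSize consecutive values.
func dequantizeLinear(q []int8, scales []float32, zeroPoints []int8, blockSize int) []float32 {
	out := make([]float32, len(q))
	for i, v := range q {
		blk := i / blockSize
		zp := 0
		if zeroPoints != nil {
			zp = int(zeroPoints[blk])
		}
		out[i] = scales[blk] * float32(int(v)-zp)
	}
	return out
}

func main() {
	vals := []int8{-8, 7, 3, -1}
	packed := packInt4(vals)
	fmt.Printf("%#x\n", packed)
	unpacked := unpackInt4(packed)
	fmt.Println(unpacked)
	// Two blocks of two values each, with scales 0.5 and 2.
	fmt.Println(dequantizeLinear(unpacked, []float32{0.5, 2}, nil, 2)) // [-4 3.5 6 -2]
}
```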
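For readers unfamiliar with the "RoPE" encoding mentioned above, here is a small plain-Go sketch of the idea. It assumes the interleaved-pair convention, where elements `(x[2k], x[2k+1])` are rotated by the angle `pos * base^(-2k/d)`; the GoMLX implementation may lay out pairs differently. `applyRoPE` and `dot` are hypothetical helper names, not part of the library.

```go
package main

import (
	"fmt"
	"math"
)

// applyRoPE rotates consecutive pairs of x by a position-dependent angle.
// Assumption: interleaved pairs (x[i], x[i+1]) for even i; len(x) must be even.
func applyRoPE(x []float64, pos int, base float64) []float64 {
	d := len(x)
	if d%2 != 0 {
		panic("applyRoPE: vector length must be even")
	}
	out := make([]float64, d)
	for i := 0; i < d; i += 2 {
		// i equals 2k for pair k, so the exponent is -2k/d.
		theta := float64(pos) * math.Pow(base, -float64(i)/float64(d))
		sin, cos := math.Sincos(theta)
		out[i] = x[i]*cos - x[i+1]*sin
		out[i+1] = x[i]*sin + x[i+1]*cos
	}
	return out
}

// dot returns the inner product of two equal-length vectors.
func dot(a, b []float64) float64 {
	s := 0.0
	for i := range a {
		s += a[i] * b[i]
	}
	return s
}

func main() {
	q := []float64{1, 2, 3, 4}
	k := []float64{0.5, -1, 2, 0.25}
	// The score depends only on the relative offset (2 in both cases),
	// so the two printed values agree up to rounding.
	fmt.Println(dot(applyRoPE(q, 3, 10000), applyRoPE(k, 1, 10000)))
	fmt.Println(dot(applyRoPE(q, 7, 10000), applyRoPE(k, 5, 10000)))
}
```

Because each pair undergoes a pure rotation, vector norms are preserved and attention scores depend on relative rather than absolute positions.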
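As a reference for the `HardSwish` activation listed above, the commonly used definition is `x * ReLU6(x + 3) / 6`. The sketch below evaluates that formula on scalars in plain Go; it assumes the standard definition and is not taken from the GoMLX source. The name `hardSwish` is chosen here for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// hardSwish computes x * ReLU6(x+3) / 6, a piecewise approximation of Swish.
// It is 0 for x <= -3 and equals x for x >= 3.
func hardSwish(x float64) float64 {
	relu6 := math.Min(math.Max(x+3, 0), 6)
	return x * relu6 / 6
}

func main() {
	for _, x := range []float64{-4, -1, 0, 1, 4} {
		fmt.Printf("hardSwish(%v) = %v\n", x, hardSwish(x))
	}
}
```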