Releases: zerfoo/zerfoo

v1.48.0

13 Apr 13:29

Features

  • crossasset: extract to feza-ai/wolf
  • parity: add MIMOMambaBlock and HModule structural parity tests

The crossasset/ package has been moved to github.com/feza-ai/wolf/crossasset.
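Downstream users can migrate by swapping the import path; a minimal sketch, assuming the old path was `github.com/zerfoo/zerfoo/crossasset` (the standard module path for a package in this repo):

```diff
-import "github.com/zerfoo/zerfoo/crossasset"
+import "github.com/feza-ai/wolf/crossasset"
```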

v1.47.0

13 Apr 13:29

Features

  • crossasset: convert model from float64 to float32
  • crossasset: replace forward slice math with Engine[T] ops and wire TrainGPU
  • parity: add GPU parity test Containerfile and Spark manifest
  • parity: add GPU vs CPU parity tests for activations, normalization, and RoPE
  • parity: add GPU vs CPU parity tests for core ops, attention, and backward

Bug Fixes

  • attention: recompute attention weights in SDPA backward after flash forward
  • crossasset: relax GPU parity test tolerance for flash attention divergence
  • cuda: add /opt/zerfoo/lib to libkernels.so dlopen search path
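The parity tests above compare GPU output against the CPU reference element-wise within a tolerance, relaxed where flash attention reorders floating-point accumulation. A self-contained sketch of such a check; `allClose`, `rtol`, and `atol` are illustrative names, not zerfoo API:

```go
package main

import (
	"fmt"
	"math"
)

// allClose reports whether got and want agree element-wise within
// atol + rtol*|want|, the usual combined absolute/relative tolerance.
func allClose(got, want []float32, rtol, atol float32) bool {
	if len(got) != len(want) {
		return false
	}
	for i := range got {
		diff := math.Abs(float64(got[i] - want[i]))
		bound := float64(atol) + float64(rtol)*math.Abs(float64(want[i]))
		if diff > bound {
			return false
		}
	}
	return true
}

func main() {
	cpu := []float32{1.0, 2.0, 3.0}
	gpu := []float32{1.0, 2.0000005, 2.9999995}
	// Flash-attention kernels accumulate in a different order than the
	// CPU path, so small element-wise divergence is expected and allowed.
	fmt.Println(allClose(gpu, cpu, 1e-5, 1e-6))
}
```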

v1.46.0

12 Apr 03:07

1.46.0 (2026-04-12)

Features

  • parity: add BlockAttnRes golden parity test (11e17fc)
  • parity: add CfC golden-file parity test (3490dac)
  • parity: add FreTS golden-file parity test (b0c63c8)
  • parity: add GRN golden-file parity test (18c6964)
  • parity: add TimeMixer golden-file parity test (d3b5ebe)
  • parity: upgrade PatchTST, N-BEATS, ITransformer to golden-file parity (92249e4)
  • parity: wire MambaBlock golden parity test (fdea55b)
  • timeseries: migrate CfC to Engine[T] compliance (ff971af)
  • timeseries: migrate DLinear and TimeMixer to Engine[T] compliance (d37ee31)
  • timeseries: migrate FreTS to Engine[T] compliance (37ece62)
  • timeseries: migrate ITransformer to Engine[T] (e8f4e0a)

v1.45.0

11 Apr 08:12

1.45.0 (2026-04-11)

Features

  • parity: add 11 more layer parity tests (E86.1 remaining) (462f24e)
  • parity: add 22 new layer parity tests (E86.1 + E86.3) (092e3bf)
  • parity: add GQA and MoE golden-file parity tests (2e043e9)
  • parity: add PyTorch golden file parity tests for 32 layers (6200355)
  • parity: Wave 2 - backward parity + model architectures (E86.2, E86.4) (9351180)
  • parity: wire Go tests for 10 existing golden files (E86.0) (e8b9815)

Bug Fixes

  • core: add missing transposes in MatMul backward (fefdcba)
  • loss: add 2/N scaling factor to MSE backward for mean reduction (54be887)
  • loss: add batch normalization to CrossEntropy backward (7733c08)
  • normalization: correct ReduceSum axis in LayerNorm backward (5c300c9)

v1.44.0

11 Apr 01:21

1.44.0 (2026-04-11)

Features

  • crossasset: add Save/Load for trained model weights (c1e7ab1), closes #378

v1.43.0

10 Apr 23:45

1.43.0 (2026-04-10)

Features

  • bench: add bench-spark.sh helper for Spark submission (0321a18)
  • bench: add PatchTST training benchmark tool (d847238)
  • bench: add Spark pod manifest for PatchTST training (0e05d43)
  • timeseries: activate fused encoder forward path (8aa526d)
  • timeseries: add weight-hash debug helper for GPU training diagnosis (c5a34c5)
  • timeseries: wire fused encoder kernel into PatchTST training (bafdad0)

Bug Fixes

  • bench: mount /opt/zerfoo/lib so libkernels.so is reachable (aa6331a)
  • bench: post YAML (not JSON) and parse Spark status shape (9d20746)
  • ci: make govulncheck non-blocking for unfixed bbolt vuln (b6b38a6)
  • mlstm: use paper's stabilized exponential-gating formulation (46b7b86)
  • slstm: use paper's stabilized exponential-gating formulation (e47e4a4)
  • timeseries: compare Storage identity in gradTs sentinel (a67063a)
  • timeseries: GPU training convergence — rebuild paramTs/gradTs per batch, strengthen sentinel, remove dead machinery (168a938)
  • timeseries: GPU training writes back optimizer step to device (f29c93b)
  • timeseries: skip flaky TimeMixer gradient check + add WithTimeMixerRNG (4f96d99)
  • timeseries: use return value of GPU Reshape in PatchTST backward (d61cbab)

Performance Improvements

  • timeseries: pre-allocate PatchTST GPU train loop buffers (E85 T85.2.1-3,5) (09a318c)

v1.42.1

06 Apr 23:07

1.42.1 (2026-04-06)

Bug Fixes

  • modeldsl: replace .Data() bias loop with engine.Add (4fd8d63)

v1.42.0

05 Apr 09:08

1.42.0 (2026-04-05)

Features

  • inference: add builder_helpers with newTensorLookup and newParamWrapper (adfb334)

Bug Fixes

  • generate: remove unused compute import after merge (b7511c3)

v1.41.0

04 Apr 01:15

1.41.0 (2026-04-04)

Features

  • cmd: add --pjrt flag for PJRT backend selection (66fb945)
  • crossasset: replace SGD with AdamW in CPU Train() (#315) (4d6664c)
  • functional: add GELUBackward for gradient computation (0e89305)
  • functional: add LayerNormBackward for gradient computation (1e51b9e)
  • functional: add LinearBackward for gradient computation (534127d)
  • functional: add MLPBackward for 2-layer MLP gradient computation (8624a1e)
  • functional: add MultiHeadAttentionBackward (2d91fa3)
  • functional: add SoftmaxBackward for gradient computation (1c2c486)
  • generate: wire PJRTPlan into decode loop (ca6bab6)
  • inference: add PJRT compilation path (9cde667)
  • layers: add functional activation wrappers (GELU, Softmax, ReLU, SiLU, Sigmoid) (962b36d)
  • layers: add functional LayerNorm and RMSNorm wrappers (08c7ac9)
  • layers: add functional Linear and MultiHeadAttention wrappers (e5449e8)

Bug Fixes

  • architecture: add crossasset/backward.go to privateLayer allowlist (5c01ccf)
  • architecture: add layernorm_ops.go backward to dataAbuse allowlist (34fe067)
  • crossasset: call Train() once with all epochs to preserve AdamW state (834b8f3)
  • crossasset: delegate TrainGPU to CPU full-backprop with AdamW (#317) (b345932)
  • crossasset: snapshot GPU tensors to CPU before backward reads (#317) (4de925e)
  • timeseries: resolve warmupLR merge conflict with scheduler.WarmupLR (9f573cf)
  • timeseries: update nhits_test weight shape check for transposed layout (f090509)
  • training: fix QuantileLoss generic type assertions (a282e9d)

Performance Improvements

  • training: replace guardAndClipGradients .Data() loops with Engine ops (92e1218)
  • training: replace SGD broadcast allocation with engine.MulScalar (aad4deb)

v1.40.1

02 Apr 19:11

1.40.1 (2026-04-02)

Bug Fixes

  • crossasset: prevent CUDA illegal memory access in TrainGPU backward (#317) (8db043e)