Releases: zerfoo/zerfoo
v1.48.0
v1.47.0
Features
- crossasset: convert model from float64 to float32
- crossasset: replace forward slice math with Engine[T] ops and wire TrainGPU
- parity: add GPU parity test Containerfile and Spark manifest
- parity: add GPU vs CPU parity tests for activations, normalization, and RoPE
- parity: add GPU vs CPU parity tests for core ops, attention, and backward
Bug Fixes
- attention: recompute attention weights in SDPA backward after flash forward
- crossasset: relax GPU parity test tolerance for flash attention divergence
- cuda: add /opt/zerfoo/lib to libkernels.so dlopen search path
v1.46.0
1.46.0 (2026-04-12)
Features
- parity: add BlockAttnRes golden parity test (11e17fc)
- parity: add CfC golden-file parity test (3490dac)
- parity: add FreTS golden-file parity test (b0c63c8)
- parity: add GRN golden-file parity test (18c6964)
- parity: add TimeMixer golden-file parity test (d3b5ebe)
- parity: upgrade PatchTST, N-BEATS, ITransformer to golden-file parity (92249e4)
- parity: wire MambaBlock golden parity test (fdea55b)
- timeseries: migrate CfC to Engine[T] compliance (ff971af)
- timeseries: migrate DLinear and TimeMixer to Engine[T] compliance (d37ee31)
- timeseries: migrate FreTS to Engine[T] compliance (37ece62)
- timeseries: migrate ITransformer to Engine[T] (e8f4e0a)
v1.45.0
1.45.0 (2026-04-11)
Features
- parity: add 11 more layer parity tests (E86.1 remaining) (462f24e)
- parity: add 22 new layer parity tests (E86.1 + E86.3) (092e3bf)
- parity: add GQA and MoE golden-file parity tests (2e043e9)
- parity: add PyTorch golden file parity tests for 32 layers (6200355)
- parity: Wave 2 - backward parity + model architectures (E86.2, E86.4) (9351180)
- parity: wire Go tests for 10 existing golden files (E86.0) (e8b9815)
v1.44.0
v1.43.0
1.43.0 (2026-04-10)
Features
- bench: add bench-spark.sh helper for Spark submission (0321a18)
- bench: add PatchTST training benchmark tool (d847238)
- bench: add Spark pod manifest for PatchTST training (0e05d43)
- timeseries: activate fused encoder forward path (8aa526d)
- timeseries: add weight-hash debug helper for GPU training diagnosis (c5a34c5)
- timeseries: wire fused encoder kernel into PatchTST training (bafdad0)
Bug Fixes
- bench: mount /opt/zerfoo/lib so libkernels.so is reachable (aa6331a)
- bench: post YAML (not JSON) and parse Spark status shape (9d20746)
- ci: make govulncheck non-blocking for unfixed bbolt vuln (b6b38a6)
- mlstm: use paper's stabilized exponential-gating formulation (46b7b86)
- slstm: use paper's stabilized exponential-gating formulation (e47e4a4)
- timeseries: compare Storage identity in gradTs sentinel (a67063a)
- timeseries: GPU training convergence — rebuild paramTs/gradTs per batch, strengthen sentinel, remove dead machinery (168a938)
- timeseries: GPU training writes back optimizer step to device (f29c93b)
- timeseries: skip flaky TimeMixer gradient check + add WithTimeMixerRNG (4f96d99)
- timeseries: use return value of GPU Reshape in PatchTST backward (d61cbab)
Performance Improvements
- timeseries: pre-allocate PatchTST GPU train loop buffers (E85 T85.2.1-3,5) (09a318c)
v1.42.1
v1.42.0
v1.41.0
1.41.0 (2026-04-04)
Features
- cmd: add --pjrt flag for PJRT backend selection (66fb945)
- crossasset: replace SGD with AdamW in CPU Train() (#315) (4d6664c)
- functional: add GELUBackward for gradient computation (0e89305)
- functional: add LayerNormBackward for gradient computation (1e51b9e)
- functional: add LinearBackward for gradient computation (534127d)
- functional: add MLPBackward for 2-layer MLP gradient computation (8624a1e)
- functional: add MultiHeadAttentionBackward (2d91fa3)
- functional: add SoftmaxBackward for gradient computation (1c2c486)
- generate: wire PJRTPlan into decode loop (ca6bab6)
- inference: add PJRT compilation path (9cde667)
- layers: add functional activation wrappers (GELU, Softmax, ReLU, SiLU, Sigmoid) (962b36d)
- layers: add functional LayerNorm and RMSNorm wrappers (08c7ac9)
- layers: add functional Linear and MultiHeadAttention wrappers (e5449e8)
Bug Fixes
- architecture: add crossasset/backward.go to privateLayer allowlist (5c01ccf)
- architecture: add layernorm_ops.go backward to dataAbuse allowlist (34fe067)
- crossasset: call Train() once with all epochs to preserve AdamW state (834b8f3)
- crossasset: delegate TrainGPU to CPU full-backprop with AdamW (#317) (b345932)
- crossasset: snapshot GPU tensors to CPU before backward reads (#317) (4de925e)
- timeseries: resolve warmupLR merge conflict with scheduler.WarmupLR (9f573cf)
- timeseries: update nhits_test weight shape check for transposed layout (f090509)
- training: fix QuantileLoss generic type assertions (a282e9d)