Tags: trnsci/trnsolver
v0.9.0: svd factorization (A = U diag(s) Vh) Add svd(A, full_matrices=False) to trnsolver.factor. Thin wrapper around torch.linalg.svd with the standard _to_fp32/_restore dtype-promotion pattern; default full_matrices=False returns the economy decomposition matching the convention used internally by pinv. Closes the svd item in the Factorizations row of the API table.
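As an illustration of the economy decomposition described in the v0.9.0 entry, here is a minimal NumPy sketch. It is not the trnsolver code: `svd_sketch` is a hypothetical name, `np.linalg.svd` stands in for `torch.linalg.svd`, and float16 stands in for the BF16/FP16 inputs because NumPy has no bfloat16 type.

```python
import numpy as np

def svd_sketch(A, full_matrices=False):
    """Hypothetical stand-in for a thin SVD wrapper: promote, factor, restore."""
    orig_dtype = A.dtype
    # Promote half precision to float32; NumPy's LAPACK bindings reject float16.
    work = A.astype(np.float32) if orig_dtype == np.float16 else A
    U, s, Vh = np.linalg.svd(work, full_matrices=full_matrices)
    # Cast the factors back to the caller's dtype.
    return U.astype(orig_dtype), s.astype(orig_dtype), Vh.astype(orig_dtype)

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
U, s, Vh = svd_sketch(A)        # economy shapes: (6, 4), (4,), (4, 4)
recon = U @ np.diag(s) @ Vh     # A = U diag(s) Vh
```

With `full_matrices=False`, `U` keeps only as many columns as there are singular values, which is the shape a pseudo-inverse routine needs.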
v0.8.0: SSOR preconditioner for SPD systems (#28) Add ssor_preconditioner(A, omega=1.0) to trnsolver.iterative. Applies M^{-1} r via forward triangular solve (D + ωL) t = r, diagonal scaling v = ω(2-ω) diag(A) ⊙ t, and backward solve (D + ωL^T) z = v. ω=1 is symmetric Gauss-Seidel; converges faster than Jacobi on coupled matrices (1D Laplacian, FEM stiffness). BF16/FP16 promoted to FP32 in factory and closure. SSOR benchmark added to bench_solver.py. Closes SSOR item in #28.
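The three steps in the v0.8.0 entry (forward solve, diagonal scaling, backward solve) can be sketched in NumPy as follows. This is an illustration under assumptions, not the trnsolver implementation: `ssor_preconditioner_sketch` is a hypothetical name, and `np.linalg.solve` is used as a simple stand-in for a dedicated triangular solver.

```python
import numpy as np

def ssor_preconditioner_sketch(A, omega=1.0):
    """Return a closure r -> M^{-1} r for a symmetric positive definite A."""
    d = np.diag(A).copy()                # diagonal D as a vector
    L = np.tril(A, k=-1)                 # strictly lower triangle
    lower = np.diag(d) + omega * L       # D + omega * L
    upper = lower.T                      # D + omega * L^T (A is symmetric)
    scale = omega * (2.0 - omega)

    def apply(r):
        t = np.linalg.solve(lower, r)    # forward solve  (D + wL) t = r
        v = scale * d * t                # diagonal scaling v = w(2-w) diag(A) * t
        return np.linalg.solve(upper, v) # backward solve (D + wL^T) z = v

    return apply

# 1D Laplacian test matrix: 2 on the diagonal, -1 on the off-diagonals.
n = 8
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
r = np.arange(1.0, n + 1.0)
apply = ssor_preconditioner_sketch(A, omega=1.2)
z = apply(r)
```

The closure form matches how preconditioners are usually handed to CG or GMRES: the solver only needs a function that maps a residual to a preconditioned residual.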
v0.7.0: BF16/FP16 dtype support across the full public API (#19) Add _to_fp32 / _restore promotion shim in factor.py; apply to all 9 factor functions, eigh + eigh_generalized in eigen.py, cg + gmres + block_jacobi_preconditioner in iterative.py. Inputs in BF16/FP16 are upcast to FP32 at the public API boundary, computed via the existing FP32 path, and downcast before return. FP32/FP64 pass through unchanged. New tests/test_dtype.py validates round-trip dtype preservation and numerical agreement with FP32 reference for every covered entry point. Closes #19.
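The promotion pattern in the v0.7.0 entry (upcast at the API boundary, compute in FP32, downcast before return) might look roughly like this. The helper names `_to_fp32` and `_restore` come from the entry, but their signatures here are assumptions, `solve_sketch` is hypothetical, and NumPy float16 stands in for BF16/FP16 tensors.

```python
import numpy as np

def _to_fp32(x):
    """Assumed signature: return (possibly promoted array, original dtype)."""
    orig = x.dtype
    if orig == np.float16:
        return x.astype(np.float32), orig
    return x, orig                       # FP32/FP64 pass through unchanged

def _restore(x, orig):
    """Assumed signature: cast the result back to the caller's dtype."""
    return x if x.dtype == orig else x.astype(orig)

def solve_sketch(A, b):
    """Hypothetical public entry point wrapped by the shim."""
    A32, dtype = _to_fp32(A)
    b32, _ = _to_fp32(b)
    x = np.linalg.solve(A32, b32)        # existing full-precision path
    return _restore(x, dtype)

A = 4.0 * np.eye(4) + np.ones((4, 4))
b = np.array([1.0, 2.0, 3.0, 4.0])
x_half = solve_sketch(A.astype(np.float16), b.astype(np.float16))
x_ref = solve_sketch(A.astype(np.float32), b.astype(np.float32))
```

Keeping the shim at the public boundary means the numerical kernels only ever see FP32 or FP64, so they need no half-precision special cases.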
chore(release): v0.6.0 - eigh subspace rotation refinement + solve_spd iterative refinement eigh: replace scalar Rayleigh-quotient pass with one Rayleigh-Ritz step (V^T A V, re-diagonalize via eigh(H), rotate V). Reduces eigenvector residuals 1–2 orders of magnitude for n ≥ 64. No API change. Closes #31. solve_spd: add iterative_refinement=False keyword. When True, computes residual in FP64 (mixed-precision) and applies a second Cholesky correction pass. Reliable for cond(A) up to ~1e7. Backward-compatible. Closes #32.
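The Rayleigh-Ritz step named in the v0.6.0 entry (form V^T A V, re-diagonalize, rotate V) can be sketched in a few lines of NumPy. The function names and the test setup are assumptions made for illustration; the trnsolver code path may differ.

```python
import numpy as np

def rayleigh_ritz_step(A, V):
    """One Rayleigh-Ritz refinement of an orthonormal eigenvector estimate V."""
    H = V.T @ A @ V                      # project A onto span(V)
    H = 0.5 * (H + H.T)                  # remove rounding asymmetry
    theta, W = np.linalg.eigh(H)         # re-diagonalize the small problem
    return theta, V @ W                  # rotate V into the refined basis

def residual(A, V, theta):
    """Frobenius norm of A V - V diag(theta)."""
    return np.linalg.norm(A @ V - V * theta)

rng = np.random.default_rng(0)
n = 64
M = rng.standard_normal((n, n))
A = 0.5 * (M + M.T)
_, V_exact = np.linalg.eigh(A)
# Perturb the exact eigenvectors and re-orthonormalize to get a rough estimate.
V0, _ = np.linalg.qr(V_exact + 1e-3 * rng.standard_normal((n, n)))
theta0 = np.einsum("ij,ij->j", V0, A @ V0)   # scalar Rayleigh quotients
theta1, V1 = rayleigh_ritz_step(A, V0)
res_before = residual(A, V0, theta0)
res_after = residual(A, V1, theta1)
```

In this toy setup V0 spans the whole space, so a single step recovers the eigenvectors up to rounding. The improvement quoted in the entry refers to the library's own starting vectors, which this sketch does not reproduce.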
chore(release): v0.5.0 - block-Jacobi preconditioner, pinv, Phase 2 precision items Closes #16 (block_jacobi_preconditioner; IC0 skipped for dense A; SSOR deferred to v0.6.0). Closes #22 (pinv via truncated SVD; schur deferred to Phase 3). Closes #14, #25 (Newton-Schulz trnblas.gemm in inv_sqrt_spd_ns). Closes #27 (FP64 CG/GMRES inner products; Rayleigh-quotient refinement in eigh).
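A pseudo-inverse via truncated SVD, as mentioned in the v0.5.0 entry, can be sketched like this. The name `pinv_sketch`, the `rtol` parameter, and its default are assumptions for illustration and are not taken from trnsolver.

```python
import numpy as np

def pinv_sketch(A, rtol=1e-6):
    """Moore-Penrose pseudo-inverse of a real matrix via a truncated thin SVD."""
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    keep = s > rtol * s.max()            # drop numerically zero singular values
    s_inv = np.zeros_like(s)
    s_inv[keep] = 1.0 / s[keep]
    # A^+ = V diag(1/s) U^T, restricted to the retained singular values.
    return (Vh.T * s_inv) @ U.T

rng = np.random.default_rng(1)
# Rank-2 matrix of shape (6, 4): two of its singular values are numerically zero.
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))
P = pinv_sketch(A)
```

Truncation is what keeps the result bounded for rank-deficient input: inverting the near-zero singular values directly would amplify rounding noise.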
chore(release): v0.4.1 — eigh_generalized NKI path + #36 investigation eigh_generalized now uses trnblas.trsm for triangular solves (closes #11). matvec_kernel Tensor Engine attempt documented and reverted — nc_matmul requires free_dim >> 1 for the moving operand; (n,1) vector is out of range on both simulator and hardware (closes #36 investigation). 14/14 hardware tests pass at 36 s on trn1.2xlarge.
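To show where triangular solves enter a generalized symmetric eigenproblem A x = λ B x, here is a NumPy sketch that assumes the standard Cholesky reduction. The entry does not spell out the algorithm, so this is an assumption; `eigh_generalized_sketch` is a hypothetical name, and `np.linalg.solve` stands in for a triangular solve such as `trnblas.trsm`.

```python
import numpy as np

def eigh_generalized_sketch(A, B):
    """Solve A x = lambda B x for symmetric A and SPD B via B = L L^T."""
    L = np.linalg.cholesky(B)
    tmp = np.linalg.solve(L, A)          # L^{-1} A        (triangular solve)
    C = np.linalg.solve(L, tmp.T)        # L^{-1} A L^{-T} (triangular solve)
    C = 0.5 * (C + C.T)                  # remove rounding asymmetry
    w, Y = np.linalg.eigh(C)             # standard symmetric problem
    X = np.linalg.solve(L.T, Y)          # back-transform: x = L^{-T} y
    return w, X

rng = np.random.default_rng(2)
n = 6
M = rng.standard_normal((n, n))
A = 0.5 * (M + M.T)                      # symmetric
N = rng.standard_normal((n, n))
B = N @ N.T + n * np.eye(n)              # symmetric positive definite
w, X = eigh_generalized_sketch(A, B)
```

The returned eigenvectors are B-orthonormal, meaning X^T B X is the identity, which is the usual normalization for this reduction.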
chore(release): v0.4.0 - Householder-QR eigh on NKI, hardware validated
- Householder tridiagonalization + implicit-shift QR on NKI path (#38)
- NKI 0.3.0 namespace migration (neuronxcc.nki.* → nki.*)
- torch_xla.sync() barrier fixes per-step recompilation and NCC_IDEL901
- 14/14 @pytest.mark.neuron tests pass on trn1.2xlarge in 41 s
- Closes #12, #26
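The first stage of the v0.4.0 eigensolver, Householder tridiagonalization, can be sketched on the CPU with NumPy as follows. This is a textbook version for illustration only: it is not the NKI kernel, the function name is hypothetical, and the implicit-shift QR stage is not shown.

```python
import numpy as np

def householder_tridiagonalize(A):
    """Reduce symmetric A to tridiagonal T with A = Q T Q^T, Q orthogonal."""
    T = np.array(A, dtype=np.float64)    # working copy
    n = T.shape[0]
    Q = np.eye(n)
    for k in range(n - 2):
        x = T[k + 1:, k]
        # Choose the sign that avoids cancellation when forming v.
        alpha = -np.copysign(np.linalg.norm(x), x[0])
        v = x.copy()
        v[0] -= alpha
        norm_v = np.linalg.norm(v)
        if norm_v == 0.0:
            continue                     # column is already reduced
        v /= norm_v
        # Apply the reflector H = I - 2 v v^T from the left and the right.
        T[k + 1:, :] -= 2.0 * np.outer(v, v @ T[k + 1:, :])
        T[:, k + 1:] -= 2.0 * np.outer(T[:, k + 1:] @ v, v)
        # Accumulate Q = H_1 H_2 ... so that A = Q T Q^T.
        Q[:, k + 1:] -= 2.0 * np.outer(Q[:, k + 1:] @ v, v)
    return T, Q

rng = np.random.default_rng(3)
M = rng.standard_normal((7, 7))
A = 0.5 * (M + M.T)
T, Q = householder_tridiagonalize(A)
```

Because the reduction is an orthogonal similarity transform, T has the same eigenvalues as A, and a QR iteration on T only has to work with three diagonals.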
chore(release): v0.3.0 Ships: inv_sqrt_spd_ns Newton-Schulz API, scipy LAPACK baselines, CUDA cuSOLVER bench suite with vintage-matched terraform module and SSM runner, docs rewrite. Also bundles the CI/docs-deploy cleanup and pyproject normalization that landed since v0.2.0. Milestones renumbered: old v0.3.0 (hardware) -> v0.4.0, old v0.4.0 (production) -> v0.5.0. This v0.3.0 tag corresponds to the CPU-side API and infra additions.