Tags: trnsci/trnsolver

v0.9.0

v0.9.0: svd factorization (A = U diag(s) Vh)

Add svd(A, full_matrices=False) to trnsolver.factor. Thin wrapper around
torch.linalg.svd with the standard _to_fp32/_restore dtype-promotion
pattern; default full_matrices=False returns the economy decomposition
matching the convention used internally by pinv. Closes the svd item in
the Factorizations row of the API table.
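
The shape difference between the economy and full decompositions can be seen in a short NumPy sketch. It uses numpy.linalg.svd as a stand-in for torch.linalg.svd (both take full_matrices with the same meaning). The 6×4 input is arbitrary, and this is an illustration, not the trnsolver code.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))  # tall matrix: m=6 rows, n=4 columns

# Economy decomposition (full_matrices=False): U is m x k with k = min(m, n).
U, s, Vh = np.linalg.svd(A, full_matrices=False)

# Full decomposition: U is square m x m; its extra m - k columns span the
# orthogonal complement of range(A) and contribute nothing to A itself.
U_full, s_full, Vh_full = np.linalg.svd(A, full_matrices=True)

# A = U diag(s) Vh reconstructs A from the economy factors alone.
A_rec = U @ np.diag(s) @ Vh

print(U.shape, s.shape, Vh.shape)   # economy shapes
print(U_full.shape)                 # full U is square
print(np.allclose(A, A_rec))
```

The economy form is sufficient for a pseudoinverse, since the extra columns of a full U never multiply a singular value.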

v0.8.0

v0.8.0: SSOR preconditioner for SPD systems (#28)

Add ssor_preconditioner(A, omega=1.0) to trnsolver.iterative. It applies
M^{-1} r via a forward triangular solve (D + ωL) t = r, a diagonal scaling
v = ω(2-ω) diag(A) ⊙ t, and a backward solve (D + ωL^T) z = v, where D is
the diagonal and L the strictly lower triangle of A. ω = 1 gives symmetric
Gauss-Seidel, which converges faster than Jacobi on coupled matrices
(1D Laplacian, FEM stiffness). BF16/FP16 inputs are promoted to FP32 in both
the factory and the returned closure. An SSOR benchmark was added to
bench_solver.py. Closes the SSOR item in #28.
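
A dense NumPy sketch of the three-step application for a symmetric A. The function names are hypothetical, and the substitution loops stand in for whatever triangular solver trnsolver uses. The final lines check the result against the explicit SSOR matrix M = (D + ωL) D^{-1} (D + ωL^T) / (ω(2-ω)).

```python
import numpy as np

def forward_sub(Lw, b):
    """Solve the lower-triangular system Lw @ x = b by forward substitution."""
    x = np.zeros_like(b)
    for i in range(b.shape[0]):
        x[i] = (b[i] - Lw[i, :i] @ x[:i]) / Lw[i, i]
    return x

def backward_sub(Uw, b):
    """Solve the upper-triangular system Uw @ x = b by back substitution."""
    x = np.zeros_like(b)
    for i in range(b.shape[0] - 1, -1, -1):
        x[i] = (b[i] - Uw[i, i + 1:] @ x[i + 1:]) / Uw[i, i]
    return x

def ssor_apply(A, r, omega=1.0):
    """Hypothetical helper: return z = M^{-1} r for the SSOR preconditioner of symmetric A."""
    d = np.diag(A)                 # diagonal of A as a vector
    D = np.diag(d)                 # ... and as a matrix
    L = np.tril(A, k=-1)           # strictly lower triangle of A
    t = forward_sub(D + omega * L, r)          # (D + wL) t = r
    v = omega * (2.0 - omega) * d * t          # v = w(2 - w) diag(A) * t, elementwise
    return backward_sub(D + omega * L.T, v)    # (D + wL^T) z = v

# 1D Laplacian test problem.
n = 8
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
r = np.arange(1.0, n + 1)
omega = 1.2
z = ssor_apply(A, r, omega)

# Explicit SSOR matrix, used only to check that M @ z reproduces r.
D = np.diag(np.diag(A))
L = np.tril(A, k=-1)
M = (D + omega * L) @ np.linalg.inv(D) @ (D + omega * L.T) / (omega * (2.0 - omega))
print(np.allclose(M @ z, r))
```

Because the diagonal scaling sits between two mutually transposed sweeps, M stays symmetric positive definite for SPD A and 0 < ω < 2, which is what CG requires of a preconditioner.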

v0.7.0

v0.7.0: BF16/FP16 dtype support across the full public API (#19)

Add _to_fp32 / _restore promotion shim in factor.py; apply to all 9
factor functions, eigh + eigh_generalized in eigen.py, cg + gmres +
block_jacobi_preconditioner in iterative.py. Inputs in BF16/FP16 are
upcast to FP32 at the public API boundary, computed via the existing
FP32 path, and downcast before return. FP32/FP64 pass through unchanged.
New tests/test_dtype.py validates round-trip dtype preservation and
numerical agreement with FP32 reference for every covered entry point.
Closes #19.
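
The promotion shim can be sketched in NumPy as a decorator. Only the names _to_fp32 and _restore come from the commit message; their bodies, the promote_fp32 decorator, and the inv_sketch example are assumptions. NumPy has no bfloat16, so float16 stands in for both low-precision formats.

```python
import functools
import numpy as np

# NumPy has no bfloat16; float16 stands in for the BF16/FP16 case here.
_LOW_PRECISION = (np.dtype(np.float16),)

def _to_fp32(x):
    """Upcast a low-precision array to float32; also return the caller's dtype."""
    if x.dtype in _LOW_PRECISION:
        return x.astype(np.float32), x.dtype
    return x, x.dtype              # float32/float64 pass through unchanged

def _restore(y, dtype):
    """Downcast a float32 result back to the caller's low-precision dtype."""
    return y.astype(dtype) if dtype in _LOW_PRECISION else y

def promote_fp32(fn):
    """Hypothetical decorator applying the shim at the public API boundary."""
    @functools.wraps(fn)
    def wrapper(A, *args, **kwargs):
        A32, orig_dtype = _to_fp32(A)
        return _restore(fn(A32, *args, **kwargs), orig_dtype)
    return wrapper

@promote_fp32
def inv_sketch(A):
    # NumPy's linalg rejects float16 input, so the upcast is what lets this run.
    return np.linalg.inv(A)

X16 = inv_sketch((4.0 * np.eye(3)).astype(np.float16))
X64 = inv_sketch(4.0 * np.eye(3))
print(X16.dtype, X64.dtype)
```

A function returning several arrays (eigh, svd) would need _restore applied to each output.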

v0.6.0

chore(release): v0.6.0 - eigh subspace rotation refinement + solve_spd iterative refinement

eigh: replace the scalar Rayleigh-quotient pass with one Rayleigh-Ritz step
(form H = V^T A V, re-diagonalize via eigh(H), rotate V by the eigenvectors
of H). Reduces eigenvector residuals by 1–2 orders of magnitude for n ≥ 64.
No API change. Closes #31.
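
One Rayleigh-Ritz step, sketched in NumPy with hypothetical helper names. The example perturbs the exact eigenvectors of a random symmetric 64×64 matrix and compares the residual ‖AV − V diag(w)‖ of per-column scalar Rayleigh quotients with the residual after the rotation. With a full n×n basis, H is similar to A, so the step recovers the eigenpairs up to rounding error.

```python
import numpy as np

def rayleigh_ritz(A, V):
    """One Rayleigh-Ritz step on an approximately orthonormal basis V."""
    H = V.T @ A @ V              # project A onto span(V)
    w, W = np.linalg.eigh(H)     # re-diagonalize the projected matrix
    return w, V @ W              # rotate V by the eigenvectors of H

def residual(A, w, V):
    """Frobenius norm of A V - V diag(w)."""
    return np.linalg.norm(A @ V - V * w)

rng = np.random.default_rng(0)
n = 64
B = rng.standard_normal((n, n))
A = (B + B.T) / 2
_, Q = np.linalg.eigh(A)

# Perturbed, re-orthonormalized eigenvector estimates.
V, _ = np.linalg.qr(Q + 1e-3 * rng.standard_normal((n, n)))

# Scalar Rayleigh quotients, one per column, with V left unchanged.
w_rq = np.einsum("ij,ij->j", V, A @ V)
# One Rayleigh-Ritz step instead.
w_rr, V_rr = rayleigh_ritz(A, V)

print(residual(A, w_rq, V), residual(A, w_rr, V_rr))
```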

solve_spd: add iterative_refinement=False keyword. When True, computes
residual in FP64 (mixed-precision) and applies a second Cholesky
correction pass. Reliable for cond(A) up to ~1e7. Backward-compatible.
Closes #32.
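
A NumPy sketch of one mixed-precision refinement pass, assuming the FP32 Cholesky factor is reused for the correction solve. solve_spd_refined is a hypothetical name, and np.linalg.solve on the triangular factor stands in for a dedicated triangular solver (it does not exploit the triangular structure).

```python
import numpy as np

def solve_spd_refined(A, b):
    """FP32 Cholesky solve plus one correction with an FP64 residual (hypothetical helper)."""
    L = np.linalg.cholesky(A.astype(np.float32))     # A ~= L L^T, factored once in FP32

    def chol_solve(rhs):
        # Two FP32 solves with the same factor: L y = rhs, then L^T x = y.
        y = np.linalg.solve(L, rhs.astype(np.float32))
        return np.linalg.solve(L.T, y).astype(np.float64)

    x0 = chol_solve(b)              # plain FP32 solution
    r = b - A @ x0                  # residual evaluated in FP64
    x1 = x0 + chol_solve(r)         # correction reuses the FP32 factor
    return x0, x1

rng = np.random.default_rng(0)
n = 50
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(np.logspace(0, 3, n)) @ Q.T     # SPD with cond(A) = 1e3
A = (A + A.T) / 2
x_true = rng.standard_normal(n)
b = A @ x_true

x0, x1 = solve_spd_refined(A, b)
err0 = np.linalg.norm(x0 - x_true)
err1 = np.linalg.norm(x1 - x_true)
print(err0, err1)
```

Each pass shrinks the error by roughly cond(A) times the FP32 unit roundoff, which is why a scheme like this stops helping as cond(A) approaches 1/eps_fp32 ≈ 1e7.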

v0.5.0

chore(release): v0.5.0 - block-Jacobi preconditioner, pinv, Phase 2 precision items

Closes #16 (block_jacobi_preconditioner; IC0 skipped for dense A; SSOR deferred to v0.6.0).
Closes #22 (pinv via truncated SVD; schur deferred to Phase 3).
Closes #14, #25 (Newton-Schulz trnblas.gemm in inv_sqrt_spd_ns).
Closes #27 (FP64 CG/GMRES inner products; Rayleigh-quotient refinement in eigh).
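
A NumPy sketch of a pseudoinverse via truncated SVD, as named in #22. pinv_truncated and its relative cutoff rtol (with a max(m, n)·eps default) are assumptions, not trnsolver's actual signature. The check verifies the four Penrose conditions on a rank-deficient matrix.

```python
import numpy as np

def pinv_truncated(A, rtol=None):
    """Pseudoinverse of a real matrix via economy SVD, dropping small singular values."""
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    if rtol is None:
        rtol = max(A.shape) * np.finfo(A.dtype).eps   # assumed default cutoff
    keep = s > rtol * s[0]            # s is sorted in descending order
    s_inv = np.zeros_like(s)
    s_inv[keep] = 1.0 / s[keep]
    return (Vh.T * s_inv) @ U.T       # V diag(1/s) U^T

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3)) @ rng.standard_normal((3, 5))   # 6x5, rank 3
P = pinv_truncated(A, rtol=1e-10)

# Penrose conditions: A P A = A, P A P = P, A P and P A symmetric.
print(np.allclose(A @ P @ A, A), np.allclose(P @ A @ P, P))
```

Without truncation, the two rounding-level singular values of this rank-3 matrix would be inverted into huge entries.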

v0.4.1

chore(release): v0.4.1 — eigh_generalized NKI path + #36 investigation

eigh_generalized now uses trnblas.trsm for triangular solves (closes #11).
The matvec_kernel Tensor Engine attempt was documented and reverted: nc_matmul
requires free_dim >> 1 for the moving operand, so an (n,1) vector is out of
range on both simulator and hardware (closes #36 investigation).
14/14 hardware tests pass in 36 s on trn1.2xlarge.
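
A generalized symmetric eigensolver that relies on triangular solves presumably uses the standard Cholesky reduction. Factor B = L L^T, form C = L^{-1} A L^{-T}, solve C y = λ y, then map back with x = L^{-T} y. The NumPy sketch below assumes that reduction; np.linalg.solve stands in for trnblas.trsm, and the function name is hypothetical.

```python
import numpy as np

def eigh_generalized_sketch(A, B):
    """Solve A x = lam B x for symmetric A and SPD B via Cholesky reduction."""
    L = np.linalg.cholesky(B)             # B = L L^T
    tmp = np.linalg.solve(L, A)           # L^{-1} A            (stand-in for trsm)
    C = np.linalg.solve(L, tmp.T)         # L^{-1} A L^{-T}     (stand-in for trsm)
    C = (C + C.T) / 2                     # remove rounding asymmetry
    w, Y = np.linalg.eigh(C)              # ordinary symmetric eigenproblem
    X = np.linalg.solve(L.T, Y)           # back-transform: x = L^{-T} y
    return w, X

rng = np.random.default_rng(0)
n = 6
S = rng.standard_normal((n, n))
A = (S + S.T) / 2
G = rng.standard_normal((n, n))
B = G @ G.T + n * np.eye(n)               # SPD
w, X = eigh_generalized_sketch(A, B)
print(np.allclose(A @ X, B @ X * w))
```

The eigenvectors come out B-orthonormal (X^T B X = I) rather than orthonormal.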

v0.4.0

chore(release): v0.4.0 — Householder-QR eigh on NKI, hardware validated

- Householder tridiagonalization + implicit-shift QR on NKI path (#38)
- NKI 0.3.0 namespace migration (neuronxcc.nki.* → nki.*)
- torch_xla.sync() barrier fixes per-step recompilation and NCC_IDEL901
- 14/14 @pytest.mark.neuron tests pass on trn1.2xlarge in 41 s
- Closes #12, #26
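
The first phase in the v0.4.0 list, Householder tridiagonalization, is sketched below in plain NumPy (not NKI). The function name is hypothetical, and the implicit-shift QR phase is not shown. np.linalg.eigvalsh stands in for it when checking that the reduction preserves the spectrum.

```python
import numpy as np

def householder_tridiagonalize(A):
    """Reduce symmetric A to tridiagonal T = Q^T A Q with Householder reflectors."""
    T = np.array(A, dtype=np.float64)
    n = T.shape[0]
    for k in range(n - 2):
        x = T[k + 1:, k]
        alpha = -np.copysign(np.linalg.norm(x), x[0])   # sign choice avoids cancellation
        v = x.copy()
        v[0] -= alpha
        vnorm = np.linalg.norm(v)
        if vnorm == 0.0:
            continue                          # column already in tridiagonal form
        v /= vnorm
        # Two-sided similarity with P = I - 2 v v^T acting on indices k+1:.
        T[k + 1:, :] -= 2.0 * np.outer(v, v @ T[k + 1:, :])
        T[:, k + 1:] -= 2.0 * np.outer(T[:, k + 1:] @ v, v)
    return T

rng = np.random.default_rng(0)
S = rng.standard_normal((8, 8))
A = (S + S.T) / 2
T = householder_tridiagonalize(A)
print(np.allclose(np.triu(T, 2), 0.0, atol=1e-12))
```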

v0.3.0

chore(release): v0.3.0

Ships: inv_sqrt_spd_ns Newton-Schulz API, scipy LAPACK baselines,
CUDA cuSOLVER bench suite with vintage-matched terraform module and
SSM runner, docs rewrite. Also bundles the CI/docs-deploy cleanup
and pyproject normalization that landed since v0.2.0.

Milestones renumbered: old v0.3.0 (hardware) -> v0.4.0, old v0.4.0
(production) -> v0.5.0. This v0.3.0 tag corresponds to the CPU-side
API and infra additions.
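
The commit message does not spell out the iteration behind inv_sqrt_spd_ns. A common choice is the coupled Newton-Schulz iteration, sketched here in NumPy under that assumption; the Frobenius-norm prescaling and the fixed iteration count are also assumptions.

```python
import numpy as np

def inv_sqrt_ns_sketch(A, iters=30):
    """Approximate A^{-1/2} for SPD A with the coupled Newton-Schulz iteration."""
    n = A.shape[0]
    I = np.eye(n)
    c = np.linalg.norm(A)        # Frobenius norm >= largest eigenvalue, so eig(A/c) is in (0, 1]
    Y = A / c                    # Y_k -> (A/c)^{1/2}
    Z = I.copy()                 # Z_k -> (A/c)^{-1/2}
    for _ in range(iters):
        T = 0.5 * (3.0 * I - Z @ Y)
        Y = Y @ T
        Z = T @ Z
    return Z / np.sqrt(c)        # A^{-1/2} = (A/c)^{-1/2} / sqrt(c)

rng = np.random.default_rng(0)
G = rng.standard_normal((8, 8))
A = G @ G.T + 8.0 * np.eye(8)    # SPD test matrix
X = inv_sqrt_ns_sketch(A)
print(np.allclose(X @ A @ X, np.eye(8)))
```

Each step uses only matrix multiplies, which makes this kind of iteration a natural fit for GEMM hardware (the v0.5.0 notes mention trnblas.gemm inside inv_sqrt_spd_ns).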

v0.2.0

chore(release): v0.2.0

Ships the v0.2.0 milestone: benchmark suite, README badges, Jacobi
preconditioner preview, manual Neuron hardware CI workflow, and repo
transfer to trnsci org. Closes #1, #2, #4, #5, #6, #7, #8.
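
A Jacobi preconditioner divides by diag(A). The NumPy sketch below plugs it into a textbook preconditioned conjugate gradient. pcg_jacobi and its tol/maxiter parameters are hypothetical, not the trnsolver signature. The test matrix is a 1D Laplacian with badly scaled rows and columns, the case diagonal scaling is designed for.

```python
import numpy as np

def pcg_jacobi(A, b, tol=1e-8, maxiter=500):
    """Preconditioned CG with the Jacobi preconditioner M = diag(A)."""
    inv_diag = 1.0 / np.diag(A)      # applying M^{-1} is an elementwise scale
    x = np.zeros_like(b)
    r = b.copy()                     # residual for the zero initial guess
    z = inv_diag * r
    p = z.copy()
    rz = r @ z
    for k in range(maxiter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, k + 1          # converged: solution and iteration count
        z = inv_diag * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, maxiter

n = 50
lap = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
s = np.logspace(0, 1, n)             # row/column scales from 1 to 10
A = s[:, None] * lap * s[None, :]    # diag(s) @ lap @ diag(s), still SPD
b = np.ones(n)
x, iters = pcg_jacobi(A, b)
print(iters, np.linalg.norm(b - A @ x) / np.linalg.norm(b))
```

For this matrix the Jacobi-preconditioned operator is similar to lap / 2, so the row and column scaling drops out of the convergence behavior entirely.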

v0.1.1

chore(release): v0.1.1 - align neuronxcc/torch-neuronx floors with trnsci suite