ecto/tang

tang

Differentiable computing in Rust.

use tang::*;
use tang_la::{DVec, DMat, Svd};
use tang_ad::{Tape, grad};
use tang_optim::Adam;

// Dual numbers give you exact derivatives for free
let x = Dual::new(2.0, 1.0);
let y = (x * x).sin(); // y.dual = cos(x²) · 2x

// Reverse-mode AD for large parameter spaces
let loss = |x: &[f64]| x[0]*x[0] + x[1]*x[1] + x[2]*x[2];
let g = grad(loss, &[1.0, 2.0, 3.0]); // [2, 4, 6]

// Dense linear algebra — LU, SVD, Cholesky, QR, Eigen
let a = DMat::from_fn(3, 3, |i, j| if i == j { 2.0 } else { -1.0 });
let svd = Svd::new(&a);

// The same Scalar trait flows through everything
let q = Quat::axis_angle(Dir3::Z, core::f64::consts::FRAC_PI_2);
let v = q.rotate(Vec3::new(1.0, 0.0, 0.0)); // ≈ (0, 1, 0)

Crates

Core math

| Crate | What it does |
| --- | --- |
| tang | Vec2/3/4, Mat3/4, Quat, Transform, Dual<S>, spatial algebra; all generic over Scalar |
| tang-la | DVec, DMat, LU, SVD, Cholesky, QR, eigendecomposition; heap-allocated, generic over Scalar |
| tang-sparse | CSR, CSC, COO sparse matrices with SpMV |

Differentiation

| Crate | What it does |
| --- | --- |
| tang-ad | Reverse-mode autodiff: grad, jacobian, hessian, VJP, JVP |
| tang-expr | Symbolic expression graphs: trace, differentiate, simplify, compile to native closures or WGSL shaders |

Compute & arrays

| Crate | What it does |
| --- | --- |
| tang-tensor | N-d arrays with broadcasting, reductions, matmul |
| tang-gpu | GPU compute via wgpu: fused kernels from tang-expr, tiled matmul, full training pipeline on Metal/Vulkan/DX12 |

Optimization & training

| Crate | What it does |
| --- | --- |
| tang-optim | SGD, Adam/AdamW, L-BFGS, Newton, Levenberg-Marquardt |
| tang-train | Module trait, layers (Linear → Transformer), loss functions, schedulers, PINN support |

Ecosystem

| Crate | What it does |
| --- | --- |
| tang-safetensors | Load/save HuggingFace safetensors format (F16, BF16, F32, F64) |
| tang-hub | Download pretrained models + tokenizers from HuggingFace Hub |
| tang-mesh | Distributed compute: ship expression graphs over QUIC, data/pipeline/tensor parallelism |
| tang-bench | Benchmark suite: geometry, LA, autodiff, GPU, vs nalgebra/glam |

Architecture

                          tang
                    ┌───────┴───────┐
                 tang-la          tang-expr
              ┌────┼────┐        ┌──┴──┐
        tang-sparse  tang-ad  tang-gpu  │
                        │      ↗       │
                   tang-optim  tang-tensor
                        │     ╱  │      │
                   tang-train    │   tang-mesh
                                 │
                       tang-safetensors
                                 │
                              tang-hub

Arrows point from dependency → dependent. Two independent trees (tang-la for algebra, tang-expr for symbolic computation) converge at tang-tensor and tang-gpu.

Design

One Scalar trait. f32, f64, and Dual<S> all implement Scalar. Write your physics once, get exact derivatives by swapping the type parameter. Forward-mode for small systems, reverse-mode for large ones.
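
To make the "write once, swap the type parameter" idea concrete, here is a minimal self-contained sketch. It does not use tang: the `Scalar` trait and the `Dual` struct below are simplified, hypothetical stand-ins (tang's real `Dual<S>` is generic and its trait is richer), and `f` and `df_dx` are names invented for illustration.

```rust
use std::ops::{Add, Mul};

/// Hypothetical stand-in for a scalar abstraction (not tang's actual trait).
pub trait Scalar: Copy + Add<Output = Self> + Mul<Output = Self> {
    fn sin(self) -> Self;
}

impl Scalar for f64 {
    fn sin(self) -> Self {
        f64::sin(self) // resolves to the inherent f64 method
    }
}

/// Dual number: `real` carries the value, `dual` carries the derivative.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Dual {
    pub real: f64,
    pub dual: f64,
}

impl Add for Dual {
    type Output = Dual;
    fn add(self, rhs: Dual) -> Dual {
        Dual { real: self.real + rhs.real, dual: self.dual + rhs.dual }
    }
}

impl Mul for Dual {
    type Output = Dual;
    fn mul(self, rhs: Dual) -> Dual {
        // Product rule: (uv)' = u'v + uv'
        Dual {
            real: self.real * rhs.real,
            dual: self.dual * rhs.real + self.real * rhs.dual,
        }
    }
}

impl Scalar for Dual {
    fn sin(self) -> Self {
        // Chain rule: sin(u)' = cos(u) * u'
        Dual { real: self.real.sin(), dual: self.real.cos() * self.dual }
    }
}

/// Written once, generic over the scalar type: f(x) = sin(x * x).
pub fn f<S: Scalar>(x: S) -> S {
    (x * x).sin()
}

/// Seed the dual part with 1.0 and read df/dx off the result.
pub fn df_dx(x: f64) -> f64 {
    f(Dual { real: x, dual: 1.0 }).dual
}
```

Calling `f(2.0_f64)` evaluates the function as plain floating point, while `df_dx(2.0)` runs the same body on dual numbers and yields cos(x²) · 2x without any step size to choose.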

Two paths to GPU. tang-expr traces Rust math into a symbolic DAG, then compiles to WGSL. tang-gpu dispatches the compiled kernels. Element-wise chains fuse into a single kernel automatically.
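
The trace-then-compile flow can be sketched in a few lines. This is an illustration of the general technique only, not tang-expr's API: the `Expr` enum, `square_sum`, `to_wgsl`, and `fused_kernel` are hypothetical names, the expression is a plain tree (a real DAG would deduplicate shared subexpressions), and the emitted WGSL text has not been validated against a shader compiler here.

```rust
use std::ops::{Add, Mul};
use std::rc::Rc;

/// Hypothetical symbolic node; operator overloads build the graph.
#[derive(Clone, Debug, PartialEq)]
pub enum Expr {
    Input(usize),
    Add(Rc<Expr>, Rc<Expr>),
    Mul(Rc<Expr>, Rc<Expr>),
}

impl Add for Expr {
    type Output = Expr;
    fn add(self, rhs: Expr) -> Expr {
        Expr::Add(Rc::new(self), Rc::new(rhs))
    }
}

impl Mul for Expr {
    type Output = Expr;
    fn mul(self, rhs: Expr) -> Expr {
        Expr::Mul(Rc::new(self), Rc::new(rhs))
    }
}

/// Ordinary generic Rust math: (a + b)^2. Works on f32 and on Expr.
pub fn square_sum<T>(a: T, b: T) -> T
where
    T: Clone + Add<Output = T> + Mul<Output = T>,
{
    let s = a + b;
    s.clone() * s
}

/// Emit the per-element WGSL expression for a traced graph.
pub fn to_wgsl(e: &Expr) -> String {
    match e {
        Expr::Input(k) => format!("src{}[i]", k),
        Expr::Add(l, r) => format!("({} + {})", to_wgsl(l), to_wgsl(r)),
        Expr::Mul(l, r) => format!("({} * {})", to_wgsl(l), to_wgsl(r)),
    }
}

/// Wrap the whole element-wise chain in one compute kernel (the "fusion").
pub fn fused_kernel(e: &Expr, n_inputs: usize) -> String {
    let mut src = String::new();
    for k in 0..n_inputs {
        src.push_str(&format!(
            "@group(0) @binding({}) var<storage, read> src{}: array<f32>;\n",
            k, k
        ));
    }
    src.push_str(&format!(
        "@group(0) @binding({}) var<storage, read_write> dst: array<f32>;\n",
        n_inputs
    ));
    src.push_str("@compute @workgroup_size(64)\n");
    src.push_str("fn main(@builtin(global_invocation_id) gid: vec3<u32>) {\n");
    src.push_str("    let i = gid.x;\n");
    src.push_str(&format!("    dst[i] = {};\n}}\n", to_wgsl(e)));
    src
}
```

Tracing `square_sum(Expr::Input(0), Expr::Input(1))` records the add and the multiply as one graph, so the generated kernel computes the whole chain per element in a single dispatch instead of one dispatch per operation.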

no_std throughout. Every crate is #![no_std] with alloc. Use tang on embedded, in WASM, wherever.

No heavyweight dependencies. Core types are hand-rolled #[repr(C)] with optional bytemuck and serde support. Dense LA is native Rust, generic over Scalar. An optional faer feature enables world-class f64 performance.

Physics-native ML. The same types that run your constraint solver and physics engine also train your neural nets. tang-tensor → tang-ad → tang-optim → tang-train is a complete differentiable programming stack.

Distributed by default. tang-mesh ships expression graphs — not tensors — over QUIC. Each worker compiles locally to its own GPU backend. Data-parallel, pipeline-parallel, and tensor-parallel strategies with fault tolerance.

Quick Start

cargo add tang tang-la tang-ad

use tang::{Vec3, Quat, Dual, Scalar};
use tang_la::{DVec, DMat, Lu};
use tang_ad::grad;

nalgebra Compatibility

tang provides drop-in compatibility aliases so you can migrate from nalgebra with minimal call-site changes.

Type mapping

| nalgebra | tang / tang-la |
| --- | --- |
| Vector2<f64> | Vec2<f64> |
| Vector3<f64> | Vec3<f64> |
| Vector4<f64> | Vec4<f64> |
| Point3<f64> | Point3<f64> |
| Unit<Vector3<f64>> | Dir3<f64> |
| Matrix3<f64> | Mat3<f64> |
| Matrix4<f64> | Mat4<f64> |
| UnitQuaternion<f64> | Quat<f64> |
| DVector<f64> | DVec<f64> |
| DMatrix<f64> | DMat<f64> |

API compatibility

Both by-value and by-reference calling conventions work:

// nalgebra style (by-ref) — works
v.dot(&w);
v.cross(&w);

// tang style (by-value) — also works
v.dot(w);
v.cross(w);
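
One common way to accept both calling conventions in Rust is to take `impl Borrow<Vec3>` as the argument type. The sketch below shows that technique on a hypothetical, simplified `Vec3`; it is an assumption about how such an API could be built, not a description of how tang implements it.

```rust
use std::borrow::Borrow;

/// Hypothetical minimal vector type for illustration only.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Vec3 {
    pub x: f64,
    pub y: f64,
    pub z: f64,
}

impl Vec3 {
    pub fn new(x: f64, y: f64, z: f64) -> Self {
        Vec3 { x, y, z }
    }

    /// Accepts `Vec3` or `&Vec3`: both implement `Borrow<Vec3>`.
    pub fn dot(self, rhs: impl Borrow<Vec3>) -> f64 {
        let r: &Vec3 = rhs.borrow();
        self.x * r.x + self.y * r.y + self.z * r.z
    }

    /// Same trick for the cross product.
    pub fn cross(self, rhs: impl Borrow<Vec3>) -> Vec3 {
        let r: &Vec3 = rhs.borrow();
        Vec3::new(
            self.y * r.z - self.z * r.y,
            self.z * r.x - self.x * r.z,
            self.x * r.y - self.y * r.x,
        )
    }
}
```

With this shape, `v.dot(w)` and `v.dot(&w)` compile against the same method. The trade-off is that the argument type can no longer be inferred from the method alone, so calls like `v.dot(Default::default())` need an explicit type.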

Aliases provided for common nalgebra names:

| nalgebra | tang equivalent |
| --- | --- |
| Vector3::zeros() | Vec3::zero() (+ zeros() alias) |
| v.norm_squared() | v.norm_sq() (+ norm_squared() alias) |
| Unit::new_normalize(v) | Dir3::new(v) (+ new_normalize() alias) |
| dir.as_ref() | dir.as_ref() (AsRef + Deref to Vec3) |
| Matrix3::from_diagonal(&v) | Mat3::diagonal(v) (+ from_diagonal(&v) alias) |
| m[(i,j)] | Index for Mat3, Mat4, DMat (+ IndexMut for DMat) |
| DMatrix::identity(n, n) | DMat::identity(n) |
| DVector::from_column_slice(s) | DVec::from_slice(s) (+ from_column_slice() alias) |
| DVector::from_iterator(n, it) | DVec::from_iterator(n, it) |
| DMatrix::from_iterator(r, c, it) | DMat::from_iterator(r, c, it) |
| DMatrix::from_row_slice(r, c, s) | DMat::from_row_slice(r, c, s) |
| m.symmetric_eigen() | m.symmetric_eigen() (method on DMat) |
| m.svd(true, true) | m.svd(true, true) (method on DMat) |
| svd.singular_values | svd.s (+ .singular_values() accessor) |
| svd.v_t | svd.vt (+ .v_t() accessor) |
| a.clone().lu().solve(&b) | a.clone().lu().solve(&b) (DMatLu wrapper) |
| m.try_inverse() | m.try_inverse() (DMat, Mat3, Mat4) |

Benchmarks

All benchmarks on Apple M-series, single-threaded. Run with cargo bench -p tang-bench.

Geometry & physics primitives (f64 unless noted)

| Operation | tang | nalgebra | glam (f32) |
| --- | --- | --- | --- |
| vec3 dot | 2.2ns | 2.2ns | 1.6ns |
| vec3 cross | 1.8ns | 2.2ns | 1.9ns |
| vec3 normalize | 3.7ns | 3.5ns | 2.5ns |
| mat3 mul | 5.9ns | 6.9ns | |
| mat4 mul | 11.0ns | 12.1ns | 5.5ns |
| mat4 inverse | 12.0ns | 16.0ns | 8.8ns |
| quat rotate | 2.5ns | 2.6ns | 2.3ns |
| quat mul | 3.3ns | 3.3ns | 2.0ns |
| quat slerp | 8.1ns | 11.0ns | 6.3ns |

Differentiable physics

tang's key advantage: the same code that runs your physics also gives you exact derivatives. Write your simulation once with generic S: Scalar, then swap in Dual<f64> to get gradients, Jacobians, and Hessians — no finite differences, no truncation error.

| Benchmark | tang AD | finite diff | notes |
| --- | --- | --- | --- |
| rigid body gradient (6 params) | 19ns | 18ns | exact vs ε-approximate |
| FK Jacobian (3-link arm, 3×3) | 278ns | 199ns | exact Jacobian, no tuning h |
| LU solve derivative | 81ns | 167ns | 2x faster; Dual flows through LU |
| Hessian Rosenbrock (2×2) | 77ns | 27ns | exact 2nd derivatives |

The speed comparison is secondary — the real win is that tang's derivatives are exact to machine precision. Finite differences require careful step-size tuning (too large → truncation error, too small → cancellation error) and break down for stiff systems. With tang, you just change the type parameter.
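
The step-size problem is easy to reproduce. The sketch below is plain Rust with no tang dependency; `f`, `f_forward_mode`, `finite_diff`, and `fd_error` are illustrative names, and the forward-mode version propagates the (value, derivative) pair by hand, which is the same arithmetic a dual number performs.

```rust
/// Test function: f(x) = sin(x^2).
pub fn f(x: f64) -> f64 {
    (x * x).sin()
}

/// Forward-mode propagation by hand: each step carries (value, derivative).
pub fn f_forward_mode(x: f64) -> (f64, f64) {
    let (u, du) = (x * x, 2.0 * x); // u = x^2, u' = 2x
    (u.sin(), u.cos() * du) // sin(u), cos(u) * u'
}

/// One-sided finite difference with step size h.
pub fn finite_diff(x: f64, h: f64) -> f64 {
    (f(x + h) - f(x)) / h
}

/// Absolute error of the finite difference against the propagated derivative.
pub fn fd_error(x: f64, h: f64) -> f64 {
    (finite_diff(x, h) - f_forward_mode(x).1).abs()
}
```

At x = 2, a step of 1e-1 is dominated by truncation error, a step of 1e-15 is dominated by floating-point cancellation, and only an intermediate step such as 1e-8 gets close. The propagated derivative has no step size and therefore no such tuning.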

The LU solve benchmark highlights a capability nalgebra cannot match: because tang's decompositions are generic over Scalar, Lu::<Dual<f64>> gives you derivatives of the solution through the linear solve for free.
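
What "Dual flows through LU" means can be shown on a 2×2 system. This sketch is not tang-la's `Lu`: `Dual`, `solve2`, and `solution_and_sensitivity` are hypothetical names, the dual number is f64-only, and the elimination skips pivoting for brevity (a real LU would pivot). It solves A(t) x = b with A(t) = [[t, 1], [1, 2]] and b = [1, 0], and reads dx/dt off the dual parts.

```rust
use std::ops::{Div, Mul, Sub};

/// Hypothetical f64-only dual number: value plus derivative.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Dual {
    pub real: f64,
    pub dual: f64,
}

impl Dual {
    pub fn constant(v: f64) -> Dual {
        Dual { real: v, dual: 0.0 }
    }
    pub fn variable(v: f64) -> Dual {
        Dual { real: v, dual: 1.0 }
    }
}

impl Sub for Dual {
    type Output = Dual;
    fn sub(self, r: Dual) -> Dual {
        Dual { real: self.real - r.real, dual: self.dual - r.dual }
    }
}

impl Mul for Dual {
    type Output = Dual;
    fn mul(self, r: Dual) -> Dual {
        Dual { real: self.real * r.real, dual: self.dual * r.real + self.real * r.dual }
    }
}

impl Div for Dual {
    type Output = Dual;
    fn div(self, r: Dual) -> Dual {
        // Quotient rule: (u/v)' = (u'v - uv') / v^2
        Dual {
            real: self.real / r.real,
            dual: (self.dual * r.real - self.real * r.dual) / (r.real * r.real),
        }
    }
}

/// 2x2 LU-style solve without pivoting, generic over the scalar type.
pub fn solve2<S>(m: [[S; 2]; 2], b: [S; 2]) -> [S; 2]
where
    S: Copy + Sub<Output = S> + Mul<Output = S> + Div<Output = S>,
{
    let l = m[1][0] / m[0][0]; // multiplier (the L factor)
    let u11 = m[1][1] - l * m[0][1]; // remaining pivot (the U factor)
    let y1 = b[1] - l * b[0]; // forward substitution
    let x1 = y1 / u11; // back substitution
    let x0 = (b[0] - m[0][1] * x1) / m[0][0];
    [x0, x1]
}

/// Returns (x, dx/dt) for A(t) = [[t, 1], [1, 2]], b = [1, 0].
pub fn solution_and_sensitivity(t: f64) -> ([f64; 2], [f64; 2]) {
    let c = Dual::constant;
    let m = [[Dual::variable(t), c(1.0)], [c(1.0), c(2.0)]];
    let x = solve2(m, [c(1.0), c(0.0)]);
    ([x[0].real, x[1].real], [x[0].dual, x[1].dual])
}
```

For this system the closed form is x = [2, -1] / (2t - 1), so the dual parts can be checked against dx/dt = [-4, 2] / (2t - 1)². The same `solve2` also accepts plain `f64` inputs, which is the point: the solver is written once and the derivative comes from the type.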

Dense linear algebra (f64, tang vs nalgebra)

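| Operation | n | tang | nalgebra | ratio (tang / nalgebra) |
| --- | --- | --- | --- | --- |
| GEMM | 32 | 6.6µs | 1.5µs | 4.4x |
| GEMM | 128 | 250µs | 84µs | 3.0x |
| GEMM | 512 | 20ms | 4.9ms | 4.1x |
| LU solve | 32 | 3.6µs | 3.2µs | 1.1x |
| LU solve | 128 | 118µs | 107µs | 1.1x |
| LU solve | 512 | 9.8ms | 8.6ms | 1.1x |
| Cholesky solve | 32 | 3.0µs | 2.6µs | 1.2x |
| Cholesky solve | 128 | 170µs | 63µs | 2.7x |
| Cholesky solve | 512 | 28ms | 3.7ms | 7.6x |
| QR | 32 | 5.7µs | 5.0µs | 1.1x |
| QR | 128 | 360µs | 211µs | 1.7x |
| QR | 512 | 31ms | 13ms | 2.4x |
| Sym. Eigen | 32 | 66µs | 28µs | 2.4x |
| Sym. Eigen | 128 | 4.8ms | 942µs | 5.1x |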

tang's dense LA is pure generic Rust (works with Dual<f64>, any Scalar impl). nalgebra dispatches to optimized BLAS/LAPACK-style routines for f64. The gap is expected and acceptable for tang's use case — when you need peak f64 throughput, enable the faer feature.

Autodiff overhead

| Operation | plain f64 | Dual f64 | overhead |
| --- | --- | --- | --- |
| trig chain | 4.5ns | 8.2ns | 1.8x |
| vec3 chain | 3.7ns | 15.4ns | 4.2x |
| quat rotate | 15.0ns | 30.9ns | 2.1x |

GPU training pipeline (tang-gpu, Apple M-series Metal)

| Operation | time |
| --- | --- |
| matmul 16x16 | 124µs |
| matmul 32x32 | 124µs |
| matmul 64x64 | 127µs |
| matmul 128x128 | 127µs |
| fused elementwise (a+b)² 4096 | 2.9ms |
| Linear forward [4,128]→[4,64] | 6.2ms |
| Linear backward [4,128]→[4,64] | 3.4ms |
| Sequential(2→8→1) fwd+bwd | 28ms |
| MSE loss (64 elements) | 8.4ms |
| XOR training step (4 samples) | 60ms |

Matmul is dispatch-bound at small sizes: the time stays roughly constant (124 to 127µs) from 16x16 through 128x128. The training step includes forward, loss, backward, and Adam update for a 3-layer network.

Examples

The Quantum Poet

A character-level text generator trained on physics haikus. Demonstrates the full tang-train pipeline — dataset construction, sequential model, cross-entropy loss, Adam optimizer, and text generation.

cargo run --example quantum_poet -p tang-train

=== The Quantum Poet ===

corpus: 1491 chars, 1483 training samples, 8 parameters window
model: 21534 parameters

training...
  epoch   1: loss = 2.9453
  epoch  20: loss = 0.0916
  epoch 200: loss = 0.0207

--- seed: "quantum " ---
quantum fields vibrate below,
dimensions curl up and hide,
theory seeks the truth...

~21K parameters, trains in seconds on CPU. See crates/tang-train/examples/quantum_poet.rs.

Development

cargo test --workspace
cargo test --workspace --all-features
cargo bench -p tang-bench

License

MIT
