Tyr

A dependently-typed deep learning framework for Lean 4, providing compile-time tensor shape verification. Very much WIP.

Overview

Tyr uses Lean 4's dependent type system to catch tensor dimension mismatches at compile time, not runtime. This eliminates a major source of bugs while providing access to optimized tensor operations.

-- Shapes are tracked in the type system
def linear {m n b : UInt64} (x : T #[b, m]) (M : T #[n, m]) : T #[b, n] := ...

-- Mismatched dimensions fail at compile time, not runtime!
let x : T #[32, 768] := ...
let w : T #[768, 512] := ...
let y := linear x w  -- Error: expected T #[n, 768], got T #[768, 512]

Dependencies

Lean 4

Install elan (the Lean version manager):

curl https://raw.githubusercontent.com/leanprover/elan/master/elan-init.sh -sSf | sh

The correct Lean nightly is pinned in lean-toolchain and will be installed automatically on first lake build.

LibTorch (required)

macOS:

cd external
LIBTORCH_VERSION=2.10.0
curl --fail --location --retry 5 --retry-all-errors --show-error \
  -o libtorch.zip "https://download.pytorch.org/libtorch/cpu/libtorch-macos-arm64-${LIBTORCH_VERSION}.zip"
unzip -tq libtorch.zip
unzip -q libtorch.zip && rm libtorch.zip
cd ..

Or run the helper script:

bash dependencies_macos.sh

Linux (CPU):

cd external
LIBTORCH_VERSION=2.10.0
curl --fail --location --retry 5 --retry-all-errors --show-error \
  -o libtorch.zip "https://download.pytorch.org/libtorch/cpu/libtorch-shared-with-deps-${LIBTORCH_VERSION}%2Bcpu.zip"
unzip -tq libtorch.zip
unzip -q libtorch.zip && rm libtorch.zip
cd ..

Linux (CUDA 12.6):

cd external
curl -O https://download.pytorch.org/libtorch/nightly/cu126/libtorch-cxx11-abi-shared-with-deps-latest.zip
unzip libtorch-cxx11-abi-shared-with-deps-latest.zip && rm libtorch-cxx11-abi-shared-with-deps-latest.zip
cd ..

OpenMP (required)

macOS (Homebrew):

brew install libomp

Linux: OpenMP is typically included with GCC. Install if needed:

sudo apt install libomp-dev   # Debian/Ubuntu

Apache Arrow & Parquet (required for data loading)

macOS:

brew install apache-arrow

Linux:

sudo apt install libarrow-dev libparquet-dev

C++17 Compiler (required)

macOS: Xcode command line tools (xcode-select --install)
Linux: GCC 9+ or Clang 10+ (sudo apt install build-essential)

Quick Start

Building

# Build all targets with Lake
lake build

# Build specific executables
lake build test_runner
lake build TrainGPT
lake build TrainDiffusion
lake build TrainNanoChat
lake build FluxDemo

Environment Setup

All executables need the native library paths set at runtime:

macOS (Apple Silicon):

export DYLD_LIBRARY_PATH=external/libtorch/lib:/opt/homebrew/opt/libomp/lib:/opt/homebrew/lib

macOS (Intel):

export DYLD_LIBRARY_PATH=external/libtorch/lib:/usr/local/opt/libomp/lib:/usr/local/lib

Linux:

export LD_LIBRARY_PATH=external/libtorch/lib:/usr/lib

Or use the Lake helper scripts which set these automatically:

lake run           # runs test_runner
lake run train     # runs TrainGPT

Running Tests

lake build test_runner
.lake/build/bin/test_runner

# Or use the helper script
lake run

# Experimental/in-progress suites
lake build test_runner_experimental
.lake/build/bin/test_runner_experimental

Examples

See Examples/README.md for detailed per-example documentation.

Example	Description	Build target
TrainGPT	Character-level GPT on Shakespeare	`lake build TrainGPT`
TrainDiffusion	Discrete masked diffusion on ASCII text	`lake build TrainDiffusion`
TrainNanoChat	Modded-nanogpt distributed training	`lake build TrainNanoChat`
FluxDemo	Flux Klein 4B image generation	`lake build FluxDemo`
BranchingFlows	Combinatorial branching flow sampler	Part of `Examples` lib
NanoProof	Transformer theorem prover (model only)	Part of `Examples` lib

Distributed NanoChat (GPU Node)

Use the helper scripts to run TrainNanoChat under torchrun without pulling in a mismatched CUDA module stack:

# default: debug smoke run on 2 GPUs
./scripts/nanochat/run_train_torchrun.sh

# explicit 4-GPU run
NPROC_PER_NODE=4 ./scripts/nanochat/run_train_torchrun.sh \
  --debug --iterations 2 --data data/nanochat --val data/nanochat

# scaling check (1/2/4 GPUs by default)
./scripts/nanochat/bench_distributed.sh

Notes:

run_train_torchrun.sh defaults TORCHRUN_BIN to /grid/it/data/elzar/easybuild/software/Anaconda3/2023.07-2/bin/torchrun.
Override launcher path with TORCHRUN_BIN=/path/to/torchrun.
Override process counts in the benchmark script with SIZES="2 4" (or any space-separated list).

GPU Kernel Parity

The ThunderKittens-style GPU coverage now has a reusable end-to-end parity path centered on seeded fixture generation plus hardware-backed validation:

# Run the current GPU parity suite
./scripts/gpu/test_parity_suite.sh

# Add randomized MHA trials on top of the deterministic suite
RANDOMIZED_MHA_TRIALS=10 ./scripts/gpu/test_parity_suite.sh

Notes:

The suite currently covers copy, rotary, layernorm, flashattn, and mha_h100 entrypoints through the executables declared in lakefile.lean.
PyTorch is the default numerical oracle for fixture generation and parity.
If you have a local vendored ThunderKittens reference runner, set TYR_GPU_VENDORED_REF_RUNNER=/path/to/runner. It will be called as runner <suite-name> <fixture-dir> after each suite and should exit nonzero on mismatch.
This checkout may have an empty thirdparty/ThunderKittens directory; the vendored hook is optional and exists so parity can start paying off before the submodule/runtime reference path is fully wired.

Key Concepts

Shape-Indexed Tensors

The core data type is T s - a tensor type indexed by its shape:

-- T is parameterized by shape (Array UInt64)
def T (s : Shape) : Type := TSpec.type

-- Shape mismatches are compile-time errors
def matmul {a b c : UInt64} (x : T #[a, b]) (y : T #[b, c]) : T #[a, c] := ...

TensorStruct Typeclass

Generic traversal over structures containing tensors:

class TensorStruct (α : Type) where
  map     : (∀ {s}, T s → T s) → α → α
  mapM    : (∀ {s}, T s → m (T s)) → α → m α
  zipWith : (∀ {s}, T s → T s → T s) → α → α → α
  fold    : (∀ {s}, T s → β → β) → β → α → β

Use Vector n α instead of Array α for type-safe zipWith operations.

Data Loading

Two patterns for different use cases:

-- Fixed dimensions (type-safe):
let iter := SequentialBatchIterator.new loader 8 256
let (batch, iter') := iter.next  -- Returns T #[8, 256]

-- Dynamic dimensions:
let iter := BatchIterator.new shard 8 256
let (batch, iter') ← iter.next   -- Returns T #[] (erased)

SafeTensors Type Provider

Tyr includes a Lean command-level type provider for SafeTensors schemas:

import Tyr.SafeTensors

open torch

-- Works with a single .safetensors file or a sharded directory.
-- If `model.safetensors.index.json` exists, introspection follows `weight_map`.
safetensors_type_provider "/path/to/model_dir_or_file" as ModelWeights

def inspectWeights : IO Unit := do
  IO.println s!"discovered tensors: {ModelWeights.tensorCount}"

  -- Per-tensor typed loader + typed schema metadata
  let tokEmbed ← ModelWeights.load_model_embed_tokens_weight
  IO.println s!"embed dtype: {ModelWeights.model_embed_tokens_weightSpec.dtype}"
  IO.println s!"embed shape: {tokEmbed.runtimeShape}"

  -- Hierarchical aggregate generated from tensor names
  let weights ← ModelWeights.loadAll
  let qProj := weights.model.layers[0]!.self_attn.q_proj.weight
  IO.println s!"q_proj shape: {qProj.runtimeShape}"

  -- Hierarchical subtree loaders are also generated
  let decoder ← ModelWeights.model.load
  IO.println s!"loaded subtree"

Notes:

Tensor schema dtype metadata uses the core torch.DType type (not raw strings).
For sharded checkpoints, unsafe index shard paths (absolute paths or .. traversal) are rejected.

FFI Reference Counting

The C++ bindings use careful reference counting. See cc/src/tyr.cpp header for details:

borrowTensor(): Shared ownership, auto-cleanup
giveTensor(): Transfer ownership to Lean
lean_dec(): Required after extracting from lean_obj_arg, not for b_lean_obj_arg

Monitor tensor leaks via get_live_tensors which tracks outstanding C++ tensors.

Development

Adding New Tensor Operations

Add Lean declaration in Tyr/Torch.lean with @[extern "lean_torch_xxx"]
Implement in cc/src/tyr.cpp following reference counting conventions
Rebuild: lake build

Project Structure

lakefile.lean - Lake build configuration
lean-toolchain - Lean version specification
cc/ - C++ FFI bindings (LibTorch wrapper)
Tyr/ - Core framework (tensors, modules, optimizers, distributed)
Examples/ - Training scripts and model implementations
Tests/ - Test suites

Commit Template

This repo uses scoped conventional commit subjects:

type(scope): summary

A commit message template is included at .gitmessage. Enable it locally:

./scripts/setup-git-hooks.sh

This sets:

commit.template=.gitmessage
core.hooksPath=.githooks

Included hooks:

pre-commit: fails on staged whitespace errors and conflict markers
commit-msg: enforces type(scope): summary (e.g. feat(qwen35): add video stream patchify)
pre-push: validates pushed commit subjects with scripts/check-commit-messages.sh

CI also enforces commit subjects using scripts/check-commit-messages.sh.

Manual check examples:

./scripts/check-commit-messages.sh HEAD~20..HEAD
COMMIT_MSG_ENFORCE_FROM=<commit> ./scripts/check-commit-messages.sh HEAD~20..HEAD

License

TBD.

Name		Name	Last commit message	Last commit date
Latest commit History 637 Commits
.githooks		.githooks
.github/workflows		.github/workflows
.vscode		.vscode
Examples		Examples
Tests		Tests
Tyr		Tyr
benchmarks		benchmarks
cc		cc
dev		dev
docs/gpu		docs/gpu
external		external
scripts		scripts
thirdparty		thirdparty
.bazelrc		.bazelrc
.gitignore		.gitignore
.gitmessage		.gitmessage
.gitmodules		.gitmodules
Examples.lean		Examples.lean
LICENSE		LICENSE
Readme.md		Readme.md
Tests.lean		Tests.lean
TestsExperimental.lean		TestsExperimental.lean
Tyr.lean		Tyr.lean
dependencies_macos.sh		dependencies_macos.sh
lake-manifest.json		lake-manifest.json
lakefile.lean		lakefile.lean
lean-toolchain		lean-toolchain
load_modules.sh		load_modules.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Tyr

Overview

Dependencies

Lean 4

LibTorch (required)

OpenMP (required)

Apache Arrow & Parquet (required for data loading)

C++17 Compiler (required)

Quick Start

Building

Environment Setup

Running Tests

Examples

Distributed NanoChat (GPU Node)

GPU Kernel Parity

Key Concepts

Shape-Indexed Tensors

TensorStruct Typeclass

Data Loading

SafeTensors Type Provider

FFI Reference Counting

Development

Adding New Tensor Operations

Project Structure

Commit Template

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Tyr

Overview

Dependencies

Lean 4

LibTorch (required)

OpenMP (required)

Apache Arrow & Parquet (required for data loading)

C++17 Compiler (required)

Quick Start

Building

Environment Setup

Running Tests

Examples

Distributed NanoChat (GPU Node)

GPU Kernel Parity

Key Concepts

Shape-Indexed Tensors

TensorStruct Typeclass

Data Loading

SafeTensors Type Provider

FFI Reference Counting

Development

Adding New Tensor Operations

Project Structure

Commit Template

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages