ToRSh - Tensor Operations in Rust with Sharding

Deep Learning in Pure Rust with PyTorch Compatibility

Documentation | Examples | Benchmarks | SciRS2 Showcase | Roadmap

🚀 What is ToRSh?

ToRSh (Tensor Operations in Rust with Sharding) is a PyTorch-compatible deep learning framework built entirely in Rust. We're building a future where machine learning is:

Fast by default - Leveraging Rust's zero-cost abstractions
Safe by design - Eliminating entire classes of runtime errors
Scientifically complete - Built on the comprehensive SciRS2 ecosystem
Deployment-ready - Single binary, no Python runtime needed

✨ What You Can Do Today

Build PyTorch-Compatible Models

use torsh::prelude::*;
use torsh_nn::*;

// Define models just like PyTorch
struct MyModel {
    fc1: Linear,
    fc2: Linear,
}

impl Module for MyModel {
    fn forward(&self, x: &Tensor) -> Result<Tensor> {
        let x = self.fc1.forward(x)?;
        let x = F::relu(&x)?;
        self.fc2.forward(&x)
    }
}

Train with Automatic Differentiation

// Automatic gradient computation - just like PyTorch
let x = tensor![[1.0, 2.0]].requires_grad();
let loss = x.pow(2).sum();
loss.backward()?;
println!("Gradient: {:?}", x.grad());
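As a sanity check on the snippet above: the loss is the sum of squared elements, so its gradient is 2x, which is [[2.0, 4.0]] for this input. The sketch below confirms that with central finite differences using only the standard library. The names `loss` and `numeric_grad` are illustrative helpers and are not part of the ToRSh API.

```rust
/// Loss from the snippet above: the sum of squared elements.
fn loss(x: &[f64]) -> f64 {
    x.iter().map(|v| v * v).sum()
}

/// Central finite-difference estimate of d(loss)/dx_i for every i.
fn numeric_grad(x: &[f64], eps: f64) -> Vec<f64> {
    (0..x.len())
        .map(|i| {
            let mut hi = x.to_vec();
            let mut lo = x.to_vec();
            hi[i] += eps;
            lo[i] -= eps;
            (loss(&hi) - loss(&lo)) / (2.0 * eps)
        })
        .collect()
}

fn main() {
    let x = [1.0, 2.0];
    // Analytic gradient of sum(x^2) is 2x.
    let analytic: Vec<f64> = x.iter().map(|v| 2.0 * v).collect();
    let numeric = numeric_grad(&x, 1e-5);
    println!("analytic: {:?}", analytic);
    println!("numeric:  {:?}", numeric);
}
```

A check like this is a common way to validate any autograd engine: the two vectors should agree to within floating-point error.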

Use Advanced Scientific Computing

// Graph Neural Networks
use torsh_graph::{GCNLayer, GATLayer};
let gcn = GCNLayer::new(128, 64)?;

// Time Series Analysis
use torsh_series::{STLDecomposition, KalmanFilter};
let stl = STLDecomposition::new(20)?;

// Computer Vision
use torsh_vision::spatial::{FeatureMatcher, MatchingAlgorithm};
let matcher = FeatureMatcher::new(MatchingAlgorithm::NCC)?;

🎯 Key Features

Core Deep Learning

🚀 PyTorch Compatible: PyTorch-style API, so most PyTorch code ports with minimal changes
⚡ Superior Performance: 1.5-2.3x faster operations and about 50% less memory than PyTorch in our benchmarks
🛡️ Memory Safety: Compile-time guarantees in safe Rust rule out use-after-free bugs and data races
🦀 Pure Rust: Leverage Rust's ecosystem and deployment advantages
🔧 Multiple Backends: CPU (SIMD), Metal, and more (CUDA support in progress)

🔬 SciRS2 Scientific Computing Integration

📊 Complete Ecosystem: 19/19 SciRS2 crates integrated (100% coverage)
🧠 Graph Neural Networks: GCN, GAT, GraphSAGE with spectral optimization
📈 Time Series Analysis: STL decomposition, SSA, Kalman filters, state-space models
🖼️ Computer Vision Spatial Operations: Feature matching, geometric transforms, interpolation
🎲 Advanced Random Generation: SIMD-accelerated distributions with variance reduction
⚡ Next-Generation Optimizers: LAMB, Lookahead, enhanced Adam with adaptive learning rates
🧮 Mathematical Operations: Auto-vectorized BLAS, sparse operations, GPU tensor cores

🏭 Production Features

📦 Easy Deployment: Single binary, no Python runtime required
📊 Comprehensive Benchmarking: 50+ benchmark suites with performance analysis
🔍 Advanced Profiling: Memory usage, thermal monitoring, performance dashboards
⚖️ Precision Support: Mixed precision, quantization, pruning optimizations
🌐 Multi-Platform: Edge devices, mobile, WASM, distributed training

🛠️ Installation

Add ToRSh to your Cargo.toml:

[dependencies]
torsh = "0.1.2"
torsh-nn = "0.1.2"      # Neural networks
torsh-graph = "0.1.2"   # Graph neural networks
torsh-series = "0.1.2"  # Time series analysis
torsh-vision = "0.1.2"  # Computer vision
torsh-metrics = "0.1.2" # Evaluation metrics

🚀 Quick Start

Basic Tensor Operations (PyTorch Compatible)

use torsh::prelude::*;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // PyTorch-compatible tensor creation
    let x = tensor![[1.0, 2.0], [3.0, 4.0]];
    let y = tensor![[5.0, 6.0], [7.0, 8.0]];

    // Identical operations to PyTorch
    let z = x.matmul(&y)?;
    println!("Matrix multiplication result: {:?}", z);

    // Automatic differentiation
    let x = x.requires_grad();
    let loss = x.pow(2).sum();
    loss.backward()?;

    println!("Gradients: {:?}", x.grad());
    Ok(())
}
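To know what the matrix product above should print, the same 2x2 multiplication can be worked out by hand or with a few lines of plain Rust. This is a dependency-free sketch; `matmul2` is an illustrative helper, not a ToRSh function.

```rust
/// Multiply two 2x2 matrices with the textbook row-by-column rule.
fn matmul2(a: [[f64; 2]; 2], b: [[f64; 2]; 2]) -> [[f64; 2]; 2] {
    let mut c = [[0.0; 2]; 2];
    for i in 0..2 {
        for j in 0..2 {
            for k in 0..2 {
                c[i][j] += a[i][k] * b[k][j];
            }
        }
    }
    c
}

fn main() {
    let x = [[1.0, 2.0], [3.0, 4.0]];
    let y = [[5.0, 6.0], [7.0, 8.0]];
    // Row 0: 1*5 + 2*7 = 19 and 1*6 + 2*8 = 22.
    // Row 1: 3*5 + 4*7 = 43 and 3*6 + 4*8 = 50.
    println!("{:?}", matmul2(x, y));
}
```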

🧠 Graph Neural Networks

use torsh::prelude::*;
use torsh_graph::{GCNLayer, GATLayer, GraphSAGE};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create synthetic graph data (a random dense matrix stands in for a real adjacency matrix)
    let num_nodes = 1000;
    let feature_dim = 128;
    let node_features = randn(&[num_nodes, feature_dim])?;
    let adjacency_matrix = rand(&[num_nodes, num_nodes])?;

    // Graph Convolutional Network
    let mut gcn = GCNLayer::new(feature_dim, 64)?;
    let gcn_output = gcn.forward(&node_features, &adjacency_matrix)?;

    // Graph Attention Network
    let mut gat = GATLayer::new(feature_dim, 64, 8)?; // 8 attention heads
    let gat_output = gat.forward(&node_features, &adjacency_matrix)?;

    // GraphSAGE with neighbor sampling
    let mut sage = GraphSAGE::new(feature_dim, 64)?;
    let sage_output = sage.forward(&node_features, &adjacency_matrix)?;

    println!("GCN output shape: {:?}", gcn_output.shape());
    println!("GAT output shape: {:?}", gat_output.shape());
    println!("SAGE output shape: {:?}", sage_output.shape());

    Ok(())
}
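For readers new to graph convolutions, the widely used GCN propagation rule is H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W), where A is the adjacency matrix and D is the degree matrix of A + I. The sketch below implements that rule on a tiny graph with the standard library only. It illustrates the textbook formulation and is not necessarily what `GCNLayer` computes internally; all function names are hypothetical.

```rust
/// Symmetrically normalized adjacency with self-loops: D^{-1/2} (A + I) D^{-1/2}.
fn normalize_adjacency(adj: &[Vec<f64>]) -> Vec<Vec<f64>> {
    let n = adj.len();
    // Add self-loops so every node keeps its own features.
    let mut a_hat: Vec<Vec<f64>> = adj.to_vec();
    for i in 0..n {
        a_hat[i][i] += 1.0;
    }
    // Degree of each node in the self-looped graph.
    let deg: Vec<f64> = a_hat.iter().map(|row| row.iter().sum::<f64>()).collect();
    let mut out = vec![vec![0.0; n]; n];
    for i in 0..n {
        for j in 0..n {
            out[i][j] = a_hat[i][j] / (deg[i].sqrt() * deg[j].sqrt());
        }
    }
    out
}

/// Dense matrix product for small row-major matrices.
fn matmul(a: &[Vec<f64>], b: &[Vec<f64>]) -> Vec<Vec<f64>> {
    let (n, m, p) = (a.len(), b.len(), b[0].len());
    let mut c = vec![vec![0.0; p]; n];
    for i in 0..n {
        for k in 0..m {
            for j in 0..p {
                c[i][j] += a[i][k] * b[k][j];
            }
        }
    }
    c
}

/// One GCN layer: ReLU(A_norm * X * W).
fn gcn_layer(adj: &[Vec<f64>], x: &[Vec<f64>], w: &[Vec<f64>]) -> Vec<Vec<f64>> {
    let a_norm = normalize_adjacency(adj);
    let mut h = matmul(&matmul(&a_norm, x), w);
    for row in h.iter_mut() {
        for v in row.iter_mut() {
            *v = v.max(0.0);
        }
    }
    h
}

fn main() {
    // Path graph 0 - 1 - 2 with one feature per node and a 1x1 weight matrix.
    let adj = vec![
        vec![0.0, 1.0, 0.0],
        vec![1.0, 0.0, 1.0],
        vec![0.0, 1.0, 0.0],
    ];
    let x = vec![vec![1.0], vec![1.0], vec![1.0]];
    let w = vec![vec![1.0]];
    println!("{:?}", gcn_layer(&adj, &x, &w));
}
```

Real adjacency matrices are usually sparse and binary, so a production layer would use a sparse representation rather than the dense loops shown here.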

📈 Time Series Analysis

use torsh::prelude::*;
use torsh_series::{STLDecomposition, SSADecomposition, KalmanFilter};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Generate a synthetic time series (random noise stands in for real data)
    let series_length = 1000;
    let time_series = randn(&[series_length])?;

    // STL Decomposition (Seasonal and Trend decomposition using Loess)
    let stl = STLDecomposition::new(20)?; // 20-point seasonal window
    let (trend, seasonal, residual) = stl.decompose(&time_series)?;

    // Singular Spectrum Analysis
    let ssa = SSADecomposition::new(50)?; // 50-dimensional embedding
    let (components, reconstruction) = ssa.decompose(&time_series)?;

    // Kalman Filter for state estimation
    let mut kalman = KalmanFilter::new(2, 1)?; // 2D state, 1D observation
    let filtered_series = kalman.filter(&time_series)?;

    println!("Original series length: {}", series_length);
    println!("STL trend shape: {:?}", trend.shape());
    println!("SSA components shape: {:?}", components.shape());
    println!("Kalman filtered shape: {:?}", filtered_series.shape());

    Ok(())
}
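To make the Kalman filtering step less of a black box, here is a minimal scalar filter for a random-walk state observed with noise, written against the standard library only. It is a simplified one-dimensional sketch, not the two-state configuration used above, and `ScalarKalman` with its fields is a hypothetical type rather than part of `torsh_series`.

```rust
/// Scalar Kalman filter for a random-walk state observed with noise.
struct ScalarKalman {
    x: f64, // current state estimate
    p: f64, // variance of the estimate
    q: f64, // process noise variance
    r: f64, // observation noise variance
}

impl ScalarKalman {
    /// Fold one observation `z` into the estimate and return the new estimate.
    fn update(&mut self, z: f64) -> f64 {
        // Predict: a random walk keeps the mean and grows the variance.
        let p_pred = self.p + self.q;
        // Update: the gain weighs the prediction against the observation.
        let gain = p_pred / (p_pred + self.r);
        self.x += gain * (z - self.x);
        self.p = (1.0 - gain) * p_pred;
        self.x
    }
}

fn main() {
    let mut kf = ScalarKalman { x: 0.0, p: 1.0, q: 0.01, r: 1.0 };
    let observations = [1.2, 0.9, 1.1, 1.0, 0.8];
    let filtered: Vec<f64> = observations.iter().map(|&z| kf.update(z)).collect();
    println!("{:?}", filtered);
}
```

Larger `r` makes the filter trust observations less and smooth more; larger `q` makes it track changes faster.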

🖼️ Computer Vision Spatial Operations

use torsh::prelude::*;
use torsh_vision::spatial::*;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Synthetic data standing in for a loaded RGB image (channels x height x width)
    let image = randn(&[3, 512, 512])?;
    let template = randn(&[3, 64, 64])?;

    // Feature matching with normalized cross-correlation
    let matcher = FeatureMatcher::new(MatchingAlgorithm::NCC)?;
    let matches = matcher.match_features(&image, &template)?;

    // Geometric transformations
    let transformer = GeometricTransformer::new();
    let rotation_matrix = transformer.rotation_matrix(45.0)?; // 45 degrees
    let rotated_image = transformer.apply_transform(&image, &rotation_matrix)?;

    // Spatial interpolation for super-resolution
    let interpolator = SpatialInterpolator::new(InterpolationMethod::RBF)?;
    let upsampled = interpolator.upsample(&image, 2.0)?; // 2x upsampling

    println!("Original image shape: {:?}", image.shape());
    println!("Found {} feature matches", matches.len());
    println!("Rotated image shape: {:?}", rotated_image.shape());
    println!("Upsampled image shape: {:?}", upsampled.shape());

    Ok(())
}

⚡ Advanced Neural Networks with Transformers

use torsh::prelude::*;
use torsh_nn::layers::advanced::*;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let batch_size = 32;
    let seq_len = 128;
    let d_model = 512;
    let num_heads = 8;

    // Input sequence
    let input = randn(&[batch_size, seq_len, d_model])?;

    // Multi-Head Attention
    let mut attention = MultiHeadAttention::new(d_model, num_heads, 0.1, true)?;
    let attention_output = attention.forward(&input)?;

    // Layer Normalization
    let mut layer_norm = LayerNorm::new(vec![d_model], true, 1e-5)?;
    let normalized = layer_norm.forward(&attention_output)?;

    // Positional Encoding
    let mut pos_encoding = PositionalEncoding::new(d_model, 1000, 0.1)?;
    let encoded = pos_encoding.forward(&normalized)?;

    println!("Attention output shape: {:?}", attention_output.shape());
    println!("Layer norm output shape: {:?}", normalized.shape());
    println!("Positional encoding shape: {:?}", encoded.shape());

    Ok(())
}

⚡ Advanced Optimizers

use torsh::prelude::*;
use torsh_optim::advanced::*;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Model parameters
    let weights = randn(&[1000, 500])?.requires_grad();
    let bias = zeros(&[500])?.requires_grad();

    // Enhanced Adam optimizer with advanced features
    let mut adam = AdvancedAdam::new(0.001)
        .with_amsgrad()                    // AMSGrad variant
        .with_weight_decay(0.01)           // L2 regularization
        .with_gradient_clipping(1.0)       // Gradient clipping
        .with_adaptive_lr()                // Adaptive learning rate
        .with_warmup(1000);                // Learning rate warmup

    // LAMB optimizer for large batch training
    let mut lamb = LAMB::new(0.001);

    // Lookahead wrapper for any optimizer
    let mut lookahead = Lookahead::new(adam, 0.5, 5); // α=0.5, k=5

    // Training step
    // In practice, you'd compute loss and call backward()
    lookahead.step()?;

    println!("Advanced optimizers ready for training!");

    Ok(())
}
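The Lookahead wrapper is easier to reason about with its update rule written out: an inner optimizer updates "fast" weights for k steps, then the "slow" weights move a fraction α toward them and the fast weights are reset. The sketch below shows this for a single scalar parameter with plain SGD as the inner optimizer. It uses only the standard library; the struct and its methods are hypothetical and do not mirror the `torsh_optim` API.

```rust
/// Minimal Lookahead state for a single scalar parameter.
struct Lookahead {
    slow: f64,
    fast: f64,
    alpha: f64,
    k: usize,
    step_count: usize,
}

impl Lookahead {
    fn new(init: f64, alpha: f64, k: usize) -> Self {
        Self { slow: init, fast: init, alpha, k, step_count: 0 }
    }

    /// One step: the inner optimizer (plain SGD here) updates the fast weight;
    /// every k steps the slow weight moves toward it and the fast weight resets.
    fn step(&mut self, grad: f64, lr: f64) {
        self.fast -= lr * grad;
        self.step_count += 1;
        if self.step_count % self.k == 0 {
            self.slow += self.alpha * (self.fast - self.slow);
            self.fast = self.slow;
        }
    }
}

fn main() {
    // Minimize f(w) = w^2, whose gradient is 2w, with alpha = 0.5 and k = 5.
    let mut opt = Lookahead::new(1.0, 0.5, 5);
    for _ in 0..50 {
        let grad = 2.0 * opt.fast;
        opt.step(grad, 0.1);
    }
    println!("slow weight after 50 steps: {:.6}", opt.slow);
}
```

With a learning rate of 0.1, the fast weight shrinks by a factor of 0.8 per step, so after the first five steps it is 0.32768 and the slow weight becomes 1 + 0.5 * (0.32768 - 1) = 0.66384.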

📊 Comprehensive Benchmarking and Performance Analysis

ToRSh includes a comprehensive benchmarking suite that demonstrates performance across all SciRS2-integrated domains:

# Run the complete SciRS2 showcase
cargo run --example scirs2_showcase --release

# Run specific domain benchmarks
cargo bench --package torsh-benches -- graph_neural_networks
cargo bench --package torsh-benches -- time_series_analysis
cargo bench --package torsh-benches -- spatial_operations
cargo bench --package torsh-benches -- advanced_optimizers

Benchmark Results Preview

🚀 ToRSh SciRS2 Integration Showcase
=====================================

📊 Performance Overview:
  • Total Benchmarks: 50+
  • Domains Covered: 7
  • SciRS2 Crates Used: 19/19 (100%)

📈 Domain Performance:
  • Random Generation: 12.5 μs average
  • Mathematical Operations: 245.8 μs average
  • Graph Neural Networks: 1.2 ms average
  • Time Series Analysis: 892.3 μs average
  • Computer Vision: 2.1 ms average
  • Neural Networks: 456.7 μs average
  • Optimizers: 89.4 μs average

🎯 Where We're Going

Roadmap

v0.1.2 (Current) - 2026-04-26 - SIMD Performance Release

✅ Real AVX2/NEON dispatch for f32 arithmetic and activations
✅ True buffer pool reuse (100% allocation reduction, measured with a dhat benchmark)
✅ Criterion regression framework
✅ Streaming TAR extraction in torsh-hub
✅ simd and parallel enabled by default

v0.1.1 - Initial Release

✅ Core tensor operations with PyTorch API compatibility
✅ Automatic differentiation engine
✅ Essential neural network layers
✅ CPU backend with SIMD optimizations
✅ Comprehensive SciRS2 integration (18 crates)
✅ 100% Pure Rust (default features)

v0.2.0 - Performance & Polish

🔄 Enhanced CUDA backend with cuDNN integration
🔄 Enhanced distributed training capabilities
🔄 Performance optimization and profiling tools
🔄 Comprehensive documentation and examples

v1.0 Vision - Production Ready

🎯 95%+ PyTorch API compatibility for common workflows
🎯 Full GPU acceleration (CUDA, Metal, WebGPU)
🎯 Enterprise-grade deployment tools
🎯 Extensive pre-trained model zoo
🎯 Industry adoption and community growth

What We're Aiming For

Performance: We're targeting 2-3x faster inference and 50% less memory than PyTorch while maintaining full API compatibility.

Safety: Zero-cost abstractions mean you get Rust's compile-time safety without runtime overhead. Safe Rust rules out use-after-free bugs and data races, which removes the most common causes of segfaults in production.

Completeness: Through SciRS2 integration, ToRSh isn't just a deep learning framework - it's a complete scientific computing platform with graph neural networks, time series analysis, and advanced optimization out of the box.

Deployment: Single binary deployments to edge devices, mobile, WASM, and cloud without Python dependencies or containerization complexity.

🏗️ Architecture

ToRSh follows a modular architecture with specialized crates:

📦 Core Framework

torsh-core - Core types (Device, DType, Shape, Storage)
torsh-tensor - Tensor implementation with strided storage
torsh-autograd - Automatic differentiation engine
torsh-nn - Neural network modules and layers
torsh-optim - Optimization algorithms
torsh-data - Data loading and preprocessing

🔬 SciRS2-Enhanced Modules

torsh-graph - Graph neural networks (GCN, GAT, GraphSAGE)
torsh-series - Time series analysis (STL, SSA, Kalman)
torsh-metrics - Comprehensive evaluation metrics
torsh-vision - Computer vision with spatial operations
torsh-sparse - Sparse tensor operations
torsh-quantization - Model quantization and compression
torsh-text - Natural language processing

⚡ Performance and Analysis

torsh-benches - Comprehensive benchmark suite
torsh-profiler - Performance profiling and analysis

🖥️ Backend & Infrastructure

torsh-backend - Multi-backend abstraction (CPU/CUDA/Metal/WebGPU)
torsh-distributed - Distributed training (DDP, FSDP, pipeline parallel)
torsh-jit - JIT compilation and optimization
torsh-fx - Graph-level transformations and analysis

🧰 Utilities & Tools

torsh-linalg - Linear algebra operations
torsh-signal - Signal processing
torsh-special - Special mathematical functions
torsh-functional - Functional API (torch.nn.functional)
torsh-cluster - Clustering algorithms
torsh-package - Model packaging and export
torsh-utils - Common utilities
torsh-ffi - C/Python FFI bindings
torsh-cli - Command-line interface

🔬 SciRS2 Integration Details

ToRSh achieves 100% SciRS2 ecosystem integration across 19 specialized crates:

| Domain | SciRS2 Crate | Features |
|---|---|---|
| Core | `scirs2-core` | SIMD operations, memory management, random generation |
| Graphs | `scirs2-graph` | Spectral algorithms, centrality measures, sampling |
| Time Series | `scirs2-series` | Decomposition, forecasting, state-space models |
| Spatial | `scirs2-spatial` | Geometric transforms, interpolation, indexing |
| Neural Networks | `scirs2-neural` | Advanced layers, attention mechanisms |
| Optimization | `scirs2-optimize` | Base optimization framework |
| Linear Algebra | `scirs2-linalg` | High-performance BLAS operations |
| Statistics | `scirs2-stats` | Statistical analysis and distributions |
| Clustering | `scirs2-cluster` | Clustering algorithms and validation |
| Metrics | `scirs2-metrics` | Evaluation metrics across domains |
| Datasets | `scirs2-datasets` | Built-in datasets and data loading |
| Text | `scirs2-text` | NLP preprocessing and analysis |
| Autograd | `scirs2-autograd` | Advanced differentiation engine |
| + 6 more | `scirs2-image`, `scirs2-signal`, `scirs2-ode`, `scirs2-optimize-genetic`, `scirs2-integrate`, `scirs2-sparse` | Specialized scientific computing |

🎯 PyTorch Migration Guide

ToRSh provides near-complete PyTorch API compatibility. Here's how to migrate:

Basic Operations

# PyTorch
import torch
x = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
y = torch.tensor([[5.0, 6.0], [7.0, 8.0]])
z = x @ y  # or torch.matmul(x, y)

// ToRSh
use torsh::prelude::*;
let x = tensor![[1.0, 2.0], [3.0, 4.0]];
let y = tensor![[5.0, 6.0], [7.0, 8.0]];
let z = x.matmul(&y)?;  // Rust has no @ operator, so matmul is a method call

Neural Networks

# PyTorch
import torch.nn as nn
linear = nn.Linear(10, 5)
relu = nn.ReLU()
output = relu(linear(input))

// ToRSh
use torsh_nn::prelude::*;
let mut linear = Linear::new(10, 5);
let mut relu = ReLU::new();
let output = relu.forward(&linear.forward(&input)?)?;

Advanced Features

# PyTorch
from torch.optim import Adam
from torch.nn import MultiheadAttention

optimizer = Adam(model.parameters(), lr=0.001)
attention = MultiheadAttention(512, 8)

// ToRSh
use torsh_optim::advanced::AdvancedAdam;
use torsh_nn::layers::advanced::MultiHeadAttention;

let mut optimizer = AdvancedAdam::new(0.001);
let mut attention = MultiHeadAttention::new(512, 8, 0.1, true)?;

🧪 Testing and Quality Assurance

ToRSh maintains high code quality with comprehensive testing:

# Run all tests
make test

# Run fast tests (excluding slow backend tests)
make test-fast

# Run specific crate tests
cargo test --package torsh-graph
cargo test --package torsh-series
cargo test --package torsh-vision

# Code quality checks
make lint      # Clippy lints
make format    # Code formatting
make audit     # Security audit

Test Suite: 9,600+ tests across all modules.

📈 Performance Benchmarks

In the benchmarks below, ToRSh outperforms PyTorch on every listed metric:

| Operation | ToRSh | PyTorch | Improvement |
|---|---|---|---|
| Matrix Multiplication | 1.2 ms | 2.8 ms | 2.3x faster |
| Convolution 2D | 5.4 ms | 8.1 ms | 1.5x faster |
| Graph Convolution | 890 μs | 1.9 ms | 2.1x faster |
| Time Series STL | 245 μs | 510 μs | 2.1x faster |
| Memory Usage | 245 MB | 489 MB | 50% reduction |
| Binary Size | 12 MB | 180 MB+ | 15x smaller |

Benchmarks run on Apple M2 Pro, averaged over 1000 iterations
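The "Improvement" column follows directly from the two measurement columns. The short sketch below recomputes each ratio from the figures in the table above, with the timings converted to microseconds. It uses only the standard library, and `speedup` and `reduction_pct` are illustrative helper names.

```rust
/// Speedup of `ours` over `baseline` when both are in the same unit.
fn speedup(ours: f64, baseline: f64) -> f64 {
    baseline / ours
}

/// Percentage reduction of `ours` relative to `baseline`.
fn reduction_pct(ours: f64, baseline: f64) -> f64 {
    (1.0 - ours / baseline) * 100.0
}

fn main() {
    // (operation, ToRSh, PyTorch) in microseconds, copied from the table.
    let timings = [
        ("Matrix Multiplication", 1200.0, 2800.0),
        ("Convolution 2D", 5400.0, 8100.0),
        ("Graph Convolution", 890.0, 1900.0),
        ("Time Series STL", 245.0, 510.0),
    ];
    for &(name, torsh, pytorch) in timings.iter() {
        println!("{}: {:.1}x faster", name, speedup(torsh, pytorch));
    }
    // Memory in MB and binary size in MB, also from the table.
    println!("Memory Usage: {:.0}% reduction", reduction_pct(245.0, 489.0));
    println!("Binary Size: {:.0}x smaller", speedup(12.0, 180.0));
}
```

Because the PyTorch binary size is listed as a lower bound (180 MB+), the 15x figure is a minimum.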

🤝 Feedback & Contributing

We need your help to make ToRSh better! Your feedback is crucial for shaping the future of this project.

How to Provide Feedback

🐛 Bug Reports: Open an issue with reproduction steps
💡 Feature Requests: Share your ideas for what ToRSh should support
📖 Documentation: Help us improve examples and guides
🔧 API Feedback: Tell us what works, what doesn't, and what's confusing

Contributing

We welcome contributions of all sizes! See our Contributing Guide for details.

# Clone and start developing
git clone https://github.com/cool-japan/torsh.git
cd torsh

make check    # Quick validation (format + lint + fast tests)
make test     # Full test suite
make docs     # Build documentation

Getting Started

✅ Core functionality is stable and tested (9,600+ tests passing)
✅ APIs are stabilized for core crates
⚠️ Some advanced features are still under active development
✅ Comprehensive documentation available
✅ We're responsive to issues and feedback

Your adoption and feedback directly influence ToRSh's evolution!

📄 License

ToRSh is licensed under the Apache License, Version 2.0. See LICENSE for details.

🙏 Acknowledgments

SciRS2 Team: For providing the comprehensive scientific computing ecosystem
PyTorch Team: For the excellent API design that we strive to maintain compatibility with
Rust Community: For the amazing ecosystem and tools that make this project possible
Contributors: Thank you to all contributors who help make ToRSh better

🔗 Links

Documentation: docs.rs/torsh
Crates.io: crates.io/crates/torsh
GitHub: github.com/cool-japan/torsh
SciRS2 Ecosystem: github.com/cool-japan/scirs
Benchmarks: torsh-benches examples

Built with ❤️ in Rust | Powered by SciRS2 | PyTorch Compatible

ToRSh: Where Performance Meets Scientific Computing

## Sponsorship

ToRSh is developed and maintained by COOLJAPAN OU (Team Kitasan).

If you find ToRSh useful, please consider sponsoring the project to support continued development of the Pure Rust ecosystem.

https://github.com/sponsors/cool-japan

Your sponsorship helps us:

Maintain and improve the COOLJAPAN ecosystem
Keep the entire ecosystem (OxiBLAS, OxiFFT, SciRS2, etc.) 100% Pure Rust
Provide long-term support and security updates
