Stars
This repository contains companion software for the Colfax Research paper "Categorical Foundations for CuTe Layouts".
Efficiently computes derivatives of NumPy code.
stb single-file public domain libraries for C/C++
eXtendable Heterogeneous Energy-Efficient Platform based on RISC-V
SparseTIR: Sparse Tensor Compiler for Deep Learning
A time-series database for high-performance real-time analytics packaged as a Postgres extension
Timeloop performs modeling, mapping and code-generation for tensor algebra workloads on various accelerator architectures.
A Fast and Extensible DRAM Simulator, with built-in support for modeling many different DRAM technologies including DDRx, LPDDRx, GDDRx, WIOx, HBMx, and various academic proposals. Described in the…
TensorDict is a PyTorch-dedicated tensor container.
Application Binary Interface for the Arm® Architecture
A translator from Intel SSE intrinsics to Arm/AArch64 NEON implementation
Ramulator 2.0 is a modern, modular, extensible, and fast cycle-accurate DRAM simulator. It provides support for agile implementation and evaluation of new memory system designs (e.g., new DRAM stan…
CUDA Templates and Python DSLs for High-Performance Linear Algebra
A lightweight LLVM Python binding for writing JIT compilers
Time series forecasting with PyTorch
Resurrected LLVM "C Backend", with improvements
A high-performance, zero-overhead, extensible Python compiler with built-in NumPy support
The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
Efficient Triton Kernels for LLM Training
Install PyTorch distributions with computation backend auto-detection
GRU-FCN model for univariate time series classification