Stars
AITemplate is a Python framework which renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
Open deep learning compiler stack for CPU, GPU, and specialized accelerators
Productive, portable, and performant GPU programming in Python.
MSVC's implementation of the C++ Standard Library.
Pyodide is a Python distribution for the browser and Node.js based on WebAssembly
Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
A high-performance, generic framework for distributed DNN training
A General-purpose Task-parallel Programming System using Modern C++
Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, JavaScript and more
MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba. Full multimodal LLM Android App: [MNN-LLM-Android](./apps/Android/MnnLlmChat/READ…
An experimental ahead-of-time compiler for Relay.
Node.js extension host for Vim & Neovim; loads extensions like VSCode and hosts language servers.
"Multi-Level Intermediate Representation" Compiler Infrastructure
Repository for the book "Crafting Interpreters"
An open-source NLP research library, built on PyTorch.
A polyhedral compiler for expressing fast and portable data parallel algorithms
MACE is a deep learning inference framework optimized for mobile heterogeneous computing platforms.
Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more (a short usage sketch follows this list)
A technical report on convolution arithmetic in the context of deep learning
A very simple framework for state-of-the-art Natural Language Processing (NLP)
Bonus materials, exercises, and example projects for our Python tutorials
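
As a rough illustration of the JAX entry above, here is a minimal sketch of its three core transformations (`grad`, `vmap`, `jit`); the linear model and the names `loss`, `w`, `x`, `y` are illustrative, and it assumes a standard `jax` install.

```python
import jax
import jax.numpy as jnp

def loss(w, x, y):
    # Squared-error loss for a toy linear model.
    pred = jnp.dot(x, w)
    return jnp.mean((pred - y) ** 2)

# differentiate: gradient of the loss with respect to the weights (first argument)
grad_loss = jax.grad(loss)

# vectorize: map the per-example loss over a batch without writing an explicit loop
per_example_loss = jax.vmap(loss, in_axes=(None, 0, 0))

# JIT: compile the gradient computation with XLA so it can run on CPU/GPU/TPU
fast_grad = jax.jit(grad_loss)

w = jnp.zeros(3)
x = jnp.ones((8, 3))
y = jnp.ones(8)
print(per_example_loss(w, x, y))  # one loss value per example
print(fast_grad(w, x, y))         # gradient with the same shape as w
```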