Distribute and run LLMs with a single file.
trholding / llama2.c
Forked from karpathy/llama2.c
Llama 2 Everywhere (L2E)
Tutel MoE: Optimized Mixture-of-Experts library, supporting GptOss/DeepSeek/Kimi-K2/Qwen3 with FP8/NVFP4/MXFP4
Library for specialized dense and sparse matrix operations, and deep learning primitives.
Official code for the NeurIPS 2022 paper "Shape, Light, and Material Decomposition from Images using Monte Carlo Rendering and Denoising".
Neural networks with low-bit weights on low-end 32-bit microcontrollers such as the CH32V003 RISC-V microcontroller and others
Fast Hadamard transform in CUDA, with a PyTorch interface
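The library above runs on the GPU via CUDA; as a hedged illustration of what the transform computes, here is a minimal pure-Python/NumPy sketch of the O(n log n) Walsh-Hadamard butterfly (the function name `fht` and the plain-Python loops are mine, not the library's API):

```python
import numpy as np

def fht(x):
    # Fast Walsh-Hadamard transform (unnormalized); length must be a power of two.
    # Illustrative CPU version of the butterfly the CUDA kernel parallelizes.
    x = np.asarray(x, dtype=np.float64).copy()
    n = x.size
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    h = 1
    while h < n:
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b  # butterfly: sum and difference
        h *= 2
    return x
```

For example, `fht([1, 0, 0, 0])` returns `[1, 1, 1, 1]`, the first column of the 4x4 Hadamard matrix.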
Step by step explanation/tutorial of llama2.c
Implementation of IceFormer: Accelerated Inference with Long-Sequence Transformers on CPUs (ICLR 2024).
Tiny example project to test bitwise vector cosine similarity
cgoxopx / llama2.gl
Forked from karpathy/llama2.c
Inference Llama 2 in OpenGL Compute Shader
The goal of this project is to use CPU and GPU hardware optimizations to accelerate FFT computation.