Stars
Project to recreate your favourite block game for the Wii (Beta 1.7.3)
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI
Distributed Compiler based on Triton for Parallel Systems
QuickReduce is a performant all-reduce library designed for AMD ROCm that supports inline compression.
Unofficial description of the CUDA assembly (SASS) instruction sets.
A tool for recompiling Xbox 360 games to native executables.
PCCL (Prime Collective Communications Library) implements fault-tolerant collective communications over IP
Installer for Photoshop CC 2021 through 2022 on Linux, with a GUI
Nvidia Instruction Set Specification Generator
NVIDIA Linux open GPU kernel modules with P2P support
Mirage Persistent Kernel: Compiling LLMs into a MegaKernel
Development repository for the Triton language and compiler
CUDA Templates and Python DSLs for High-Performance Linear Algebra
StableLM: Stability AI Language Models
ChatLLaMA 📢 Open-source implementation of LLaMA-based ChatGPT, runnable on a single GPU, with a 15x faster training process than ChatGPT
Fork of Paper which adds regionised multithreading to the dedicated server.
Locally run an Instruction-Tuned Chat-Style LLM (antimatter15/alpaca.cpp, forked from ggml-org/llama.cpp)
Instruct-tune LLaMA on consumer hardware
RWKV (pronounced RwaKuv) is an RNN with great LLM performance, which can also be directly trained like a GPT transformer (parallelizable). We are at RWKV-7 "Goose". So it's combining the best of RN…
The simplest, fastest repository for training/finetuning medium-sized GPTs.