Stars
slime is an LLM post-training framework for RL Scaling.
Open-source self-hosted sandboxes for AI agents
Minimal reproduction of DeepSeek R1-Zero
Training Large Language Model to Reason in a Continuous Latent Space
A free and strong UCI chess engine
Entropy Based Sampling and Parallel CoT Decoding
Large Language Model Text Generation Inference
The code of our paper "InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory"
Reaching LLaMA2 Performance with 0.1M Dollars
YaRN: Efficient Context Window Extension of Large Language Models
Doing simple retrieval from LLM models at various context lengths to measure accuracy
Implementation of paper Data Engineering for Scaling Language Models to 128K Context
[ACL'24 Outstanding] Data and code for L-Eval, a comprehensive long context language models evaluation benchmark
Memory bandwidth efficient sparse tree attention
[NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
Easy control for Key-Value Constrained Generative LLM Inference (https://arxiv.org/abs/2402.06262)
[ICML'24 Spotlight] LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
FlashInfer: Kernel Library for LLM Serving
Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
Run LLaMA and other large language models on iOS and macOS offline using the GGML library.
Building a quick conversation-based search demo with Lepton AI.
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs.
ELF: a platform for game research with AlphaGoZero/AlphaZero reimplementation