Princeton University
Princeton, NJ
Stars
The simplest, fastest repository for training/finetuning medium-sized GPTs.
Code and documentation to train Stanford's Alpaca models, and generate the data.
DSPy: The framework for programming—not prompting—language models
High accuracy RAG for answering questions from scientific documents with citations
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
My learning notes and code for ML systems (MLSys).
Live-streamed development of RL tuning for LLM agents
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
A curated list for Efficient Large Language Models
Official Implementation of Rectified Flow (ICLR2023 Spotlight)
[ACL 2021] LM-BFF: Better Few-shot Fine-tuning of Language Models https://arxiv.org/abs/2012.15723
Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings)
Flash-Muon: An Efficient Implementation of Muon Optimizer
Efficient, Flexible, and Highly Fault-Tolerant Model Service Management Based on SGLang
Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval"