Tencent - Shenzhen, China
Stars
Achieve state-of-the-art inference performance with modern accelerators on Kubernetes
Dynamic Memory Management for Serving LLMs without PagedAttention
Persist and reuse KV Cache to speed up your LLM.
DLSlime: Flexible & Efficient Heterogeneous Transfer Toolkit
Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond
Ongoing research training transformer models at scale
HuggingFace conversion and training library for Megatron-based models
TeRM: Extending RDMA-Attached Memory with SSD [FAST'24]
Checkpoint-engine is a simple middleware to update model weights in LLM inference engines
A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology
UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and expert parallelism (EP, e.g., GPU-driven)
FULL Augment Code, Claude Code, Cluely, CodeBuddy, Comet, Cursor, Devin AI, Junie, Kiro, Leap.new, Lovable, Manus, NotionAI, Orchids.app, Perplexity, Poke, Qoder, Replit, Same.dev, Trae, Traycer AI…
A multi-agent LLM framework for Chinese financial trading - an enhanced Chinese edition of TradingAgents
Speed-up of over 50% on average vs. the traditional memcpy in GCC 4.9 or VC2012
A plugin that lets EC2 developers use libfabric as the network provider when running NCCL applications.
Efficient GPU communication over multiple NICs.
Documentation of NVIDIA chip/hardware interfaces
[NSDI'25] AutoCCL: Automated Collective Communication Tuning for Accelerating Distributed and Parallel DNN Training
AI-based command-line tool to quickly generate standardized commit messages.
SGLang is a fast serving framework for large language models and vision language models.
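
As a hedged illustration of how a serving framework like SGLang is typically driven, here is a minimal offline-generation sketch using SGLang's Python engine API. The model path and sampling parameters are placeholder assumptions, not taken from this page.

```python
# Minimal SGLang offline-generation sketch.
# Assumptions: the model path and sampling parameters below are
# illustrative placeholders; any HF-compatible model path works.
import sglang as sgl

if __name__ == "__main__":
    # Spin up an in-process engine backed by the given model.
    llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")

    prompts = ["The capital of France is"]
    sampling_params = {"temperature": 0.8, "top_p": 0.95}

    # generate() returns one dict per prompt; "text" holds the completion.
    outputs = llm.generate(prompts, sampling_params)
    for prompt, output in zip(prompts, outputs):
        print(prompt, "->", output["text"])

    llm.shutdown()  # release GPU resources held by the engine
```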