Infiniband Verbs Performance Tests

C 858 353 Updated Oct 27, 2025

Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond

Python 606 51 Updated Nov 4, 2025

My learning notes/codes for ML SYS.

Python 4,066 247 Updated Oct 6, 2025

Checkpoint-engine is a simple middleware to update model weights in LLM inference engines

Python 805 61 Updated Nov 4, 2025

ScreenCoder — Turn any UI screenshot into clean, editable HTML/CSS with full control. Fast, accurate, and easy to customize.

Python 2,469 232 Updated Oct 22, 2025

An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models

Python 2,195 136 Updated Nov 5, 2025

[arXiv 2025] MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence

Python 55 Updated Oct 23, 2025

Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models

Python 158 6 Updated Oct 10, 2025

DeepSeek-V3/R1 inference performance simulator

Jupyter Notebook 169 23 Updated Mar 27, 2025

A high-performance distributed file system designed to address the challenges of AI training and inference workloads.

C++ 9,441 957 Updated Oct 24, 2025

Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels

C++ 3,849 298 Updated Nov 5, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 5,861 736 Updated Oct 15, 2025

DeepEP: an efficient expert-parallel communication library

Cuda 8,691 972 Updated Nov 5, 2025

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 11,842 896 Updated Sep 30, 2025

MLGym: A New Framework and Benchmark for Advancing AI Research Agents

Python 566 55 Updated Aug 10, 2025

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Python 16,043 3,177 Updated Nov 5, 2025

A bibliography and survey of the papers surrounding o1

TeX 1,208 51 Updated Nov 16, 2024

[NeurIPS 2024] Efficient LLM Scheduling by Learning to Rank

Python 62 15 Updated Nov 4, 2024

FlexFlow Serve: Low-Latency, High-Performance LLM Serving

C++ 63 5 Updated Sep 15, 2025

A unified inference and post-training framework for accelerated video generation.

Python 2,520 192 Updated Nov 5, 2025
Jupyter Notebook 124 12 Updated Nov 11, 2024

[ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training

Python 244 22 Updated Aug 9, 2025

https://wavespeed.ai/ Context parallel attention that accelerates DiT model inference with dynamic caching

Python 385 38 Updated Jul 5, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python 15,128 2,426 Updated Nov 5, 2025
Python 74 12 Updated Oct 29, 2024

[COLM 2024] SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models

Python 24 3 Updated Oct 5, 2024

[CoRL 2024] VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding

Python 119 1 Updated May 22, 2025
Python 1,348 54 Updated Nov 21, 2024

Building large language models that understand social etiquette | Covers tutorials on prompt engineering, RAG, agents, and LLM fine-tuning

Python 1,589 130 Updated Apr 29, 2025

A throughput-oriented high-performance serving framework for LLMs

Jupyter Notebook 910 44 Updated Oct 29, 2025