gfvvz
🎯 Focusing
Shanghai, China

Starred repositories

LocalAI is the open-source AI engine. Run any model - LLMs, vision, voice, image, video - on any hardware. No GPU required.

Go 46,858 4,132 Updated Jun 14, 2026

Port of Nvidia LocateAnything-3B on ggml

C++ 58 4 Updated Jun 12, 2026

sparkrun - launch, manage, and stop LLM inference workloads on NVIDIA DGX Spark systems

Python 329 29 Updated Jun 13, 2026

GPUVerify: a Verifier for GPU Kernels

C# 82 18 Updated Jul 28, 2022

Vim-fork focused on extensibility and usability

Vim Script 100,396 6,919 Updated Jun 15, 2026

Build a ChatGPT-like LLM from scratch in PyTorch, explained step by step.

Jupyter Notebook 315 60 Updated Jun 8, 2026

Learn LLM internals step by step - from tokenization to attention to inference optimization.

1,071 94 Updated Jun 14, 2026

An agent harness that compiles a model into one provably-correct, self-retargeting CUDA megakernel and self-tunes it past cuBLAS at batch-1 LLM decode.

Python 36 4 Updated Jun 8, 2026

AI 圆桌 - Multi-AI Roundtable Chrome Extension

JavaScript 265 66 Updated Jun 10, 2026

SGLang NVFP4 (fp4_e2m1) KV cache for Blackwell SM120 (RTX PRO 6000): FlashInfer FA2 kernel patches + native FP4 pool + hybrid-SWA wiring + per-layer global-scale auto-calibration. 1.778x KV capacit…

Python 2 Updated Jun 7, 2026

Modern RL Post-training Infrastructure: Optimized for NVIDIA/AMD GPUs with a focus on vLLM and DeepSpeed integration, CUDA/ROCm/Triton kernels, and transparent hardware-aware scaling.

Python 123 20 Updated Jun 14, 2026

a fast, scalable, multi-language and extensible build system

Java 25,508 4,509 Updated Jun 13, 2026

Python 205 45 Updated Jul 9, 2022

Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.

Python 18,690 2,989 Updated Apr 14, 2026

Examples demonstrating available options to program multiple GPUs in a single node or a cluster

Cuda 893 150 Updated Sep 26, 2025

Random program generation based on semantic reification (PLDI'26)

C++ 17 Updated Jun 11, 2026

Public repository for Agent Skills

Python 150,715 17,795 Updated Jun 9, 2026

Sub2API is an open-source relay platform that unifies Claude, OpenAI, Gemini, and Antigravity subscriptions into a single endpoint. It supports account sharing and cost-sharing, with seamless nativ…

Go 27,721 5,621 Updated Jun 14, 2026

Kimi Code CLI — The Starting Point for Next-Gen Agents

TypeScript 2,390 273 Updated Jun 14, 2026

A collection of DESIGN.md files analysis by popular brand design systems. Drop one into your project and let coding agents generate a matching UI.

90,219 10,725 Updated Jun 8, 2026

The official Lark/Feishu CLI tool, maintained by the larksuite team — built for humans and AI Agents. Covers core business domains including Messenger, Docs, Base, Sheets, Calendar, Mail, Tasks, Me…

Go 14,083 965 Updated Jun 14, 2026

Conveniently export torch.compile compiled products into self-contained Python files

Python 32 2 Updated Jun 5, 2026

An Optimizer for Nvidia Compilers.

Python 92 4 Updated Jun 12, 2026

Interactive World Model papers organized by core research challenges.

Python 230 8 Updated Jun 11, 2026

Fully autonomous & self-evolving research from idea to paper. Chat an Idea. Get a Paper. 🦞

Python 13,409 1,571 Updated Jun 3, 2026

PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838

Python 1,934 123 Updated Feb 20, 2026