Z.ai
- Beijing, China
- https://www.zhihu.com/people/sbobhuang
Stars
Research artifacts from Recursive's automated AI research system
Winner 🏆 (Agent-only) MLSys 2026 - FlashInfer AI Kernel Generation Contest for the DeepSeek Sparse Attention (DSA) track with an average speedup of 34.93x
Tile-Based Runtime for Ultra-Low-Latency LLM Inference
Toolkit for Seamlessly Enabling RL Training on Any Agent with Bedrock AgentCore.
LongTraceRL: Learning Long-Context Reasoning from Search Agent Trajectories with Rubric Rewards
mKernel: fast multi-node, multi-GPU fused kernels
Building the Virtuous Cycle for AI-driven LLM Systems
CODA: Rewriting Transformer Blocks as GEMM-Epilogue Programs
The open-source managed agents platform. Turn coding agents into real teammates — assign tasks, track progress, compound skills.
Self-hosted AI workspace with shareable AI teammates, shared conversations, memory, and governed access to plugins, MCP tools, and local devices.
GLM-4.6V/4.5V/4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
SCAIL: Towards Studio-Grade Character Animation via In-Context Learning of 3D-Consistent Pose Representations (CVPR 2026 Findings)
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
Official implementation of Inf-DiT: Upsampling Any-Resolution Image with Memory-Efficient Diffusion Transformer
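Sharing AI Infra knowledge & code exercises: getting started with the PyTorch/vLLM/SGLang frameworks ⚡️, performance acceleration 🚀, LLM fundamentals 🧠, AI software and hardware 🔧, and more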
cuda-oxide is an experimental Rust-to-CUDA compiler that lets you write (SIMT) GPU kernels in safe(ish), idiomatic Rust. It compiles standard Rust code directly to PTX — no DSLs, no foreign languag…
TokenSpeed is a speed-of-light LLM inference engine.