Harbin Institute of Technology, Shenzhen, Guangdong, China
Starred repositories
Pure C++ implementation of several models for real-time chatting on your computer (CPU & GPU)
LvLLM is a special NUMA extension of vLLM that makes full use of CPU and memory resources, reduces GPU memory requirements, and features an efficient GPU parallel and NUMA parallel architecture, su…
FlagTree is a unified compiler, forked from triton-lang/triton, that supports multiple AI chip backends for custom deep learning operations.
AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
Automate your mobile devices with natural language commands - an LLM agnostic mobile Agent 🤖
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
🛠A lite C++ AI toolkit: 100+ models with MNN, ORT and TRT, including Det, Seg, Stable-Diffusion, Face-Fusion, etc.🎉
🐉 Making Rust a first-class language and ecosystem for GPU shaders 🚧
RAFT contains fundamental, widely used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing …
Python tool for converting files and office documents to Markdown.
A GPU cluster manager that configures and orchestrates inference engines like vLLM and SGLang for high-performance AI model deployment.
Open-source Pricing and Billing Infrastructure 🚀 Subscription management, Invoicing, Pricing, Usage-based billing, Cost limiting, Grandfathering, Experiments, Revenue analytics & Actionable insights
Financial data platform for analysts, quants and AI agents.
Official inference framework for 1-bit LLMs
depyf is a tool to help you understand and adapt to the PyTorch compiler, torch.compile.
Waydroid uses a container-based approach to boot a full Android system on a regular GNU/Linux system like Ubuntu.
Efficient platform for inference and serving of local LLMs, including an OpenAI-compatible API server.
Rust keyboard firmware library with layers, macros, real-time keymap editing, wireless (BLE) and split support
Rust implementation of behavior trees for deterministic AI
Letta is the platform for building stateful agents: AI with advanced memory that can learn and self-improve over time.