- Beijing,China
- www.arvintian.cn
Lists (3)
Sort Name ascending (A-Z)
Stars
Turn any document or a whole zip into an interactive knowledge graph, using a self-hosted Qwen3.6-35B-A3B-MTP on a single NVIDIA L4
SGLang is a high-performance serving framework for large language models and multimodal models.
LLM驱动的 A/H/美股智能分析:多数据源行情 + 实时新闻 + LLM决策仪表盘 + 多渠道推送,零成本定时运行,纯白嫖. LLM-powered stock analysis system for A/H/US markets.
One HTML file. Chat with OpenAI, Claude, Gemini, DeepSeek, Ollama and any OpenAI-compatible endpoint — streaming, reasoning, vision, fully client-side.
NFS-Ganesha is an NFSv3,v4,v4.1 fileserver that runs in user mode on most UNIX/Linux systems
Fastest enterprise AI gateway (50x faster than LiteLLM) with adaptive load balancer, cluster mode, guardrails, 1000+ models support & <100 µs overhead at 5k RPS.
An ai hardware using qwen3.5 omni as its model.
A Cloudflare-based email service | 基于 Cloudflare 的邮箱服务 | Cloudflare Email 邮箱 Mail
A unified library of SOTA model optimization techniques like quantization, distillation, pruning, neural architecture search, speculative decoding, etc. It compresses deep learning models for downs…
[ICLR 2026] ParoQuant: Pairwise Rotation Quantization for Efficient Reasoning LLM Inference
An easy-to-use SDK for Feishu and Lark Open Platform (Instant Messaging API only)
Stealth Chromium that passes every bot detection test. Drop-in Playwright replacement with source-level fingerprint patches. 30/30 tests passed.
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
Optimized primitives for collective multi-GPU communication
llama-benchy - llama-bench style benchmarking tool for all backends
TurboQuant KV Cache Compression for llama.cpp — 5.2x memory reduction with near-lossless quality | Implementation of Google DeepMind's TurboQuant (ICLR 2026)
Adaptive Precision for EXpert Models: MoE-aware mixed-precision quantization
LocalAI is the open-source AI engine. Run any model - LLMs, vision, voice, image, video - on any hardware. No GPU required.
🥤 NRUP - A reliable encrypted UDP transport protocol built on DTLS
Xray panel supporting multi-protocol multi-user expire day & traffic & IP limit (Vmess, Vless, Trojan, ShadowSocks, Wireguard, Hysteria, Tunnel, Mixed, HTTP, Tun)
🚀 The fast, Pythonic way to build MCP servers and clients.
Claude Code 泄露源码 - 本地可运行版本,新增跨平台桌面端软件补齐Computer Use(附带核心模块解析)
Accelerating Long Context LLM Inference with Accuracy-Preserving Context Optimization in SGLang, vLLM, llama.cpp, OpenClaw, RAG, and Agentic AI.
"OpenHarness: Open Agent Harness with a Built-in Personal Agent--Ohmo!"
The agent that grows with you
LiteRT-LM is Google's production-ready, high-performance, open-source inference framework for deploying Large Language Models on edge devices.
Your Personal AI Assistant; easy to install, deploy on your own machine or on the cloud; supports multiple chat apps with easily extensible capabilities.
Make use of Intel Arc Series GPU to Run Ollama, StableDiffusion, Whisper and Open WebUI, for image generation, speech recognition and interaction with Large Language Models (LLM).