Turn any document or a whole zip into an interactive knowledge graph, using a self-hosted Qwen3.6-35B-A3B-MTP on a single NVIDIA L4

Python 126 15 Updated Jun 12, 2026

SGLang is a high-performance serving framework for large language models and multimodal models.

Python 28,961 6,512 Updated Jun 13, 2026

LLM-powered stock analysis system for A/H/US markets: multi-source market data + real-time news + LLM decision dashboard + multi-channel push notifications, scheduled to run at zero cost, entirely free.

Python 42,393 40,186 Updated Jun 13, 2026

One HTML file. Chat with OpenAI, Claude, Gemini, DeepSeek, Ollama and any OpenAI-compatible endpoint — streaming, reasoning, vision, fully client-side.

HTML 1 Updated Jun 10, 2026

NFS-Ganesha is an NFSv3,v4,v4.1 fileserver that runs in user mode on most UNIX/Linux systems

C 1,764 574 Updated Jun 12, 2026

US Stock Guide

4,281 661 Updated Jun 11, 2026

Fastest enterprise AI gateway (50x faster than LiteLLM) with adaptive load balancer, cluster mode, guardrails, 1000+ models support & <100 µs overhead at 5k RPS.

Go 5,728 750 Updated Jun 13, 2026

AI hardware that uses Qwen3.5 Omni as its model.

Python 226 55 Updated May 1, 2026

A Cloudflare-based email service | Cloudflare Email Mail

JavaScript 11,127 15,266 Updated Jun 9, 2026

A unified library of SOTA model optimization techniques like quantization, distillation, pruning, neural architecture search, speculative decoding, etc. It compresses deep learning models for downs…

Python 2,920 436 Updated Jun 13, 2026

[ICLR 2026] ParoQuant: Pairwise Rotation Quantization for Efficient Reasoning LLM Inference

Python 306 29 Updated Jun 8, 2026

An easy-to-use SDK for Feishu and Lark Open Platform (Instant Messaging API only)

Go 244 37 Updated May 14, 2026

Stealth Chromium that passes every bot detection test. Drop-in Playwright replacement with source-level fingerprint patches. 30/30 tests passed.

Python 25,876 2,051 Updated Jun 9, 2026

Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

Python 3,392 545 Updated Jun 12, 2026

Optimized primitives for collective multi-GPU communication

C++ 4,808 1,295 Updated Jun 13, 2026

Tensor library for machine learning

C++ 14,805 1,673 Updated Jun 12, 2026

llama-benchy - llama-bench style benchmarking tool for all backends

Python 464 42 Updated Jun 10, 2026

TurboQuant KV Cache Compression for llama.cpp — 5.2x memory reduction with near-lossless quality | Implementation of Google DeepMind's TurboQuant (ICLR 2026)

C++ 82 13 Updated Jun 13, 2026

Adaptive Precision for EXpert Models: MoE-aware mixed-precision quantization

Shell 348 26 Updated May 29, 2026

LocalAI is the open-source AI engine. Run any model - LLMs, vision, voice, image, video - on any hardware. No GPU required.

Go 46,826 4,134 Updated Jun 13, 2026

🥤 NRUP - A reliable encrypted UDP transport protocol built on DTLS

Go 143 13 Updated Apr 21, 2026

Xray panel supporting multi-protocol multi-user expire day & traffic & IP limit (Vmess, Vless, Trojan, ShadowSocks, Wireguard, Hysteria, Tunnel, Mixed, HTTP, Tun)

TypeScript 40,538 7,601 Updated Jun 13, 2026

🚀 The fast, Pythonic way to build MCP servers and clients.

Python 25,616 2,070 Updated Jun 6, 2026

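Claude Code leaked source code - locally runnable version, with a new cross-platform desktop app that fills in Computer Use (includes an analysis of the core modules)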

TypeScript 12,575 8,235 Updated Jun 13, 2026

Accelerating Long Context LLM Inference with Accuracy-Preserving Context Optimization in SGLang, vLLM, llama.cpp, OpenClaw, RAG, and Agentic AI.

Python 115 5 Updated Jun 13, 2026

OpenHarness: Open Agent Harness with a Built-in Personal Agent, Ohmo!

Python 13,810 2,258 Updated Jun 4, 2026

The agent that grows with you

Python 192,540 33,575 Updated Jun 13, 2026

LiteRT-LM is Google's production-ready, high-performance, open-source inference framework for deploying Large Language Models on edge devices.

C++ 5,574 575 Updated Jun 13, 2026

Your Personal AI Assistant; easy to install, deploy on your own machine or on the cloud; supports multiple chat apps with easily extensible capabilities.

Python 17,519 2,601 Updated Jun 12, 2026

Make use of Intel Arc Series GPU to Run Ollama, StableDiffusion, Whisper and Open WebUI, for image generation, speech recognition and interaction with Large Language Models (LLM).

Dockerfile 357 45 Updated Jun 13, 2026