A Unified Cache Acceleration Framework for 🤗 Diffusers: Qwen-Image-Lightning, Qwen-Image, HunyuanImage, FLUX, Wan, etc.

Python 384 12 Updated Oct 10, 2025

SGLang is a fast serving framework for large language models and vision language models.

Python 18,714 3,100 Updated Oct 10, 2025

Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows.

Python 45,526 3,779 Updated Oct 9, 2025

Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels

C++ 3,377 243 Updated Oct 10, 2025

A YAML parser and emitter in C++

C++ 5,677 2,028 Updated Oct 1, 2025

Implementation of "YOLOv13: Real-Time Object Detection with Hypergraph-Enhanced Adaptive Visual Perception".

Python 829 89 Updated Aug 1, 2025

🚀 Easier & Faster YOLO Deployment Toolkit for NVIDIA 🛠️

C++ 1,460 159 Updated Sep 27, 2025

Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.

Python 5,596 301 Updated Sep 30, 2025

Hands-On Practical MLIR Tutorial

C++ 616 90 Updated Oct 20, 2023

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and support state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorR…

C++ 11,800 1,788 Updated Oct 10, 2025

The true story of the heartache and darkness behind the development of the Noah Pangu large model.

11,392 1,372 Updated Jul 9, 2025

Trae Agent is an LLM-based agent for general purpose software engineering tasks.

Python 9,627 995 Updated Sep 24, 2025

Open-source coding LLM for software engineering tasks

Python 968 118 Updated Sep 30, 2025

The power of Claude Code / GeminiCLI / CodexCLI + [Gemini / OpenAI / OpenRouter / Azure / Grok / Ollama / Custom Model / All Of The Above] working as one.

Python 8,547 718 Updated Oct 8, 2025

Simple Python version management

Roff 43,331 3,203 Updated Oct 10, 2025

Get up and running with OpenAI gpt-oss, DeepSeek-R1, Gemma 3 and other models.

Go 153,806 13,356 Updated Oct 10, 2025

🤖 The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transf…

Go 35,736 2,823 Updated Oct 9, 2025

Run any open-source LLMs, such as DeepSeek and Llama, as OpenAI compatible API endpoint in the cloud.

Python 11,831 779 Updated Oct 6, 2025

Production-ready platform for agentic workflow development.

TypeScript 116,066 17,902 Updated Oct 10, 2025

A course of learning LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.

Python 3,304 218 Updated Sep 26, 2025

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda 7,919 786 Updated Sep 19, 2025

Langflow is a powerful tool for building and deploying AI-powered agents and workflows.

Python 130,466 7,768 Updated Oct 10, 2025

Eliminate all the tedious hassle when making state-of-the-art C++ 14 - 23 libraries!

C 182 24 Updated Oct 9, 2025

Fully Local Manus AI. No APIs, No $200 monthly bills. Enjoy an autonomous agent that thinks, browses the web, and codes for the sole cost of electricity. 🔔 Official updates only via twitter @Martin9…

Python 22,097 2,360 Updated Sep 14, 2025

Open Source Deep Research Alternative to Reason and Search on Private Data. Written in Python.

Python 7,015 678 Updated Jul 10, 2025

A tool for bandwidth measurements on NVIDIA GPUs.

C++ 542 59 Updated Apr 15, 2025

📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉

Python 4,592 312 Updated Aug 19, 2025

Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.

Jupyter Notebook 3,696 290 Updated Jun 12, 2025

PaddleOCR inference in PyTorch. Converted from [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)

Python 1,070 202 Updated Sep 11, 2025

T5 onnxruntime cpp

C++ 8 Updated Jul 28, 2024