Stars
The agent that grows with you
RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to create a superior context layer for LLMs
Drop in a screenshot and convert it to clean code (HTML/Tailwind/React/Vue)
🌟 The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming
Interact with your documents using the power of GPT, 100% privately, no data leaks
Compress tool outputs, logs, files, and RAG chunks before they reach the LLM. 60-95% fewer tokens, same answers. Library, proxy, MCP server.
DeepSeek Coder: Let the Code Write Itself
⚡️HivisionIDPhotos: a lightweight and efficient AI ID photo tool.
🦉 OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation
Wan: Open and Advanced Large-Scale Video Generative Models
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3.6, DeepSeek-V4, GLM-5.1, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Gemma4, Llava, …
Low-code framework for building custom LLMs, neural networks, and other AI models
Effortless data labeling with AI support from Segment Anything and other awesome models.
📡 Your own AI-powered news radar. Generates daily briefings in English & Chinese.
Transcribe and summarize videos and podcasts using AI. Open-source, multi-platform, and supports multiple languages.
DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception
Implementation of "YOLOv13: Real-Time Object Detection with Hypergraph-Enhanced Adaptive Visual Perception".
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted fo…
Sonic is a method for "Shifting Focus to Global Audio Perception in Portrait Animation"; you can use it in ComfyUI.
This node provides lip-sync capabilities in ComfyUI using ByteDance's LatentSync model. It allows you to synchronize video lips with audio input.
Flame is an open-source multimodal AI system designed to translate UI design mockups into high-quality React code. It leverages vision-language modeling, automated data synthesis, and structured tr…
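YOLOv8 detection module optimizations (with verified accuracy gains): adds the GAM attention mechanism, adds a small-object detection head, and replaces the loss function with Wise_IoU, plus a complete web demo (implementing simple object tracking).
A reference project for putting an AI coding architecture into practice; works with all mainstream Agents and AI coding IDEs that support Skills.
A WeChat Official Account aggregation platform that fetches and filters posts from multiple official accounts, making it easier for users to read all of their articles.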
Tracking and counting people
Chinese layout detection: YOLOv8 is used to detect the layout of Chinese document images.