Starred repositories
Operit: the most powerful AI agent and AI chat software currently available on Android
An Open Phone Agent Model & Framework. Unlocking the AI Phone for Everyone
GELab: GUI Exploration Lab. One of the best GUI agent solutions in the galaxy, built by the StepFun-GELab team and powered by Step’s research capabilities.
[TCSVT] DAOcc: 3D Object Detection Assisted Multi-Sensor Fusion for 3D Occupancy Prediction
MM-ACT: Learn from Multimodal Parallel Generation to Act
Benchmarking Knowledge Transfer in Lifelong Robot Learning
LLaVA_OpenVLA part 2, Generate MLLM general training data
YOLO multi-threaded and hardware-accelerated inference framework based on RKNN
[ICCV 2019] TSM: Temporal Shift Module for Efficient Video Understanding
An open source implementation of CLIP.
OCR model that handles complex tables, forms, and handwriting with full layout.
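Deploys Chinese CLIP with OpenCV + onnxruntime for text-to-image search: given a one-sentence description of the desired image, it retrieves matching images from the gallery. Includes both C++ and Python versions of the program.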
Real-time Vision Language Model interaction via webcam - WebRTC-based web interface
Semantic segmentation models with 500+ pretrained convolutional and transformer-based backbones.
C++ implementations of PP-OCRv3 and PP-OCRv5 using ncnn for inference.
Official implementation of "I2VWM: Robust Watermarking for Image to Video Generation"
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
[⭐️ WACV 2025 Oral ⭐️] PETALface: Parameter Efficient Transfer Learning for Low-resolution Face Recognition
🔥🔥High-Performance Face Recognition Library on PaddlePaddle & PyTorch🔥🔥
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3, Qwen3-MoE, DeepSeek-R1, GLM4.5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, …
A tool to modify ONNX models visually, based on Netron and Flask.
Wan 2.5 AI Video Generator - Transform text & images into HD videos with synchronized audio
[ICCV 2023] TransFace: Calibrating Transformer Training for Face Recognition from a Data-Centric Perspective