Stars
Official implementation of paper "VLM³: Vision Language Models Are Native 3D Learners".
A Rust library integrated with ONNXRuntime, providing a collection of Computer Vision and Vision-Language models such as YOLO, FastVLM, and more.
Finetune SAM3 with LoRA — optimized for images. A simple setup for training SAM3 on image datasets. Video finetuning is not yet supported but planned for future releases.
[ICCV 2025] SAM4D: Segment Anything in Camera and LiDAR Streams
RoboSense: Large-scale Dataset and Benchmark for Egocentric Robot Perception and Navigation in Crowded and Unstructured Environments
A curated list of reinforcement learning with verifiable rewards (continually updated)
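🎬 Kaka Subtitle Assistant (卡卡字幕助手) | VideoCaptioner - An LLM-based intelligent subtitle assistant covering the full workflow of video subtitle generation, sentence segmentation, correction, and subtitle translation. An LLM-powered tool for easy and efficient video subtitling.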
Drop in a screenshot and convert it to clean code (HTML/Tailwind/React/Vue)