Stars
Free ChatGPT & DeepSeek API key. Free access to the DeepSeek API and GPT-4 API, supporting top-ranked mainstream large models such as gpt | deepseek | claude | gemini | grok.
A Next.js web application that integrates AI capabilities with draw.io diagrams. This app allows you to create, modify, and enhance diagrams through natural language commands and AI-assisted visual…
LangGraph 1.0 Tutorial
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Ola: Pushing the Frontiers of Omni-Modal Language Model
Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2
Fully open reproduction of DeepSeek-R1
Tarsier -- a family of large-scale video-language models designed to generate high-quality video descriptions, with strong general video understanding capability.
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
PyTorch code and models for the DINOv2 self-supervised learning method.
A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, llama-3.2-vision, qwen-vl, qwen2-vl, phi3-v, etc.
TransNet V2: Shot Boundary Detection Neural Network
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Official inference repo for FLUX.1 models
An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
Use PEFT or full-parameter training for CPT/SFT/DPO/GRPO on 600+ LLMs (Qwen3, Qwen3-MoE, DeepSeek-R1, GLM4.5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, …
[ICCV 2025] LLaVA-CoT, a visual language model capable of spontaneous, systematic reasoning
Inpaint anything using Segment Anything and inpainting models.
🔥🔥🔥 [IEEE TCSVT] Latest papers, code, and datasets on Vid-LLMs.
Collection of AWESOME vision-language models for vision tasks
A state-of-the-art open visual language model | multimodal pretrained model
The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!
PyTorch implementation of MAE: https://arxiv.org/abs/2111.06377
We use MixedWM38, the mixed-type wafer defect pattern dataset, for wafer defect pattern recognition with vision transformers.
willard-yuan / SoTu
Forked from yzhangcs/SoTu. Bag of Visual Features with Hamming Embedding, Reranking.