unsloth Public
Forked from unslothai/unsloth
Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek-R1, Qwen3, Gemma 3, TTS 2x faster with 70% less VRAM.
Python Apache License 2.0 Updated Dec 23, 2025
SAM3_LoRA Public
Forked from Sompote/SAM3_LoRA
Fine-tune SAM3 with LoRA, optimized for images. A simple setup for training SAM3 on image datasets. Video fine-tuning is not yet supported but is planned for future releases.
Python Updated Dec 18, 2025
DINOV3-YOLOV12 Public
Forked from Sompote/DINOV3-YOLOV12
Use DINOv3’s powerful, self-supervised visual features + YOLOv12’s blazing-fast detection, all in one repo. Whether you have only a few hundred labeled images or a medium-sized dataset, DINOV3-YOLO…
Python GNU Affero General Public License v3.0 Updated Nov 27, 2025
textvqa_grounding_task_qwen2.5-vl-ft Public
Forked from 828Tina/textvqa_grounding_task_qwen2.5-vl-ft
Jupyter Notebook Updated May 20, 2025
VL-Rethinker Public
Forked from TIGER-AI-Lab/VL-Rethinker
The official code of "VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning"
Python Apache License 2.0 Updated Apr 29, 2025
deepseek-r1-vision Public
Forked from sungatetop/deepseek-r1-vision
A method to make a VLM think like R1
Python Updated Feb 20, 2025
Video-RAG-master Public
Forked from Leon1207/Video-RAG-master
This is the official implementation of our paper "Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension"
Python Updated Jan 15, 2025
qwen2vl_data_processingv3 Public
Forked from zew013/qwen2vl_data_processingv3
Jupyter Notebook Updated Dec 15, 2024
VisRAG Public
Forked from OpenBMB/VisRAG
Parsing-free RAG supported by VLMs
Python Apache License 2.0 Updated Nov 4, 2024
Qwen2-vl-sft Public
Forked from digbangbang/Qwen2-vl-sft
This repository contains a project I completed during my internship at Meituan. Specifically, it performs SFT on Qwen2-vl, uses internal company data, and fine-tunes Qwen2-vl for downstream tasks (…
Python Apache License 2.0 Updated Sep 20, 2024
PrimeVul Public
Forked from DLVulDet/PrimeVul
Repository for PrimeVul Vulnerability Detection Dataset
Python MIT License Updated Sep 7, 2024
Vista Public
Forked from OpenDriveLab/Vista
A Generalizable World Model for Autonomous Driving
Python Apache License 2.0 Updated Sep 4, 2024
AL-Ref-SAM2 Public
Forked from appletea233/AL-Ref-SAM2
AL-Ref-SAM 2: Unleashing the Temporal-Spatial Reasoning Capacity of GPT for Training-Free Audio and Language Referenced Video Object Segmentation
Python MIT License Updated Sep 4, 2024
dify-with-qwen-vl Public
Forked from soulteary/dify-with-qwen-vl
Video understanding: Qwen video multimodal model & Dify
Python Apache License 2.0 Updated Sep 2, 2024
llm2vec Public
Forked from McGill-NLP/llm2vec
Code for 'LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders'
Python MIT License Updated Aug 30, 2024
snag_release Public
Forked from fmu2/snag_release
Official Implementation of SnAG (CVPR 2024)
Python Updated Apr 22, 2024
keras-llm-robot Public
Forked from smalltong02/keras-llm-robot
A web UI project for learning about large language models. This project includes features such as chat, quantization, fine-tuning, prompt engineering templates, and multimodality.
Python Apache License 2.0 Updated Jan 23, 2024
llama Public
Forked from meta-llama/llama
Inference code for LLaMA models
ego4d_asl Public
Forked from JonnyS1226/ego4d_asl
Code for Ego4D Workshop@CVPR 2023 - 1st in MQ & 2nd in NLQ challenge
Python Updated Dec 19, 2023
NExT-Chat Public
Forked from NExT-ChatV/NExT-Chat
The code of the paper "NExT-Chat: An LMM for Chat, Detection and Segmentation".
Python Apache License 2.0 Updated Dec 19, 2023
MIC Public
Forked from HaozheZhao/MIC
MMICL, a state-of-the-art VLM with in-context learning ability, from ICL, PKU
Python Updated Dec 18, 2023
AdaTAD Public
Forked from sming256/AdaTAD
The official implementation of AdaTAD: End-to-End Temporal Action Detection with 1B Parameters Across 1000 Frames
1 Updated Dec 9, 2023
ONE-PEACE Public
Forked from OFA-Sys/ONE-PEACE
A general representation model across vision, audio, language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Python Apache License 2.0 Updated Dec 5, 2023
Video-LLaVA Public
Forked from PKU-YuanGroup/Video-LLaVA
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
Python Apache License 2.0 Updated Nov 26, 2023