Repositories listed on Carkham's GitHub profile

Jupyter Notebook · 4 stars · Updated Dec 16, 2025

[ICCV25 Highlight] The official implementation of the paper "LEGION: Learning to Ground and Explain for Synthetic Image Detection"

Python · 72 stars · 6 forks · Updated Oct 22, 2025

(NeurIPS 2025 🔥) Official implementation for "Efficient Multi-modal Large Language Models via Progressive Consistency Distillation"

Python · 38 stars · 3 forks · Updated Nov 23, 2025

TRivia: Self-supervised Fine-tuning of Vision-Language Models for Table Recognition

Python · 15 stars · Updated Dec 3, 2025

Contexts Optical Compression

Python · 21,561 stars · 1,928 forks · Updated Oct 25, 2025
Jupyter Notebook · 472 stars · 84 forks · Updated Jul 8, 2025

An official implementation of "CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning"

Python · 157 stars · 7 forks · Updated Dec 24, 2025

Code for "The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs"

Python · 73 stars · 3 forks · Updated Sep 30, 2025

[ICCV 2025] SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs

Python · 79 stars · 2 forks · Updated Oct 19, 2025

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Python · 154,201 stars · 31,530 forks · Updated Dec 24, 2025

[NeurIPS 2025] IEAP: Image Editing As Programs with Diffusion Models

Python · 109 stars · 5 forks · Updated Sep 27, 2025
Python · 11 stars · 1 fork · Updated May 28, 2025
Python · 109 stars · 8 forks · Updated Nov 19, 2025

📚 Collection of token-level model compression resources.

187 stars · 7 forks · Updated Sep 3, 2025

AnywhereDoor is a multi-target backdoor attack tailored for object detection. Once implanted, it enables adversaries to specify different attack types (object vanishing, fabrication, or misclassifi…

Jupyter Notebook · 6 stars · 4 forks · Updated Jul 8, 2025
Python · 32 stars · 3 forks · Updated Oct 1, 2025

A large-scale camera-taken table detection and recognition dataset.

Python · 143 stars · 9 forks · Updated Jul 21, 2025

[EMNLP 2025 main 🔥] Code for "Stop Looking for Important Tokens in Multimodal Language Models: Duplication Matters More"

Python · 97 stars · 2 forks · Updated Oct 12, 2025

Official implementation for the paper "Towards Understanding How Knowledge Evolves in Large Vision-Language Models"

Python · 24 stars · Updated Apr 10, 2025

AL-Bench: A benchmark for automatic logging

Python · 8 stars · Updated Aug 19, 2025

MDocAgent: A Multi-Modal Multi-Agent Framework for Document Understanding

Python · 267 stars · 29 forks · Updated Aug 8, 2025

Solve Visual Understanding with Reinforced VLMs

Python · 5,773 stars · 376 forks · Updated Oct 21, 2025

Parsing-free RAG supported by VLMs

Python · 888 stars · 74 forks · Updated Dec 7, 2025

A Survey on Multimodal Retrieval-Augmented Generation

447 stars · 20 forks · Updated Nov 8, 2025

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3, Qwen3-MoE, DeepSeek-R1, GLM4.5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, …

Python · 11,826 stars · 1,086 forks · Updated Dec 24, 2025

MM-Eureka V0, also called R1-Multimodal-Journey. The latest version is in MM-Eureka.

Python · 322 stars · 10 forks · Updated Jun 21, 2025

Minimal reproduction of DeepSeek R1-Zero

Python · 12,515 stars · 1,536 forks · Updated Apr 24, 2025

This project aims to collect the latest "call for reviewers" links from various top CS/ML/AI conferences/journals

1,050 stars · 43 forks · Updated Dec 11, 2025

Witness the aha moment of VLM with less than $3.

Python · 4,012 stars · 289 forks · Updated May 19, 2025

Fully open reproduction of DeepSeek-R1

Python · 25,751 stars · 2,407 forks · Updated Nov 24, 2025