
MMaDA - Open-Sourced Multimodal Large Diffusion Language Models

Python 1,542 78 Updated Nov 16, 2025

Fully Open Framework for Democratized Multimodal Reinforcement Learning.

Python 28 2 Updated Dec 19, 2025

A minimal, educational HEVC (H.265) encoder written in Python.

Python 27 Updated Dec 10, 2025

Lightning-Fast RL for LLM Reasoning and Agents. Made Simple & Flexible.

Python 3,280 258 Updated Dec 25, 2025

Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe

Python 129 5 Updated Dec 17, 2025

Official PyTorch implementation of ProCLIP: Progressive Vision-Language Alignment via LLM-based Embedder

Python 17 2 Updated Dec 4, 2025

Official PyTorch implementation for "Large Language Diffusion Models"

Python 3,428 231 Updated Nov 12, 2025

[AAAI 2026 Oral] The official code of "UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning"

Python 51 Updated Dec 8, 2025
Python 3 Updated Nov 20, 2025

Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.

Jupyter Notebook 3,164 193 Updated Oct 9, 2025

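AIMedia is an integrated tool that automatically scrapes trending topics, writes articles with AI, and publishes them automatically. It supports Toutiao, Xiaohongshu, WeChat Official Accounts, and more.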

Python 768 151 Updated Dec 25, 2025

State-of-the-art 2D and 3D Face Analysis Project

Python 27,411 5,872 Updated Nov 25, 2025

Fully Open Framework for Democratized Multimodal Training

Python 663 53 Updated Dec 15, 2025

Multilingual Document Layout Parsing in a Single Vision-Language Model

Python 5,933 579 Updated Oct 31, 2025

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3, Qwen3-MoE, DeepSeek-R1, GLM4.5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, …

Python 11,851 1,087 Updated Dec 25, 2025

PyTorch Distributed native training library for LLMs/VLMs with OOTB Hugging Face support

Python 215 37 Updated Dec 25, 2025

A simple, unified multimodal models training engine. Lean, flexible, and built for hacking at scale.

Python 681 27 Updated Dec 23, 2025

My personal page

HTML 701 113 Updated Jun 10, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python 17,797 2,901 Updated Dec 26, 2025

Margin-based Vision Transformer

60 2 Updated Nov 28, 2025

Official repository for HOComp: Interaction-Aware Human-Object Composition

Python 27 Updated Dec 3, 2025

An open-source implementation for fine-tuning Qwen-VL series by Alibaba Cloud.

Python 1,506 190 Updated Dec 19, 2025

Ongoing research training transformer models at scale

Python 14,710 3,414 Updated Dec 25, 2025

A curated list of balanced multimodal learning methods.

147 5 Updated Dec 22, 2025

Official code repo for our work "Native Visual Understanding: Resolving Resolution Dilemmas in Vision-Language Models"

Python 53 3 Updated Jun 17, 2025

[NeurIPS 2025] Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations

Python 192 6 Updated Sep 18, 2025

The simplest, fastest repository for training/finetuning small-sized VLMs.

Python 4,445 434 Updated Oct 27, 2025
11 Updated Sep 4, 2025

A paper list of recent works on token compression for ViT and VLM

791 37 Updated Dec 24, 2025