wjf5203

Python · 140 stars · 9 forks · Updated Jun 28, 2024

MiroThinker is a series of open-source agentic models trained for deep research and complex tool use scenarios.

Python · 1,345 stars · 92 forks · Updated Dec 20, 2025

Jupyter Notebook · 97 stars · 1 fork · Updated Nov 8, 2025

[ICCV 2025] OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning

Python · 409 stars · 20 forks · Updated Dec 1, 2025

Fully Open Framework for Democratized Multimodal Training

Python · 660 stars · 50 forks · Updated Dec 15, 2025

WeTok: Powerful Discrete Tokenization for High-Fidelity Visual Reconstruction

Python · 57 stars · 1 fork · Updated Sep 3, 2025

One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks

Python · 3,401 stars · 461 forks · Updated Dec 18, 2025

Official PyTorch implementation of FlowMo.

Jupyter Notebook · 105 stars · 6 forks · Updated Apr 7, 2025

Python · 4,461 stars · 435 forks · Updated Sep 14, 2025

Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.

Jupyter Notebook · 3,285 stars · 203 forks · Updated May 19, 2025

Summarizes existing representative LLM text datasets.

1,404 stars · 139 forks · Updated Oct 11, 2025

Curated list of datasets and tools for post-training.

4,100 stars · 334 forks · Updated Nov 10, 2025

A quick guide to instruction fine-tuning datasets, especially trending ones

3,328 stars · 227 forks · Updated Nov 28, 2023

Awesome LLM pre-training resources, including data, frameworks, and methods.

298 stars · 20 forks · Updated Apr 29, 2025

AliTok: Towards Sequence Modeling Alignment between Tokenizer and Autoregressive Model

Python · 50 stars · 2 forks · Updated Oct 12, 2025

FlexTok: Resampling Images into 1D Token Sequences of Flexible Length

Jupyter Notebook · 279 stars · 14 forks · Updated Jun 2, 2025

GPT-IMAGE-EDIT-1.5M: A Million-Scale, GPT-Generated Image Dataset

Python · 238 stars · 5 forks · Updated Aug 15, 2025

Official inference code and LongText-Bench benchmark for our paper X-Omni (https://arxiv.org/pdf/2507.22058).

Python · 400 stars · 10 forks · Updated Aug 26, 2025

A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations. It includes practical examples for both text and image modalities.

Python · 3,879 stars · 280 forks · Updated Sep 25, 2025

[NeurIPS 2025] Efficient Reasoning Vision Language Models

Python · 439 stars · 29 forks · Updated Sep 18, 2025

LaTeX template for doctoral dissertations at Huazhong University of Science and Technology (HUST)

TeX · 229 stars · 48 forks · Updated Jul 24, 2025

MMaDA - Open-Sourced Multimodal Large Diffusion Language Models

Python · 1,533 stars · 78 forks · Updated Nov 16, 2025

(Accepted by IJCV) Liquid: Language Models are Scalable and Unified Multi-modal Generators

Python · 635 stars · 34 forks · Updated Nov 10, 2025

A repository that tracks the latest autoregressive visual generation papers.

420 stars · 5 forks · Updated Jun 25, 2025

[ICCV 2025] SimVQ: Addressing Representation Collapse in Vector Quantized Models with One Linear Layer

Python · 311 stars · 8 forks · Updated Dec 29, 2024

A selection guide for image and video tokenizers/VAEs, with text and face reconstruction evaluation.

Python · 133 stars · Updated Nov 24, 2025

[ICLR 2025][arXiv:2406.07548] Image and Video Tokenization with Binary Spherical Quantization

Python · 188 stars · 6 forks · Updated Dec 18, 2025

Official PyTorch Implementation of "SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers"

Python · 1,052 stars · 64 forks · Updated Nov 4, 2025