221 results for source starred repositories
Python 141 9 Updated Jun 28, 2024

MiroThinker is an open-source deep research agent optimized for research and prediction. It achieves an 80.8% Avg@8 score on the challenging GAIA benchmark.

Python 6,096 450 Updated Feb 4, 2026
Jupyter Notebook 115 3 Updated Nov 8, 2025

[ICCV 2025] OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning

Python 447 24 Updated Jan 29, 2026

Fully Open Framework for Democratized Multimodal Training

Python 716 57 Updated Dec 27, 2025

WeTok: Powerful Discrete Tokenization for High-Fidelity Visual Reconstruction

Python 57 2 Updated Sep 3, 2025

One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks

Python 3,623 512 Updated Feb 4, 2026

Official PyTorch implementation of FlowMo.

Jupyter Notebook 110 7 Updated Apr 7, 2025
Python 4,549 441 Updated Sep 14, 2025

Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.

Jupyter Notebook 3,344 207 Updated May 19, 2025

A summary of existing representative LLM text datasets.

1,429 140 Updated Oct 11, 2025

Curated list of datasets and tools for post-training.

4,223 351 Updated Nov 10, 2025

A quick guide to datasets, especially trending instruction fine-tuning datasets

3,356 230 Updated Nov 28, 2023

Awesome LLM pre-training resources, including data, frameworks, and methods.

323 23 Updated Apr 29, 2025

[ICLR2026] AliTok: Towards Sequence Modeling Alignment between Tokenizer and Autoregressive Model

Python 53 2 Updated Oct 12, 2025

FlexTok: Resampling Images into 1D Token Sequences of Flexible Length

Jupyter Notebook 290 14 Updated Jun 2, 2025

GPT-IMAGE-EDIT-1.5M: A Million-Scale, GPT-Generated Image Dataset

Python 244 5 Updated Aug 15, 2025

Official inference code and LongText-Bench benchmark for our paper X-Omni (https://arxiv.org/pdf/2507.22058).

Python 420 11 Updated Aug 26, 2025

A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations. It includes practical examples for both text and image modalities.

Python 4,092 300 Updated Jan 5, 2026

[NeurIPS 2025] Efficient Reasoning Vision Language Models

Python 448 29 Updated Sep 18, 2025

LaTeX template for doctoral dissertations at Huazhong University of Science and Technology

TeX 245 50 Updated Jul 24, 2025

MMaDA - Open-Sourced Multimodal Large Diffusion Language Models

Python 1,570 83 Updated Nov 16, 2025

(Accepted by IJCV) Liquid: Language Models are Scalable and Unified Multi-modal Generators

Python 637 34 Updated Nov 10, 2025

This is a repo to track the latest autoregressive visual generation papers.

430 5 Updated Jun 25, 2025

[ICCV 2025] SimVQ: Addressing Representation Collapse in Vector Quantized Models with One Linear Layer

Python 316 9 Updated Dec 29, 2024

Image and video Tokenizer/VAE selection guide, text and face reconstruction evaluation.

Python 135 Updated Nov 24, 2025

[ICLR 2025][arXiv:2406.07548] Image and Video Tokenization with Binary Spherical Quantization

Python 197 6 Updated Dec 18, 2025

Official PyTorch Implementation of "SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers"

Python 1,092 68 Updated Dec 22, 2025