We introduce LongCat-Image, a pioneering open-source and bilingual (Chinese-English) foundation model for image generation, designed to address core challenges in multilingual text rendering, photorealism, deployment efficiency, and developer accessibility prevalent in current leading models.
- 🌟 Exceptional Efficiency and Performance: With only 6B parameters, LongCat-Image surpasses numerous open-source models that are several times larger across multiple benchmarks, demonstrating the immense potential of efficient model design.
- 🌟 Superior Editing Performance: LongCat-Image-Edit model achieves state-of-the-art performance among open-source models, delivering leading instruction-following and image quality with superior visual consistency.
- 🌟 Powerful Chinese Text Rendering: LongCat-Image renders common Chinese characters with higher accuracy and stability than existing SOTA open-source models, and achieves industry-leading coverage of the Chinese character dictionary.
- 🌟 Remarkable Photorealism: Through an innovative data strategy and training framework, LongCat-Image achieves remarkable photorealism in generated images.
- 🌟 Comprehensive Open-Source Ecosystem: We provide a complete toolchain, from intermediate checkpoints to full training code, significantly lowering the barrier for further research and development.
- 🔥 [2025-12-16] LongCat-Image is now fully supported in Diffusers!
- 🔥 [2025-12-09] T2I-CoreBench results are out! LongCat-Image ranks 2nd among all open-source models in comprehensive performance, surpassed only by the 32B-parameter Flux2.dev.
- 🔥 [2025-12-08] We released our Technical Report on arXiv!
- 🔥 [2025-12-05] We released the weights for LongCat-Image, LongCat-Image-Dev, and LongCat-Image-Edit on Hugging Face and ModelScope.
```bash
# create conda environment
conda create -n longcat-image python=3.10
conda activate longcat-image

# install requirements for model inference
pip install -r infer_requirements.txt
pip install git+https://github.com/huggingface/diffusers
```

| Models | Type | Description | Download Link |
|---|---|---|---|
| LongCat‑Image | Text‑to‑Image | Final Release. The standard model for out‑of‑the‑box inference. | 🤗 Huggingface |
| LongCat‑Image‑Dev | Text‑to‑Image | Development. Mid-training checkpoint, suitable for fine-tuning. | 🤗 Huggingface |
| LongCat‑Image‑Edit | Image Editing | Specialized model for image editing. | 🤗 Huggingface |
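All three checkpoints load through the same Diffusers pipeline classes shown in the examples below. As a quick orientation, the Dev checkpoint (repo id assumed to mirror the naming above) can be pulled the same way as the final release:

```python
import torch
from diffusers import LongCatImagePipeline

# Repo id assumed from the naming above; the Dev checkpoint shares the
# release architecture, so it loads exactly like the final model.
pipe = LongCatImagePipeline.from_pretrained(
    "meituan-longcat/LongCat-Image-Dev", torch_dtype=torch.bfloat16
)
```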
> [!TIP]
> Leveraging a stronger LLM for prompt refinement can further enhance image generation quality. Please refer to `inference_t2i.py` for detailed usage instructions.
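One such pattern is to route the raw prompt through a chat LLM before generation. The sketch below is purely illustrative (the `call_llm` helper and instruction text are ours, not from this repo); `inference_t2i.py` documents the supported workflow.

```python
# Illustrative sketch only; see inference_t2i.py for the supported workflow.

def call_llm(text: str) -> str:
    """Hypothetical stand-in: send `text` to any capable chat LLM and return its reply."""
    raise NotImplementedError("wire up your preferred chat-model client here")

def refine_prompt(user_prompt: str) -> str:
    # Ask the LLM to expand a terse prompt with concrete visual detail
    # (subject, lighting, composition, style) before passing it to the pipeline.
    instruction = (
        "Rewrite the following image prompt as one detailed paragraph, adding "
        "concrete details about subject, lighting, composition, and style:\n"
    )
    return call_llm(instruction + user_prompt)
```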
> [!CAUTION]
> 📝 **Special Handling for Text Rendering**
>
> For both Text-to-Image and Image Editing tasks involving text generation, you must enclose the target text in single or double quotation marks (both English `'...'` / `"..."` and Chinese `‘...’` / `“...”` styles are supported).
>
> **Reasoning**: The model uses a specialized character-level encoding strategy for quoted content. Without explicit quotation marks, this mechanism is not triggered, which severely degrades text rendering quality.
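For example, illustrative prompts of our own (not from the repo) that trigger the quoted-text path:

```python
# Quoting the target string triggers the character-level encoding path;
# without the quotes, rendering accuracy degrades sharply.
prompt_en = 'A neon sign on a brick wall that reads "Open 24 Hours".'
prompt_cn = '一块木质招牌,上面写着“今日特价”。'  # A wooden sign reading "Today's Special".
```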
```python
import torch
from diffusers import LongCatImagePipeline

if __name__ == '__main__':
    device = torch.device('cuda')

    pipe = LongCatImagePipeline.from_pretrained("meituan-longcat/LongCat-Image", torch_dtype=torch.bfloat16)
    # pipe.to(device, torch.bfloat16)  # Uncomment for high-VRAM devices (faster inference)
    pipe.enable_model_cpu_offload()  # Offload to CPU to save VRAM (requires ~17 GB); slower but prevents OOM

    # A young Asian woman in a yellow knit sweater with a white necklace, hands resting on her
    # knees with a serene expression, against a rough brick wall in warm afternoon sunlight;
    # medium shot, soft light emphasizing her features and the texture of her accessories.
    prompt = '一个年轻的亚裔女性,身穿黄色针织衫,搭配白色项链。她的双手放在膝盖上,表情恬静。背景是一堵粗糙的砖墙,午后的阳光温暖地洒在她身上,营造出一种宁静而温馨的氛围。镜头采用中距离视角,突出她的神态和服饰的细节。光线柔和地打在她的脸上,强调她的五官和饰品的质感,增加画面的层次感与亲和力。整个画面构图简洁,砖墙的纹理与阳光的光影效果相得益彰,突显出人物的优雅与从容。'

    image = pipe(
        prompt,
        height=768,
        width=1344,
        guidance_scale=4.0,
        num_inference_steps=50,
        num_images_per_prompt=1,
        generator=torch.Generator("cpu").manual_seed(43),
        enable_cfg_renorm=True,
        enable_prompt_rewrite=True,
    ).images[0]
    image.save('./t2i_example.png')
```

```python
import torch
from PIL import Image
from diffusers import LongCatImageEditPipeline

if __name__ == '__main__':
    device = torch.device('cuda')

    pipe = LongCatImageEditPipeline.from_pretrained("meituan-longcat/LongCat-Image-Edit", torch_dtype=torch.bfloat16)
    # pipe.to(device, torch.bfloat16)  # Uncomment for high-VRAM devices (faster inference)
    pipe.enable_model_cpu_offload()  # Offload to CPU to save VRAM (requires ~18 GB); slower but prevents OOM

    img = Image.open('assets/test.png').convert('RGB')
    prompt = '将猫变成狗'  # "Turn the cat into a dog"

    image = pipe(
        img,
        prompt,
        negative_prompt='',
        guidance_scale=4.5,
        num_inference_steps=50,
        num_images_per_prompt=1,
        generator=torch.Generator("cpu").manual_seed(43),
    ).images[0]
    image.save('./edit_example.png')
```

The quantitative evaluation results on public benchmarks demonstrate LongCat-Image's competitive performance:

| Model | Accessibility | Parameters | GenEval↑ | DPG↑ | WISE↑ |
|---|---|---|---|---|---|
| FLUX.1‑dev | Open Source | 12B | 0.66 | 83.84 | 0.50 |
| GPT Image 1 [High] | Proprietary | - | 0.84 | 85.15 | - |
| HunyuanImage‑3.0 | Open Source | 80B | 0.72 | 86.10 | 0.57 |
| Qwen‑Image | Open Source | 20B | 0.87 | 88.32 | 0.62 |
| Seedream 4.0 | Proprietary | - | 0.84 | 88.25 | 0.78 |
| LongCat‑Image | Open Source | 6B | 0.87 | 86.80 | 0.65 |

| Model | GlyphDraw2↑ | CVTG‑2K Acc↑ | CVTG‑2K NED↑ | CVTG‑2K CLIPScore↑ | ChineseWord↑ |
|---|---|---|---|---|---|
| HunyuanImage‑3.0 | 0.78 | 0.7650 | 0.8765 | 0.8121 | 58.5 |
| Qwen‑Image | 0.93 | 0.8288 | 0.9297 | 0.8059 | 56.6 |
| Seedream 4.0 | 0.97 | 0.8917 | 0.9507 | 0.7853 | 49.3 |
| LongCat‑Image | 0.95 | 0.8658 | 0.9361 | 0.7859 | 90.7 |

| Model | Alignment↑ | Plausibility↑ | Realism↑ | Aesthetics↑ |
|---|---|---|---|---|
| HunyuanImage‑3.0 | 3.40 | 3.33 | 3.50 | 3.04 |
| Qwen‑Image | 3.95 | 3.48 | 3.45 | 3.09 |
| Seedream 4.0 | 4.25 | 3.76 | 3.54 | 3.10 |
| LongCat‑Image | 3.99 | 3.48 | 3.60 | 3.06 |

| Model | CEdit‑Bench‑EN↑ G_SC | G_PQ | G_O | CEdit‑Bench‑CN↑ G_SC | G_PQ | G_O | GEdit‑Bench‑EN↑ G_SC | G_PQ | G_O | GEdit‑Bench‑CN↑ G_SC | G_PQ | G_O |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FLUX.1 Kontext [Pro] | 6.79 | 7.80 | 6.53 | 1.15 | 8.07 | 1.43 | 7.02 | 7.60 | 6.56 | 1.11 | 7.36 | 1.23 |
| GPT Image 1 [High] | 8.64 | 8.26 | 8.17 | 8.67 | 8.26 | 8.21 | 7.85 | 7.62 | 7.53 | 7.67 | 7.56 | 7.30 |
| Nano Banana | 7.51 | 8.17 | 7.20 | 7.67 | 8.21 | 7.36 | 7.86 | 8.33 | 7.54 | 7.51 | 8.31 | 7.25 |
| Seedream 4.0 | 8.12 | 7.95 | 7.58 | 8.14 | 7.95 | 7.57 | 8.24 | 8.08 | 7.68 | 8.19 | 8.14 | 7.71 |
| FLUX.1 Kontext [Dev] | 6.31 | 7.56 | 5.93 | 1.25 | 7.66 | 1.51 | 6.52 | 7.38 | 6.00 | - | - | - |
| Step1X‑Edit | 6.68 | 7.36 | 6.25 | 6.88 | 7.28 | 6.35 | 7.66 | 7.35 | 6.97 | 7.20 | 6.87 | 6.86 |
| Qwen‑Image‑Edit | 8.07 | 7.84 | 7.52 | 8.03 | 7.78 | 7.46 | 8.00 | 7.86 | 7.56 | 7.82 | 7.79 | 7.52 |
| Qwen‑Image‑Edit [2509] | 8.04 | 7.79 | 7.48 | 7.93 | 7.71 | 7.37 | 8.15 | 7.86 | 7.54 | 8.05 | 7.88 | 7.49 |
| LongCat‑Image‑Edit | 8.27 | 7.88 | 7.67 | 8.25 | 7.85 | 7.65 | 8.18 | 8.00 | 7.64 | 8.08 | 7.99 | 7.60 |

| Models | Comprehensive Quality | Consistency |
|---|---|---|
| Nano Banana vs LongCat-Image-Edit | 60.8% vs 39.2% | 53.9% vs 46.1% |
| Seedream 4.0 vs LongCat-Image-Edit | 56.9% vs 43.1% | 56.3% vs 43.7% |
| Qwen-Image-Edit [2509] vs LongCat-Image-Edit | 41.3% vs 58.7% | 45.8% vs 54.2% |
| FLUX.1 Kontext [Pro] vs LongCat-Image-Edit | 39.5% vs 60.5% | 37.0% vs 63.0% |
```bash
cd LongCat-Image

# for training, install other requirements
pip install -r train_requirements.txt
python setup.py develop
```

We provide training code that enables advanced development of our LongCat‑Image‑Dev model, including SFT, LoRA, DPO, and Image Editing training.
See TRAINING.md for detailed instructions.
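For orientation before diving into TRAINING.md, the sketch below shows the general shape of attaching a LoRA adapter to the pipeline's transformer with `peft`. The `pipe.transformer` attribute and the `target_modules` names are assumptions about the DiT attention projections; TRAINING.md remains the authoritative recipe.

```python
import torch
from diffusers import LongCatImagePipeline
from peft import LoraConfig

# Minimal LoRA sketch, not the repo's training entry point; see TRAINING.md.
pipe = LongCatImagePipeline.from_pretrained(
    "meituan-longcat/LongCat-Image-Dev", torch_dtype=torch.bfloat16
)
transformer = pipe.transformer  # assumed component name for the DiT backbone

# `target_modules` are assumed attention-projection names; inspect the model
# with `print(transformer)` and adjust to the actual module names.
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    init_lora_weights="gaussian",
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],
)
transformer.add_adapter(lora_config)  # diffusers' built-in PEFT integration

trainable = sum(p.numel() for p in transformer.parameters() if p.requires_grad)
print(f"trainable LoRA parameters: {trainable:,}")
```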
Community works are welcome! Please open a PR or let us know in an Issue to add your work.
- [LoRA Adapters] Fine-tuned models for specific styles and domains
- [ComfyUI Integration] Native support for ComfyUI workflow
- [Diffusers Pipeline] HuggingFace Diffusers integration
- ComfyUI Longcat Image - Custom node extension for ComfyUI workflow.
LongCat-Image is licensed under Apache 2.0. See the LICENSE file for the full license text.
This model has not been specifically designed or comprehensively evaluated for every possible downstream application.
Developers should take into account the known limitations of large language models, including performance variations across different languages, and carefully assess accuracy, safety, and fairness before deploying the model in sensitive or high-risk scenarios. It is the responsibility of developers and downstream users to understand and comply with all applicable laws and regulations relevant to their use case, including but not limited to data protection, privacy, and content safety requirements.
Nothing in this Model Card should be interpreted as altering or restricting the terms of the Apache License 2.0 under which the model is released.
We kindly encourage citation of our work if you find it useful.
```bibtex
@article{LongCat-Image,
  title={LongCat-Image Technical Report},
  author={Meituan LongCat Team and Hanghang Ma and Haoxian Tan and Jiale Huang and Junqiang Wu and Jun-Yan He and Lishuai Gao and Songlin Xiao and Xiaoming Wei and Xiaoqi Ma and Xunliang Cai and Yayong Guan and Jie Hu},
  journal={arXiv preprint arXiv:2512.07584},
  year={2025}
}
```

We would like to thank the contributors to the FLUX.1, Qwen2.5-VL, Diffusers, and HuggingFace repositories for their open research.
Please contact us at longcat-team@meituan.com or join our WeChat Group if you have any questions.