Welcome to the Skywork-UniPic repository! This repository contains the model weights and implementation of our unified 1.5B-parameter autoregressive model that natively integrates image understanding, text-to-image generation, and image editing capabilities within a single architecture.
Performance Overview
GenEval
| Model | Single | Two | Count | Color | Position | Attr | Overall |
|---|---|---|---|---|---|---|---|
| Diffusion Models | |||||||
| SDv2.1 | 0.98 | 0.51 | 0.44 | 0.85 | 0.07 | 0.17 | 0.50 |
| SDXL | 0.98 | 0.74 | 0.39 | 0.85 | 0.15 | 0.23 | 0.55 |
| IF‑XL | 0.97 | 0.74 | 0.66 | 0.81 | 0.13 | 0.35 | 0.61 |
| LUMINA‑Next | 0.92 | 0.46 | 0.48 | 0.70 | 0.09 | 0.13 | 0.46 |
| SD3‑medium | 0.99 | 0.94 | 0.72 | 0.89 | 0.33 | 0.60 | 0.74 |
| FLUX.1‑dev | 0.99 | 0.81 | 0.79 | 0.74 | 0.20 | 0.47 | 0.67 |
| NOVA | 0.99 | 0.91 | 0.62 | 0.85 | 0.33 | 0.56 | 0.71 |
| Autoregressive Models | |||||||
| TokenFlow‑XL | 0.95 | 0.60 | 0.41 | 0.81 | 0.16 | 0.24 | 0.55 |
| Janus | 0.97 | 0.68 | 0.30 | 0.84 | 0.46 | 0.42 | 0.61 |
| Janus Pro | 0.99 | 0.89 | 0.59 | 0.90 | 0.79 | 0.66 | 0.80 |
| Emu3‑Gen | 0.99 | 0.81 | 0.42 | 0.80 | 0.49 | 0.45 | 0.66 |
| Show‑o | 0.98 | 0.80 | 0.66 | 0.84 | 0.31 | 0.50 | 0.68 |
| Unified Models | |||||||
| OmniGen | 0.98 | 0.84 | 0.66 | 0.74 | 0.40 | 0.43 | 0.68 |
| OmniGen2 | 1.00 | 0.95 | 0.64 | 0.88 | 0.55 | 0.76 | 0.80 |
| OmniGen2† | 0.99 | 0.96 | 0.74 | 0.98 | 0.71 | 0.75 | 0.86 |
| MetaQuery‑XL† | - | - | - | - | - | - | 0.80 |
| BLIP3‑o† 4B | - | - | - | - | - | - | 0.81 |
| BLIP3‑o† 8B | - | - | - | - | - | - | 0.84 |
| BAGEL | 0.99 | 0.94 | 0.81 | 0.88 | 0.64 | 0.63 | 0.82 |
| BAGEL† | 0.98 | 0.95 | 0.84 | 0.95 | 0.78 | 0.77 | 0.88 |
| UniWorld‑V1 | 0.99 | 0.93 | 0.79 | 0.89 | 0.49 | 0.70 | 0.80 |
| UniWorld‑V1† | 0.98 | 0.93 | 0.81 | 0.89 | 0.74 | 0.71 | 0.84 |
| Ovis‑U1 | 0.98 | 0.98 | 0.90 | 0.92 | 0.79 | 0.75 | 0.89 |
| Proprietary Models | |||||||
| GPT‑4o | 0.99 | 0.92 | 0.85 | 0.92 | 0.75 | 0.61 | 0.84 |
| Skywork UniPic | 0.98 | 0.92 | 0.74 | 0.91 | 0.89 | 0.72 | 0.86 |
DPG‑Bench
| Model | Global | Entity | Attribute | Relation | Other | Overall |
|---|---|---|---|---|---|---|
| Diffusion Models | ||||||
| LUMINA‑Next | 82.82 | 88.65 | 86.44 | 80.53 | 81.82 | 74.63 |
| SDXL | 83.27 | 82.43 | 80.91 | 86.76 | 80.41 | 74.65 |
| PlayGroundv2.5 | 83.06 | 82.59 | 81.20 | 84.08 | 83.50 | 75.47 |
| Hunyuan‑DiT | 84.59 | 80.59 | 88.01 | 74.36 | 86.41 | 78.87 |
| PixArt‑Σ | 86.89 | 82.89 | 88.94 | 86.59 | 87.68 | 80.54 |
| DALLE3 | 90.97 | 89.61 | 88.39 | 90.58 | 89.83 | 83.50 |
| SD3‑medium | 87.90 | 91.01 | 88.83 | 80.70 | 88.68 | 84.08 |
| FLUX.1‑dev | 82.10 | 89.50 | 88.70 | 91.10 | 89.40 | 84.00 |
| Autoregressive Models | ||||||
| Show‑o | 79.33 | 75.44 | 78.02 | 84.45 | 60.80 | 67.27 |
| EMU3 | 85.21 | 86.68 | 86.84 | 90.22 | 83.15 | 80.60 |
| TokenFlow‑XL | 78.72 | 79.22 | 81.29 | 85.22 | 71.20 | 73.38 |
| Janus | 82.33 | 87.38 | 87.70 | 85.46 | 86.41 | 79.68 |
| Janus Pro | 86.90 | 88.90 | 89.40 | 89.32 | 89.48 | 84.19 |
| BLIP3‑o 4B | - | - | - | - | - | 79.36 |
| BLIP3‑o 8B | - | - | - | - | - | 81.60 |
| Unified Models | ||||||
| OmniGen | 87.90 | 88.97 | 88.47 | 87.95 | 83.56 | 81.16 |
| OmniGen2 | 88.81 | 88.83 | 90.18 | 89.37 | 90.27 | 83.57 |
| BAGEL | 88.94 | 90.37 | 91.29 | 90.82 | 88.67 | 85.07 |
| UniWorld‑V1 | 83.64 | 88.39 | 88.44 | 89.27 | 87.22 | 81.38 |
| Ovis‑U1 | 82.37 | 90.08 | 88.68 | 93.35 | 85.20 | 83.72 |
| Skywork UniPic | 89.65 | 87.78 | 90.84 | 91.89 | 91.95 | 85.50 |
GEdit‑Bench‑EN
| Model | SC ↑ | PQ ↑ | Overall ↑ |
|---|---|---|---|
| Proprietary Models | |||
| Gemini‑2.0‑flash | 6.73 | 6.61 | 6.32 |
| GPT‑4o | 7.85 | 7.62 | 7.53 |
| Specialized Editing Models | |||
| Instruct‑Pix2Pix | 3.58 | 5.49 | 3.68 |
| MagicBrush | 4.68 | 5.66 | 4.52 |
| AnyEdit | 3.18 | 5.82 | 3.21 |
| ICEdit | 5.11 | 6.85 | 4.84 |
| Step1X‑Edit | 7.09 | 6.76 | 6.70 |
| Unified Models | |||
| OmniGen | 5.96 | 5.89 | 5.06 |
| OmniGen2 | 7.16 | 6.77 | 6.41 |
| BAGEL | 7.36 | 6.83 | 6.52 |
| UniWorld‑V1 | 4.93 | 7.43 | 4.85 |
| Ovis‑U1 | - | - | 6.42 |
| Skywork UniPic | 6.72 | 6.18 | 5.83 |
Create virtual environment and install dependencies with pip:
conda create -n unipic python==3.10.14
conda activate unipic
pip install -r requirements.txtDownload the model checkpoints from [🤗 SkyworkUniPic], It is recommended to use the following command to download the checkpoints
# pip install -U "huggingface_hub[cli]"
huggingface-cli download Skywork/Skywork-UniPic-1.5B --local-dir checkpoint --repo-type modelexport PYTHONPATH=./:$PYTHONPATH
python scripts/image_edit.py configs/models/qwen2_5_1_5b_kl16_mar_h.py \
--checkpoint checkpoint/pytorch_model.bin --image_size 1024 \
--image data/sample.png --prompt "Replace the stars with the candle." \
--output output.jpg You can generate images from text prompts using the following command:
export PYTHONPATH=./:$PYTHONPATH
python scripts/text2image.py configs/models/qwen2_5_1_5b_kl16_mar_h.py \
--checkpoint checkpoint/pytorch_model.bin --image_size 1024 \
--prompt 'A glossy-coated golden retriever stands on the park lawn beside a life-sized penguin statue.' --output output.jpgTo generate a list of images based on prompts in a json file.
export PYTHONPATH=./:$PYTHONPATH
accelerate launch scripts/batch_text2image.py configs/models/qwen2_5_1_5b_kl16_mar_h.py \
--checkpoint checkpoint/pytorch_model.bin --image_size 1024 \
--data data/batch_t2i.json --output output --batch_size 2 --grid_size 2The json file should look like:
[
{
"prompt": "A glossy-coated golden retriever stands on the park lawn beside a life-sized penguin statue."
},
{
"prompt": "Digital portrait of a girl with rainbow hair."
}
]export PYTHONPATH=./:$PYTHONPATH
python scripts/image2text.py configs/models/qwen2_5_1_5b_kl16_mar_h.py \
--checkpoint checkpoint/pytorch_model.bin --image_size 1024 \
--image data/sample.png --prompt "Describe the image in detail."This project is licensed under MIT License.