Official repository of "Reliable Reasoning in SVG-LLMs via Multi-Task Multi-Reward Reinforcement Learning". In this work, we present CTRL-S (Chain-of-Thought Reinforcement Learning for SVG), a unified framework that introduces a chain-of-thought mechanism to explicitly expose the model’s reasoning process during SVG generation. To support this structured reasoning, we construct SVG-Sophia, a high-quality dataset of 145K samples across SVG code refinement, Text-to-SVG, and Image-to-SVG tasks. Furthermore, we design a robust multi-reward reinforcement learning scheme powered by the GRPO algorithm. By jointly optimizing across DINO, image-text similarity, format, and code efficiency rewards in a multi-task setting, our approach systematically boosts structural coherence and generation capabilities. Extensive experiments show that CTRL-S outperforms existing methods, achieving higher task success rates, superior code quality, and exceptional visual fidelity.
- [2026-03-18] 🎉 SVG-Sophia is now available on HuggingFace! 🤗Dataset
- [2026-03-18] 👋 Upload paper and init project. Read
- Training scripts
- Model weights
- Evaluation code
- SVG-Sophia dataset
- Paper
The overall pipeline of CTRL-S. (1) Two-Stage SFT: The model is first trained on 1M SAgoge samples to align SVG-specific tokens, and then fine-tuned on SVG-Sophia to learn CoT-structured responses with explicit step-wise planning. (2) Multi-Task Multi-Reward RL: We jointly optimize Text-to-SVG, Image-to-SVG, and SVG refinement tasks via a multi-reward mechanism, including Format Reward, DINO Reward, Image-text Similarity Reward, and Code Efficiency Reward, to improve structural validity, visual fidelity, semantic alignment, and concise code generation.
git clone https://github.com/hmwang2002/CTRL-S.git
cd CTRL-S
conda create -n ctrls python=3.12 -y
conda activate ctrls
pip install -r requirements.txtFor training, CTRL-S uses LLaMA-Factory for supervised fine-tuning (SFT) and verl for reinforcement learning (RL). Please refer to their official installation guides to prepare the corresponding environments.
The SVG-Sophia dataset is available at Hugging Face.
After downloading and extraction, the files are organized as follows:
| File | Description |
|---|---|
cot_img2svg_sft.jsonl |
CoT training data for the SFT stage — Image-to-SVG task |
cot_text2svg_sft.jsonl |
CoT training data for the SFT stage — Text-to-SVG task |
cot_refinement_sft.jsonl |
CoT training data for the SFT stage — SVG code refinement task |
cot_img2svg_rl.jsonl |
CoT training data for the RL stage — Image-to-SVG task |
cot_text2svg_rl.jsonl |
CoT training data for the RL stage — Text-to-SVG task |
cot_refinement_rl.jsonl |
CoT training data for the RL stage — SVG code refinement task |
cot_refinement_test.jsonl |
Test set for the SVG code refinement task |
In summary, files with the _sft suffix are used for SFT-stage training, files with the _rl suffix are used for RL-stage training, and cot_refinement_test.jsonl is the held-out test set for the SVG code refinement task.
We will open-source the training scripts and the implementation of reward functions as soon as possible.
We provide a sample deployment script at scripts/deploy/deploy.sh.
#!/bin/bash
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 vllm serve PATH_TO_MODEL \
--served-model-name "MODEL_NAME" \
--dtype bfloat16 \
--tensor-parallel-size 8 \
--mm-encoder-tp-mode data \
--async-scheduling \
--max-num-seqs 32 \
--max-num-batched-tokens 16384 \
--max-model-len 128000 \
--gpu-memory-utilization 0.72 \
--media-io-kwargs '{"video": {"num_frames": -1}}' \
--host 0.0.0.0 \
--port 22002 \
--trust-remote-code \
--no-enable-prefix-caching \
--no-enable-expert-parallel \
--enable-multimodal We support evaluation on two benchmarks:
- SArena: Download the benchmark from InternSVG.
- SVG-Sophia refinement test set: Use
cot_refinement_test.jsonlfrom the SVG-Sophia dataset.
After downloading the data, simply modify the demo scripts under scripts/inference/ and scripts/evaluate/ to set the correct file paths and URLs, then run them to perform inference and evaluation respectively:
| Script | Purpose |
|---|---|
scripts/inference/gen.sh |
Inference for Text-to-SVG / Image-to-SVG |
scripts/inference/refine.sh |
Inference for SVG code refinement |
scripts/evaluate/gen.sh |
Evaluation for Text-to-SVG / Image-to-SVG |
scripts/evaluate/refine.sh |
Evaluation for SVG code refinement |
CTRL-S is licensed under the Apache License 2.0.
@article{wang2026reliable,
title={Reliable Reasoning in SVG-LLMs via Multi-Task Multi-Reward Reinforcement Learning},
author={Wang, Haomin and Wei, Qi and Ma, Qianli and Ding, Shengyuan and Yin, Jinhui and Chen, Kai and Zhang, Hongjie},
journal={arXiv preprint arXiv:2603.16189},
year={2026}
}
@inproceedings{wang2025internsvg,
author = "Haomin Wang and Jinhui Yin and Qi Wei and Wenguang Zeng and Lixin Gu and Shenglong Ye and Zhangwei Gao and Yaohui Wang and Yanting Zhang and Yuanqi Li and Yanwen Guo and Wenhai Wang and Kai Chen and Yu Qiao and Hongjie Zhang",
title = "Internsvg: Towards unified svg tasks with multimodal large language models",
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026},
url={https://openreview.net/forum?id=YxqnNNs3sf}
}