Reliable Reasoning in SVG-LLMs via Multi-Task Multi-Reward Reinforcement Learning

📚 Introduction

Official repository of "Reliable Reasoning in SVG-LLMs via Multi-Task Multi-Reward Reinforcement Learning". In this work, we present CTRL-S (Chain-of-Thought Reinforcement Learning for SVG), a unified framework that introduces a chain-of-thought mechanism to explicitly expose the model’s reasoning process during SVG generation. To support this structured reasoning, we construct SVG-Sophia, a high-quality dataset of 145K samples across SVG code refinement, Text-to-SVG, and Image-to-SVG tasks. Furthermore, we design a robust multi-reward reinforcement learning scheme powered by the GRPO algorithm. By jointly optimizing across DINO, image-text similarity, format, and code efficiency rewards in a multi-task setting, our approach systematically boosts structural coherence and generation capabilities. Extensive experiments show that CTRL-S outperforms existing methods, achieving higher task success rates, superior code quality, and exceptional visual fidelity.

📢 News

[2026-03-18] 🎉 SVG-Sophia is now available on HuggingFace! 🤗Dataset
[2026-03-18] 👋 Upload paper and init project. Read

📝 Open-Source Plan

🔬 Framework

The overall pipeline of CTRL-S. (1) Two-Stage SFT: The model is first trained on 1M SAgoge samples to align SVG-specific tokens, and then fine-tuned on SVG-Sophia to learn CoT-structured responses with explicit step-wise planning. (2) Multi-Task Multi-Reward RL: We jointly optimize Text-to-SVG, Image-to-SVG, and SVG refinement tasks via a multi-reward mechanism, including Format Reward, DINO Reward, Image-text Similarity Reward, and Code Efficiency Reward, to improve structural validity, visual fidelity, semantic alignment, and concise code generation.

📌 Quick Start

⚙️ Installation

git clone https://github.com/hmwang2002/CTRL-S.git
cd CTRL-S

conda create -n ctrls python=3.12 -y
conda activate ctrls
pip install -r requirements.txt

For training, CTRL-S uses LLaMA-Factory for supervised fine-tuning (SFT) and verl for reinforcement learning (RL). Please refer to their official installation guides to prepare the corresponding environments.

🧩 SVG-Sophia Dataset

The SVG-Sophia dataset is available at Hugging Face.

After downloading and extraction, the files are organized as follows:

File	Description
`cot_img2svg_sft.jsonl`	CoT training data for the SFT stage — Image-to-SVG task
`cot_text2svg_sft.jsonl`	CoT training data for the SFT stage — Text-to-SVG task
`cot_refinement_sft.jsonl`	CoT training data for the SFT stage — SVG code refinement task
`cot_img2svg_rl.jsonl`	CoT training data for the RL stage — Image-to-SVG task
`cot_text2svg_rl.jsonl`	CoT training data for the RL stage — Text-to-SVG task
`cot_refinement_rl.jsonl`	CoT training data for the RL stage — SVG code refinement task
`cot_refinement_test.jsonl`	Test set for the SVG code refinement task

In summary, files with the _sft suffix are used for SFT-stage training, files with the _rl suffix are used for RL-stage training, and cot_refinement_test.jsonl is the held-out test set for the SVG code refinement task.

Training

We will open-source the training scripts and the implementation of reward functions as soon as possible.

Deployment

We provide a sample deployment script at scripts/deploy/deploy.sh.

#!/bin/bash

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 vllm serve PATH_TO_MODEL \
  --served-model-name "MODEL_NAME" \
  --dtype bfloat16 \
  --tensor-parallel-size 8 \
  --mm-encoder-tp-mode data \
  --async-scheduling \
  --max-num-seqs 32 \
  --max-num-batched-tokens 16384 \
  --max-model-len 128000 \
  --gpu-memory-utilization 0.72 \
  --media-io-kwargs '{"video": {"num_frames": -1}}' \
  --host 0.0.0.0 \
  --port 22002 \
  --trust-remote-code \
  --no-enable-prefix-caching \
  --no-enable-expert-parallel \
  --enable-multimodal

Evaluation

We support evaluation on two benchmarks:

SArena: Download the benchmark from InternSVG.
SVG-Sophia refinement test set: Use cot_refinement_test.jsonl from the SVG-Sophia dataset.

After downloading the data, simply modify the demo scripts under scripts/inference/ and scripts/evaluate/ to set the correct file paths and URLs, then run them to perform inference and evaluation respectively:

Script	Purpose
`scripts/inference/gen.sh`	Inference for Text-to-SVG / Image-to-SVG
`scripts/inference/refine.sh`	Inference for SVG code refinement
`scripts/evaluate/gen.sh`	Evaluation for Text-to-SVG / Image-to-SVG
`scripts/evaluate/refine.sh`	Evaluation for SVG code refinement

License

CTRL-S is licensed under the Apache License 2.0.

📖 Citation

@article{wang2026reliable,
  title={Reliable Reasoning in SVG-LLMs via Multi-Task Multi-Reward Reinforcement Learning},
  author={Wang, Haomin and Wei, Qi and Ma, Qianli and Ding, Shengyuan and Yin, Jinhui and Chen, Kai and Zhang, Hongjie},
  journal={arXiv preprint arXiv:2603.16189},
  year={2026}
}

@inproceedings{wang2025internsvg,
    author = "Haomin Wang and Jinhui Yin and Qi Wei and Wenguang Zeng and Lixin Gu and Shenglong Ye and Zhangwei Gao and Yaohui Wang and Yanting Zhang and Yuanqi Li and Yanwen Guo and Wenhai Wang and Kai Chen and Yu Qiao and Hongjie Zhang",
    title = "Internsvg: Towards unified svg tasks with multimodal large language models",
    booktitle={The Fourteenth International Conference on Learning Representations},
    year={2026},
    url={https://openreview.net/forum?id=YxqnNNs3sf}
}

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
assets		assets
example		example
metrics		metrics
scripts		scripts
utils		utils
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
evaluate_gen.py		evaluate_gen.py
evaluate_refinement.py		evaluate_refinement.py
install.sh		install.sh
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Reliable Reasoning in SVG-LLMs via Multi-Task Multi-Reward Reinforcement Learning

📚 Introduction

📢 News

📝 Open-Source Plan

🔬 Framework

📌 Quick Start

⚙️ Installation

🧩 SVG-Sophia Dataset

Training

Deployment

Evaluation

License

📖 Citation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Reliable Reasoning in SVG-LLMs via Multi-Task Multi-Reward Reinforcement Learning

📚 Introduction

📢 News

📝 Open-Source Plan

🔬 Framework

📌 Quick Start

⚙️ Installation

🧩 SVG-Sophia Dataset

Training

Deployment

Evaluation

License

📖 Citation

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages