Weikang Shi1*,
Aldrich Yu1*,
Rongyao Fang1*†,
Houxing Ren1,
Ke Wang1,
Aojun Zhou1,
Changyao Tian1,
Xinyu Fu2,
Yuxuan Hu1,
Zimu Lu1,
Linjiang Huang3,
Si Liu3,
Rui Liu2‡,
Hongsheng Li1‡
1MMLab, CUHK 2Huawei Research 3BUAA
*Equal Contribution †Project Lead ‡Corresponding Author
- [2025-11-15] Our MathCanvas-Instruct dataset (219K math problems with interleaved visual-text reasoning) is now available on Hugging Face.
- [2025-10-30] 🚀 We are excited to announce that MathCanvas-Bench is now officially supported by VLMEvalKit! This allows for easy evaluation of 220+ LMMs. For usage instructions, please refer to this PR.
- [2025-10-28] The data generation code for the Foundational Structure Generation part of MathCanvas-Edit is now available. Refer to the Data Generation section for usage instructions.
- [2025-10-23] We release the training/inference code of BAGEL-Canvas and the evaluation scripts for MathCanvas-Bench.
- [2025-10-18] Our model and datasets are now available on Hugging Face.
- [2025-10-18] Our paper is now available on arXiv.
🌟 This is the official repository for the paper "MathCanvas: Intrinsic Visual Chain-of-Thought for Multimodal Mathematical Reasoning". This repository will host the datasets, evaluation code, and models associated with our work.
MathCanvas demonstrates the first successful application of intrinsic Visual Chain-of-Thought (VCoT) for complex mathematical reasoning, outperforming previous attempts.
MathCanvas is a comprehensive framework designed to endow unified Large Multimodal Models (LMMs) with intrinsic Visual Chain-of-Thought (VCoT) capabilities for mathematics. Our approach enables models to strategically generate and reason with visual aids, mirroring how humans solve complex problems in domains like geometry and function analysis.
For detailed instructions on setting up the environment, training the BAGEL-Canvas model, and running inference, please refer to our comprehensive guide:
- 📄 Model Usage: The complete guide for model training and inference.
This section provides instructions for evaluating model performance on our MathCanvas-Bench benchmark. The evaluation process relies on an LLM-based judge (GPT-4.1) to assess the correctness of the generated answers.
To evaluate the inference results on MathCanvas-Bench, follow the steps below:
- **Configure the Evaluation Script**: Open the `evaluation/mathcanvas_evaluate_4.1.sh` script and set the `your_api_key` and `your_base_url` variables.
- **Run Evaluation**: Execute the following command, replacing `{INFERENCE_DIR}` with the path to your inference output:

  ```bash
  cd MathCanvas/evaluation
  bash mathcanvas_evaluate_4.1.sh {INFERENCE_DIR}
  ```
- **View the Results**: After the script finishes, an evaluation summary will be generated. This summary includes detailed accuracy metrics, such as:
  - Weighted scoring accuracy and complete accuracy.
  - Accuracy broken down by knowledge category.
  - Accuracy based on whether the question includes initial images.

  A minimal sketch of how such a summary can be aggregated is shown after this list.
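The judging itself is handled by the script above; purely for illustration, here is a minimal Python sketch of how such a summary could be aggregated from per-question judge outputs. The JSONL layout and the field names (`score`, `category`, `has_initial_image`) are assumptions made for this example, not the actual output format of `mathcanvas_evaluate_4.1.sh`.

```python
import json
from collections import defaultdict

def summarize(results_path: str) -> dict:
    """Aggregate per-question judge results into summary metrics.

    Assumes one JSON object per line with hypothetical fields:
      score             -- fraction of sub-answers judged correct (0.0-1.0)
      category          -- knowledge category label
      has_initial_image -- whether the question includes initial images
    """
    with open(results_path, encoding="utf-8") as f:
        records = [json.loads(line) for line in f if line.strip()]

    weighted = sum(r["score"] for r in records) / len(records)          # weighted scoring accuracy
    complete = sum(r["score"] == 1.0 for r in records) / len(records)   # complete accuracy

    by_category = defaultdict(list)
    by_image = defaultdict(list)
    for r in records:
        by_category[r["category"]].append(r["score"])
        by_image["with_image" if r["has_initial_image"] else "text_only"].append(r["score"])

    return {
        "weighted_accuracy": weighted,
        "complete_accuracy": complete,
        "by_category": {k: sum(v) / len(v) for k, v in by_category.items()},
        "by_initial_image": {k: sum(v) / len(v) for k, v in by_image.items()},
    }

if __name__ == "__main__":
    print(json.dumps(summarize("judge_results.jsonl"), indent=2))
```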
This section details the process for generating the Foundational Structure Generation subset of the MathCanvas-Edit dataset. Our synthesis pipeline for foundational geometric structures is based on the official implementation of AlphaGeometry.
Before running the generation script, you must set up the environment required by AlphaGeometry. Please refer to their official repository and follow the installation instructions.
Once the environment is configured, you can generate the data by running the provided script.
```bash
cd foundations_synthesis/
bash foundations_synthesis.sh
```

You can customize the generation process by modifying the `foundations_synthesis.sh` script. This includes parameters such as the total number of samples to generate and the length of the editing sequences.
For convenience, we have already generated 1 million editing sequences for this subset. You can directly access and download them from our dataset repository on Hugging Face:
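If you prefer to fetch the pre-generated sequences programmatically, a minimal sketch using `huggingface_hub` is shown below. The repository ID and the subdirectory pattern are placeholders for illustration; substitute the actual dataset path listed on our Hugging Face page.

```python
from huggingface_hub import snapshot_download

# Placeholder repo ID -- replace with the actual MathCanvas-Edit dataset
# repository listed on our Hugging Face page.
local_dir = snapshot_download(
    repo_id="your-org/MathCanvas-Edit",
    repo_type="dataset",
    local_dir="./mathcanvas_edit_foundations",
    # Optionally restrict the download to the foundational-structure subset,
    # assuming it lives under a dedicated subdirectory (hypothetical layout).
    allow_patterns=["foundational_structure/*"],
)
print(f"Dataset downloaded to {local_dir}")
```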
To facilitate rigorous evaluation, we introduce MathCanvas-Bench, a challenging benchmark with 3K problems that require models to produce interleaved visual-textual solutions. The models are fine-tuned on MathCanvas-Instruct, a new 219K-example dataset of interleaved visual-textual reasoning paths, teaching them when and how to leverage visual aids.
Statistical analysis of the MathCanvas-Bench dataset.
Examples from the MathCanvas-Instruct dataset, showing interleaved visual and textual reasoning steps.
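To quickly inspect the datasets, you can load them with the `datasets` library as in the sketch below. The repository IDs, split names, and field names are placeholders for illustration; refer to our Hugging Face page for the actual paths and schema.

```python
from datasets import load_dataset

# Placeholder repo IDs -- replace with the actual repositories listed on our
# Hugging Face page.
instruct = load_dataset("your-org/MathCanvas-Instruct", split="train")
bench = load_dataset("your-org/MathCanvas-Bench", split="test")

print(f"MathCanvas-Instruct examples: {len(instruct)}")  # ~219K interleaved reasoning paths
print(f"MathCanvas-Bench problems:   {len(bench)}")      # ~3K benchmark problems

# Peek at one training example; the exact field names are assumptions
# made for this illustration.
example = instruct[0]
print(example.keys())
```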
We constructed a massive 15.2M-pair pre-training corpus to teach foundational visual manipulation skills. This includes MathCanvas-Imagen (10M caption-to-diagram pairs) for mastering diagram generation and MathCanvas-Edit (5.2M step-by-step editing trajectories) for diagram editing.
The curation pipeline for the MathCanvas-Edit and MathCanvas-Imagen datasets.
Our model, BAGEL-Canvas, is trained using a two-stage framework:
- Stage I: Mastering Visual Manipulation: The model learns from the 15.2M examples in MathCanvas-Imagen and MathCanvas-Edit to create and edit mathematical diagrams.
- Stage II: Developing Strategic Reasoning: The model is then trained on MathCanvas-Instruct to strategically generate visual steps as part of a solution.
The two-stage training framework of MathCanvas.
Our code and models are currently being prepared for public release. We appreciate your patience!
- Release training and inference code for BAGEL-Canvas.
- Release evaluation scripts for MathCanvas-Bench.
- Integrate the MathCanvas-Bench evaluation into VLMEvalKit.
- Release the data generation code for Foundational Structure Generation in MathCanvas-Edit.
If you have any questions, please raise an issue or contact us at wkshi@link.cuhk.edu.hk.
If you find our work useful for your research, please consider citing our paper:
@misc{shi2025mathcanvasintrinsicvisualchainofthought,
title={MathCanvas: Intrinsic Visual Chain-of-Thought for Multimodal Mathematical Reasoning},
author={Weikang Shi and Aldrich Yu and Rongyao Fang and Houxing Ren and Ke Wang and Aojun Zhou and Changyao Tian and Xinyu Fu and Yuxuan Hu and Zimu Lu and Linjiang Huang and Si Liu and Rui Liu and Hongsheng Li},
year={2025},
eprint={2510.14958},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2510.14958},
}
@inproceedings{
wang2025mathcodervl,
title={MathCoder-{VL}: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning},
author={Ke Wang and Junting Pan and Linda Wei and Aojun Zhou and Weikang Shi and Zimu Lu and Han Xiao and Yunqiao Yang and Houxing Ren and Mingjie Zhan and Hongsheng Li},
booktitle={The 63rd Annual Meeting of the Association for Computational Linguistics},
year={2025},
url={https://openreview.net/forum?id=nuvtX1imAb}
}