Math-VR & CodePlot-CoT

Mathematical Visual Reasoning by Thinking with Code-Driven Images

¹HKU   ²Meituan   ³CUHK
*Equal Contribution   Project Lead   ✉️Corresponding Author
[Figure: Teaser image for Math-VR]

We introduce Math-VR, the first large-scale bilingual dataset and benchmark for mathematical visual reasoning, and CodePlot-CoT, a novel code-driven Chain-of-Thought paradigm that enables models to "think with images" by generating executable plotting code.

Abstract

Vision Language Models (VLMs) have made significant progress in mathematical reasoning, yet they still struggle with problems that require visual assistance, such as drawing auxiliary lines or plotting functions. Most VLMs are restricted to text-only reasoning, while unified models that generate interleaved text and images often lack the precision that mathematical tasks demand. We present CodePlot-CoT, a code-driven Chain-of-Thought (CoT) paradigm that enables models to "think with images" in mathematics. A VLM generates both textual reasoning and executable plotting code. The code is rendered into an image that serves as a "visual thought" and is fed back into the model to aid problem solving. To support this, we introduce Math-VR, the first large-scale bilingual dataset and benchmark for mathematical problems that require visual reasoning, comprising 178K samples. We also develop MatplotCode, a specialized image-to-code converter, to generate high-quality training data. We benchmark state-of-the-art (SOTA) models on Math-VR. Our experiments show that CodePlot-CoT achieves up to a 21% performance increase over its base model, demonstrating the effectiveness of the code-driven reasoning paradigm.
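The reasoning loop described in the abstract (generate text and plotting code, render the code, feed the image back) can be sketched as a simple control loop. This is a hypothetical illustration, not the authors' implementation. `model` stands in for any VLM wrapper and `render` for any backend that executes plotting code and returns an image. The convention that plotting code arrives in a fenced block tagged `python` is also an assumption.

```python
import re
from typing import Callable, List

# Assumed delimiter convention: plotting code is wrapped in a fenced block
# tagged "python". Built as "`" * 3 so no literal fence appears in this file.
FENCE = "`" * 3
CODE_RE = re.compile(re.escape(FENCE) + r"python\n(.*?)" + re.escape(FENCE), re.DOTALL)


def extract_plot_code(text: str) -> List[str]:
    """Return the body of every fenced Python block in a model reply."""
    return CODE_RE.findall(text)


def codeplot_cot(
    question: str,
    model: Callable[[List[object]], str],
    render: Callable[[str], object],
    max_steps: int = 4,
) -> List[object]:
    """Alternate textual reasoning with rendered 'visual thoughts'.

    `model` (hypothetical) receives the running context of strings and
    images and returns its next reply. `render` (hypothetical) turns
    plotting code into an image object.
    """
    context: List[object] = [question]
    for _ in range(max_steps):
        reply = model(context)
        context.append(reply)
        blocks = extract_plot_code(reply)
        if not blocks:
            break  # no figure requested, so treat this reply as final
        for code in blocks:
            context.append(render(code))  # feed the rendered image back in
    return context


if __name__ == "__main__":
    def fake_model(ctx: List[object]) -> str:
        # First call asks for a plot; the second call answers.
        if len(ctx) == 1:
            return "Draw y = x.\n" + FENCE + "python\nplt.plot([0, 1], [0, 1])\n" + FENCE
        return "The answer is 1."

    for item in codeplot_cot("Q", fake_model, render=lambda c: ("IMAGE", c)):
        print(repr(item))
```

In a real system, `render` would execute the code with a plotting library such as Matplotlib in an isolated process and return the resulting image. `max_steps` bounds the number of code-render rounds.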

Contributions

  • Math-VR: The first large-scale bilingual (English and Chinese) dataset and benchmark (178K samples) for mathematical problems that require visual reasoning.
  • CodePlot-CoT: A novel and efficient paradigm that enables VLMs to engage in visual reasoning through code generation.
  • MatplotCode: A state-of-the-art image-to-code converter for mathematical figures, achieving 100% code execution success rate and high reconstruction fidelity.
  • Strong Empirical Results: CodePlot-CoT achieves up to a 21% performance increase over its base model on the Math-VR benchmark.
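The MatplotCode bullet reports a 100% code execution success rate. The page does not define this metric. The sketch below assumes it means the fraction of generated snippets that compile and run without raising an exception. It uses the standard library only. A real evaluation would also render each figure with Matplotlib and compare it to the source image, which is not shown here.

```python
from typing import Iterable


def runs_cleanly(code: str) -> bool:
    """True if the snippet compiles and executes without raising.

    Each snippet gets a fresh namespace. Note: exec() is NOT a sandbox;
    untrusted model output should run in an isolated process.
    """
    try:
        exec(compile(code, "<generated>", "exec"), {"__name__": "__generated__"})
    except Exception:  # includes SyntaxError raised by compile()
        return False
    return True


def execution_success_rate(snippets: Iterable[str]) -> float:
    """Fraction of snippets that execute without error (0.0 for no input)."""
    results = [runs_cleanly(s) for s in snippets]
    return sum(results) / len(results) if results else 0.0


if __name__ == "__main__":
    samples = ["x = 1 + 1", "y = undefined_name", "def f(:"]
    print(execution_success_rate(samples))  # one of three snippets runs
```

Under this definition, a 100% rate means every converted figure's code ran to completion. That is a necessary condition for reconstruction fidelity, but not a sufficient one.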

Math-VR Benchmark Results

Results on 2,500 English questions (1,000 Text + 1,500 Multimodal). Metrics: Process Score (PS) and Answer Correctness (AC). Type: VLM = vision-language model, UM = unified model, LLM = text-only language model.

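| Model | Version | #Params | Type | Thinking | Overall (AC) | Overall (PS) | Text (AC) | Text (PS) | Multimodal (AC) | Multimodal (PS) |
|---|---|---|---|---|---|---|---|---|---|---|
| Qwen3-VL-235B-A22B-Thinking | - | 235B | VLM | ✓ | 66.8 | 81.0 | 58.9 | 77.4 | 72.1 | 83.4 |
| Qwen3-VL-235B-A22B-Instruct | - | 235B | VLM | ✗ | 65.0 | 80.1 | 59.4 | 77.8 | 68.8 | 81.6 |
| Gemini-2.5-Pro | - | - | VLM | ✓ | 64.7 | 80.8 | 58.7 | 77.9 | 68.7 | 82.8 |
| Gemini-2.5-Flash | 2025-06-17 | - | VLM | ✓ | 60.5 | 78.4 | 57.0 | 77.5 | 62.9 | 79.0 |
| GPT-o3 | 2025-04-16 | - | VLM | ✓ | 59.3 | 76.4 | 52.9 | 72.9 | 63.7 | 78.6 |
| Seed-1.6-Thinking | 2025-06-15 | - | VLM | ✓ | 58.4 | 75.2 | 53.0 | 73.0 | 62.0 | 76.6 |
| Nano Banana | 2025-08-26 | - | UM | ✗ | 53.4 | 73.8 | 49.1 | 72.3 | 56.3 | 74.7 |
| Gemini-2.5-Flash-No-Thinking | 2025-06-17 | - | VLM | ✗ | 52.3 | 73.7 | 44.6 | 70.9 | 57.5 | 75.5 |
| GLM-4.5V | - | 108B | VLM | ✓ | 49.6 | 69.7 | 48.0 | 70.5 | 50.6 | 69.1 |
| Mimo-VL-7B-RL | 2508 | 7B | VLM | ✓ | 48.3 | 68.8 | 43.5 | 68.4 | 51.3 | 69.0 |
| InternVL-3.5-8B | - | 8B | VLM | ✓ | 40.8 | 62.8 | 38.5 | 64.0 | 42.2 | 62.0 |
| GPT-4.1-mini | - | - | VLM | ✗ | 33.3 | 60.0 | 33.3 | 62.0 | 33.3 | 58.6 |
| GLM-4.1V-9B | - | 9B | VLM | ✓ | 29.0 | 53.4 | 27.8 | 54.4 | 29.9 | 52.7 |
| Claude-Sonnet-4 | 2025-05-23 | - | VLM | ✗ | 28.1 | 56.4 | 31.5 | 60.9 | 25.8 | 53.4 |
| GPT-4.1 | - | - | VLM | ✗ | 26.0 | 53.9 | 26.6 | 56.5 | 25.6 | 52.2 |
| CodePlot-CoT | - | 32B | VLM | ✗ | 22.1 | 47.0 | 31.6 | 53.8 | 15.8 | 42.4 |
| Gemini-2.0-Flash | - | - | VLM | ✗ | 20.6 | 50.7 | 24.1 | 56.1 | 18.3 | 47.0 |
| Keye-VL-1.5 | - | 8B | VLM | ✗ | 17.3 | 38.2 | 20.2 | 44.4 | 15.4 | 34.0 |
| Gemma3 | - | 27B | VLM | ✗ | 16.1 | 44.8 | 19.2 | 50.8 | 14.1 | 40.8 |
| Qwen-2.5-VL-72B | - | 72B | VLM | ✗ | 13.7 | 40.8 | 15.3 | 44.6 | 12.7 | 38.2 |
| Bagel-Zebra-CoT | - | 7B | UM | ✗ | 10.1 | 34.1 | 13.9 | 41.5 | 7.6 | 29.1 |
| Qwen-2.5-VL-32B | - | 32B | VLM | ✗ | 10.0 | 33.7 | 10.6 | 36.9 | 9.6 | 31.5 |
| GPT-4.1-nano | - | - | VLM | ✗ | 9.1 | 38.5 | 13.1 | 45.9 | 6.4 | 33.6 |
| InternVL-3.5-8B-No-Thinking | - | 8B | VLM | ✗ | 7.9 | 31.4 | 9.2 | 35.6 | 7.0 | 28.6 |
| Bagel | - | 7B | UM | ✗ | 7.6 | 27.6 | 8.5 | 32.9 | 7.0 | 24.0 |
| Qwen-2.5-VL-3B | - | 3B | VLM | ✗ | 5.3 | 27.5 | 7.9 | 33.4 | 3.6 | 23.6 |
| GPT-4o | 2024-11-20 | - | VLM | ✗ | 4.3 | 30.4 | 5.7 | 34.6 | 3.4 | 27.6 |
| Qwen-2.5-VL-7B | - | 7B | VLM | ✗ | 3.0 | 13.8 | 4.5 | 18.0 | 2.0 | 11.0 |

Text-only LLM (Text subset only):

| Model | Version | #Params | Type | Thinking | Text (AC) | Text (PS) |
|---|---|---|---|---|---|---|
| Deepseek-R1 | - | 671B | LLM | ✓ | 49.5 | 69.9 |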
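If the Overall columns are per-question averages over both subsets, each Overall score should equal the question-weighted mean of its Text and Multimodal scores: (1,000 × Text + 1,500 × Multimodal) / 2,500. The page does not state this definition. The check below treats it as an assumption and recomputes a few rows, with values copied from the table. Agreement within rounding (about ±0.05) is consistent with that reading.

```python
# Subset sizes stated for the English benchmark split.
N_TEXT, N_MULTIMODAL = 1_000, 1_500


def overall_score(text: float, multimodal: float) -> float:
    """Question-weighted mean of the two subset scores (assumed definition)."""
    return (N_TEXT * text + N_MULTIMODAL * multimodal) / (N_TEXT + N_MULTIMODAL)


# (Text, Multimodal, reported Overall), copied from the AC and PS columns.
ROWS = {
    "Qwen3-VL-235B-A22B-Thinking (AC)": (58.9, 72.1, 66.8),
    "Qwen3-VL-235B-A22B-Thinking (PS)": (77.4, 83.4, 81.0),
    "CodePlot-CoT (AC)": (31.6, 15.8, 22.1),
    "CodePlot-CoT (PS)": (53.8, 42.4, 47.0),
}

for name, (text, mm, reported) in ROWS.items():
    computed = overall_score(text, mm)
    print(f"{name}: computed {computed:.2f}, reported {reported}")
```

The weights 0.4 and 0.6 mean the Multimodal subset dominates the Overall score. This is why a model can rank higher on Text than on Overall.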

Visualization

🚨🚨🚨 Note! The data here is heavily compressed for easier visualization.


BibTeX


@article{duan2025code,
  title={CodePlot-CoT: Mathematical Visual Reasoning by Thinking with Code-Driven Images},
  author={Duan, Chengqi and Fang, Rongyao and Wang, Yuqing and Wang, Kun and Huang, Linjiang and Zeng, Xingyu and Li, Hongsheng and Liu, Xihui},
  journal={arXiv preprint arXiv:2510.11718},
  year={2025}
}