AMSbench: A Comprehensive Benchmark for Evaluating MLLM Capabilities in AMS Circuits

AMSbench is a comprehensive benchmark suite designed to systematically evaluate the capabilities of Multi-modal Large Language Models (MLLMs) across critical challenges in Analog/Mixed-Signal (AMS) circuit perception, analysis, and design.

📄 Paper | 🤗 Dataset | 💻 Code | 🌐 Project Page

📖 Abstract

Analog/Mixed-Signal (AMS) circuits play a critical role in the integrated circuit (IC) industry. However, automating AMS circuit design has remained a longstanding challenge due to its difficulty and complexity. Recent advances in Multi-modal Large Language Models (MLLMs) offer promising potential for supporting AMS circuit analysis and design. However, current research typically evaluates MLLMs on isolated tasks within the domain, lacking a comprehensive benchmark that systematically assesses model capabilities across diverse AMS-related challenges.

To address this gap, we introduce AMSbench. Our benchmark comprises approximately 8,000 test questions spanning multiple difficulty levels and assesses eight prominent models, including both open-source and proprietary solutions like GPT-4o and Gemini-2.5 Pro. Our evaluation highlights significant limitations in current MLLMs, particularly in complex multi-modal reasoning and sophisticated circuit design tasks. These results underscore the necessity of advancing MLLMs’ understanding and effective application of circuit-specific knowledge, thereby narrowing the existing performance gap relative to human expertise and moving toward fully automated AMS circuit design workflows.

💡 Why AMSbench? The Challenge of AMS Circuits

Analog/Mixed-Signal (AMS) circuits are the bedrock of modern electronic systems, but their design process is highly dependent on the experience and intuition of engineers. Unlike natural images or plain text, circuit diagrams present unique challenges for MLLMs:

Structured Knowledge: Circuit diagrams are highly structured, containing components, connection relationships (topology), and hierarchical designs.
Symbolic Language: They use a standardized symbolic language that requires precise identification and understanding.
Implicit Functionality: The function of a circuit depends not only on its individual components but on the complex interactions between them.

While current MLLMs excel in general domains, they struggle to comprehend this specialized graphical language without targeted training. Therefore, a standard benchmark is crucial to measure their capabilities in this specialized field and to guide future research.

🛠️ Benchmark Construction Process

To ensure the benchmark's quality and comprehensiveness, we adopted a rigorous data construction pipeline:

Data Collection: We extensively gathered various circuit diagrams and related information from authoritative academic textbooks, cutting-edge research papers, and industrial product datasheets.
Data Processing: We utilized advanced tools like MinerU and AMSnet to parse PDF documents, extract circuit diagrams, and generate initial netlists.
Data Annotation and Generation:
- Expert Annotation: Domain experts were invited to perform detailed manual annotations of the circuit diagrams.
- MLLM-Assisted Generation: We leveraged MLLMs (like GPT-4V) to generate preliminary "circuit-caption" data pairs.
- Quality Control: All MLLM-generated content was rigorously reviewed and corrected by our expert team to ensure data accuracy and professional quality.
Tiered Question Design: We stratified questions into three difficulty levels—Easy, Medium, and Hard—to comprehensively evaluate model capabilities, from basic cognition to deep reasoning.

🎯 Core Capabilities & Tasks

AMSbench is designed around three core capability dimensions, encompassing approximately 8,000 test questions.

1. AMS-Perception

Objective: To evaluate the fundamental visual understanding of circuit diagrams by MLLMs.
Task Examples:
- Identifying and locating specific components (e.g., MOSFETs, resistors, capacitors).
- Extracting the complete circuit connectivity (netlist extraction).
- Recognizing basic circuit topologies.
Question Count: ~6,000

2. AMS-Analysis

Objective: To evaluate the deep understanding and reasoning capabilities of MLLMs regarding circuit functionality and performance.
Task Examples:
- Analyzing the core function of a circuit (e.g., is this an amplifier or a comparator?).
- Understanding how changes in component parameters affect circuit performance.
- Explaining the performance trade-offs.
Question Count: ~2,000

3. AMS-Design

Objective: To evaluate the potential of MLLMs in automated circuit design workflows.
Task Examples:
- Generating a circuit schematic based on given performance specifications (e.g., gain, bandwidth).
- Creating a valid simulation testbench for a given circuit.
- Fixing or optimizing design flaws in an existing circuit.
Question Count: 68

🔬 Model Evaluation & Key Findings

We conducted a comprehensive evaluation of 8 leading MLLMs. The main findings reveal the current state and future challenges for MLLMs in the AMS domain.

Widespread Limitations: All existing models exhibit certain limitations when handling complex AMS circuit tasks.
Adequate Perception Capability: Most models can perform basic component identification tasks reasonably well, but struggle with extracting complete and accurate netlists.
Deficiencies in Analysis and Design: Performance drops significantly in analysis and design tasks that require deep reasoning. Models find it difficult to fully comprehend performance trade-offs and are unable to generate valid testbenches.
A Clear Path Forward: The evaluation results clearly identify the shortcomings of current MLLMs, providing a distinct direction for future model optimization and algorithm research.

✨ Task Examples

Here are some examples from the benchmark, illustrating the diversity of tasks.

Perception Task	Partition Task

Includes total/type-wise counting, location description, connection judgment, and topology generation.	The model must identify all constituent structures in a complex circuit, such as a differential pair and Miller compensation.
Reasoning Task	Function Task

Based on the circuit diagram, the model must explain why it functions as an operational amplifier.	The model must determine the intended function of the given circuit diagram.

🚀 Getting Started

Download the Dataset: Access the full dataset from our Hugging Face Repository.
Explore the Data: The dataset is organized by task (Perception, Analysis, Design) and difficulty. Each entry contains a circuit image and associated questions/answers.
Evaluate Your Model: Use the provided data to test the performance of your own MLLMs. We encourage you to follow our evaluation setup for comparable results.
Share Your Findings: We welcome contributions and comparisons. If you have a new model, test it on AMSbench and share your results with the community!

✍️ Citation

If you use AMSbench in your research, please cite our paper:

@misc{shi2025amsbenchcomprehensivebenchmarkevaluating,
      title={AMSbench: A Comprehensive Benchmark for Evaluating MLLM Capabilities in AMS Circuits}, 
      author={Yichen Shi and Ze Zhang and Hongyang Wang and Zhuofu Tao and Zhongyi Li and Bingyu Chen and Yaxin Wang and Zhiping Yu and Ting-Jung Lin and Lei He},
      year={2025},
      eprint={2505.24138},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={[https://arxiv.org/abs/2505.24138](https://arxiv.org/abs/2505.24138)}, 
}

Name		Name	Last commit message	Last commit date
Latest commit History 23 Commits
CKT_design.py		CKT_design.py
README.md		README.md
environment.yml		environment.yml
prompt4Testbench		prompt4Testbench
prompt_template.md		prompt_template.md
prompt_template_complex.md		prompt_template_complex.md
tb_api_test.py		tb_api_test.py
test models.py		test models.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

AMSbench: A Comprehensive Benchmark for Evaluating MLLM Capabilities in AMS Circuits

📖 Abstract

💡 Why AMSbench? The Challenge of AMS Circuits

🛠️ Benchmark Construction Process