LLM-aided design
LLM-aided design refers to the use of large language models (LLMs) as smart agents throughout the end-to-end process of system design, including conceptualization, prototyping, verification, and optimization. This evolving interdisciplinary model integrates advances in natural language processing (NLP), program synthesis, and automated reasoning to support tasks in domains such as electronic design automation (EDA), software engineering, hardware design, and cyber-physical systems.
Unlike traditional automation tools, LLMs (especially transformer-based models such as GPT-4, Claude[1], LLaMA, and domain-specialized variants such as CodeLlama) can interpret, generate, and refine structured and unstructured data, including natural language specifications, hardware description language (HDL) or HDL-like code, constraint definitions, tool scripts, and design documentation. LLM-aided design thus represents a shift from tool-assisted engineering to a form of co-design in which machine intelligence participates actively in architectural exploration, logic synthesis, formal verification, and post-silicon validation. It is situated at the intersection of artificial intelligence, computer-aided design (CAD), and systems engineering.
Introduction
Engineering workflows in hardware and software development have traditionally relied on manual translation of high-level design intent into machine-readable specifications. These processes, though robust, are time-consuming and often require significant domain expertise. The introduction of large language models into design workflows aims to streamline this process by enabling natural language interaction, synthesis of domain-specific artifacts, and integration with design toolchains.
In recent years, engineering design has seen a rapid convergence of artificial intelligence (AI) and domain-specific modeling. LLMs such as GPT-4, Claude[1], and LLaMA can understand and generate code, documents, and designs from natural language descriptions. This capacity lets human designers work together with AI systems to check design correctness and reduce time-to-market. The aim is to allow designers to express intent in natural language and rely on the model to output Verilog, VHDL, HLS C, or firmware code.
LLM-aided design differs from earlier forms of automated design through its ability to generalize across tasks and contexts. Unlike rule-based or template-driven systems, large language models can encode domain-specific heuristics and adapt to various inputs—including design specifications, codebases, formal properties, and documentation—without requiring extensive retraining. This flexibility supports their use in diverse design settings such as system-on-chip development, embedded systems, robotic control, and cyber-physical system modeling.
LLM-aided design adds a new epistemic layer to the engineering process, in which models contribute to design reasoning rather than only carrying out commands. This enables uses such as flow control automation, formal assertion generation, and template retrieval for HLS code repair. It has also given rise to domain-adapted LLMs known as circuit foundation models (CFMs), which are capable of reasoning and generating across the whole RTL-to-GDSII pipeline.
Background and Foundations of LLM-Aided Design
The integration of large language models (LLMs) into electronic design automation (EDA) represents a shift in how hardware systems are specified, verified, and developed. While EDA has conventionally been defined by predefined workflows, rule-based synthesis tools, and extensive manual intervention, the growth of LLMs has introduced a new design angle driven by reasoning, abstraction, and human-language interaction. This shift aligns with the broader trajectory of artificial intelligence, where general-purpose models have increasingly been specialized for domain-specific tasks, including those that traditionally needed expert engineers.
From Transformers to Circuit Reasoning
The transformer architecture introduced by Vaswani et al. (2017)[2] serves as the foundation of LLM-aided design. This architecture replaced RNNs and LSTMs[3] in natural language processing due to its ability to model long-range dependencies with self-attention mechanisms. It serves as the basis for the GPT series, from GPT-2 through GPT-4o and later models, with each iteration showing significantly better capabilities in zero-shot reasoning, code generation, and language understanding.
By 2020, GPT-3's ability to produce functional code (including basic HTML, Python, and even Verilog) had drawn the interest of the AI community. This inspired hardware design researchers to speculate that LLMs could be used for logic design and verification activities by taking advantage of the structural similarities between programming languages and hardware description languages (HDLs). Early experiments using GPT-3 to write Verilog or assist in debugging demonstrated potential but also exposed critical limitations such as poor syntax, hallucinations, and incompatibility with synthesis tools.
The attempt to address these limitations led to a new direction: the creation of domain-specific foundation models tailored to EDA. These models, referred to as circuit foundation models, are trained or fine-tuned on HDL code, simulation traces, synthesis logs, and constraint files. By 2023, tools like RTLLM[4] had begun to realize the vision of LLM-aided design through carefully engineered prompts, feedback loops, and domain-aligned datasets.
Year | Milestone |
---|---|
2017 | Transformer introduced by Vaswani et al.[2] |
2020 | GPT-3 exhibits rudimentary HDL generation capability. |
2021 | Prompt-based Verilog code generation appears in exploratory tools. |
2022 | RTLLM[4] pioneers structured, feedback-driven generation pipelines. |
2023 | Domain-specific fine-tuning (VeriGen[5], RTLCoder[6]) and agent frameworks (RTLFixer[7], MEIC[8]) become practical. |
2024 | Vision-language fusion (LayoutCopilot[9]) and analog LLMs (LaMAGIC[10], AnalogCoder[11]) expand to new design domains. |
2025 | Multi-agent architectures and graph-text fusion (DRC-Coder[12]) reshape design verification. |
Decoder vs. Encoder Models in Co-Design
1. Decoder-Based Autoregressive Models: Based on architectures like GPT and CodeLlama, these models are used for generation tasks. They can translate natural language specifications into HDL, generate testbenches, and repair buggy RTL. Prompt chaining and few-shot learning are among the techniques used to make these models effective in synthesis-aligned code generation.
2. Encoder-Based Graph Reasoning Models: Inspired by models such as BERT and adapted into graph neural networks (e.g., ChiPFormer[13]), these models are optimized for inference tasks over structural representations like netlists or IRs. They can estimate timing, identify bottlenecks, and perform logic equivalence checks.
The design ecosystem is increasingly adopting hybrid strategies, where decoder models generate artifacts and encoder models verify or optimize them, forming a closed co-design loop. This dual architecture is similar to human design workflows, where generation and validation are heavily co-dependent.
Methodological Landscape of LLM-Aided Design
LLM-aided design covers multiple stages of the hardware-software co-design pipeline, including natural language specification, HDL synthesis, analog circuit design, formal verification, and layout generation. While foundational techniques such as prompting, supervised fine-tuning (SFT), and retrieval-augmented generation (RAG) cover much of the field, their practical application varies widely with the nature of the task. The following summary table classifies typical LLM methodologies by EDA task domain for a few recently published, domain-specific representative LLMs and tools:
Representative LLMs/Tools | LLM Methodology Used | Task Domain |
---|---|---|
RTLLM[4], VeriGen[5], RTLFixer[7] | Prompt engineering, self-refinement, score-based SFT | Specification to HDL |
ChatEDA[14] | Instruction tuning, retrieval-augmented generation | Constraint Generation |
AutoSVA[15], LLM4DV[16] | Coverage-driven generation | Testbench & Assertions |
LayoutCopilot[9], ChatEDA[14] | Vision-Language models, TCL script generation | Floorplan/Layout Synthesis |
AnalogCoder[11], LaMAGIC[10] | Topology suggestion, layout constraints, Bayesian tuning | Analog Circuit Synthesis |
Core Methodologies
Below are a few core methodologies, with insights from recent tools and frameworks:
Specification to HDL Translation
LLMs can generate synthesizable RTL (Verilog, VHDL) directly from natural language specifications. This process can be enhanced using:
- Prompt engineering and hierarchical prompting, for structured code generation,
- Context window expansion, to provide multi-level module and signal context,
- Self-refinement and feedback from compiler logs, allowing the LLM to repair and converge to synthesizable HDL,
- Score-based supervised fine-tuning (SFT), as seen in tools like RTLLM[4], VeriGen[5], and RTLFixer[7], to improve alignment with design and functional correctness.
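The self-refinement step listed above can be sketched as a loop that sends tool errors back to the model until none remain. This is a minimal illustration, not the implementation of any cited tool: `refine_until_clean`, `toy_generate`, and `toy_check` are hypothetical names, and the two toy functions stand in for a real LLM call and a real compiler or linter so that the example runs on its own.

```python
from typing import Callable, List, Tuple


def refine_until_clean(
    spec: str,
    generate: Callable[[str], str],
    check: Callable[[str], List[str]],
    max_rounds: int = 3,
) -> Tuple[str, List[str]]:
    """Request HDL for a spec, then feed tool errors back until none remain."""
    prompt = spec
    code = ""
    errors: List[str] = []
    for _ in range(max_rounds):
        code = generate(prompt)
        errors = check(code)
        if not errors:
            break
        # Append the previous attempt and the tool log to the next prompt.
        prompt = (
            spec
            + "\n\nPrevious attempt:\n" + code
            + "\n\nTool errors:\n" + "\n".join(errors)
        )
    return code, errors


# Hypothetical stand-ins: no model or HDL toolchain is invoked here.
def toy_generate(prompt: str) -> str:
    body = "module and2(input a, input b, output y);\n  assign y = a & b;\n"
    # The toy "model" only closes the module once it has seen an error log.
    return body + ("endmodule\n" if "Tool errors" in prompt else "")


def toy_check(code: str) -> List[str]:
    return [] if "endmodule" in code else ["syntax error: missing 'endmodule'"]


final_code, remaining = refine_until_clean(
    "2-input AND gate named and2", toy_generate, toy_check
)
print(final_code)
print("remaining errors:", remaining)
```

In practice `check` would wrap a lint, simulation, or synthesis run and return its diagnostics, and `max_rounds` bounds the cost when the model fails to converge.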
Testbench and Assertion Generation
LLMs can synthesize verification environments, SystemVerilog assertions (SVA), property checks, and test stimuli from examples and coverage goals. Approaches include:
- Coverage-driven generation, where LLMs aim to satisfy specific coverage goals and random seed diversity,
- Tools such as AutoSVA[15] and LLM4DV[16], which have shown higher assertion coverage and better bug exposure than traditional constrained-random verification methods.
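Coverage-driven generation can be pictured as a loop in which the generator is told which coverage bins are still empty and proposes stimuli aimed at them. The sketch below assumes a toy 8-bit operand with four hypothetical bins (`RANGES`); `toy_propose` stands in for an LLM that reads the list of uncovered bins, and `toy_classify` stands in for a simulator's coverage report. None of these names come from the cited tools.

```python
import random
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical coverage bins for an 8-bit operand (inclusive ranges).
RANGES: Dict[str, Tuple[int, int]] = {
    "zero": (0, 0),
    "low": (1, 127),
    "high": (128, 254),
    "max": (255, 255),
}


def toy_classify(value: int) -> str:
    """Stand-in for a coverage report: name the bin a stimulus lands in."""
    for name, (lo, hi) in RANGES.items():
        if lo <= value <= hi:
            return name
    raise ValueError(f"value out of range: {value}")


def toy_propose(uncovered: Set[str], rng: random.Random) -> int:
    """Stand-in for an LLM: aim at the first still-uncovered bin."""
    target = sorted(uncovered)[0]
    lo, hi = RANGES[target]
    return rng.randint(lo, hi)


def coverage_driven_stimuli(
    bins: Set[str],
    propose: Callable[[Set[str], random.Random], int],
    classify: Callable[[int], str],
    seed: int = 0,
    budget: int = 100,
) -> Tuple[List[int], Set[str]]:
    """Keep only stimuli that hit a new bin; stop when all bins are covered."""
    rng = random.Random(seed)
    uncovered = set(bins)
    kept: List[int] = []
    for _ in range(budget):
        if not uncovered:
            break
        stimulus = propose(uncovered, rng)
        hit = classify(stimulus)
        if hit in uncovered:
            uncovered.discard(hit)
            kept.append(stimulus)
    return kept, uncovered


stimuli, missing = coverage_driven_stimuli(set(RANGES), toy_propose, toy_classify)
print("stimuli:", stimuli)
print("uncovered bins:", sorted(missing))
```

The `seed` parameter makes a run reproducible; varying it is one simple way to obtain the seed diversity mentioned above.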
HDL Debugging and Repair
LLMs assist in both syntactic repair (fixing compilation errors) and semantic repair (correcting logical or functional behavior), drawing on:
- Template libraries and error log parsing,
- Similarity search from past fixes,
- Retrieval-Augmented Generation (RAG) pipelines such as RTLFixer[7] and MEIC[8], which iteratively improve code until it passes lint, synthesis, or formal checks.
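The retrieval step in such a pipeline can be illustrated with a small similarity search over past error messages. The following sketch uses Python's standard `difflib.SequenceMatcher` as the similarity measure, which is an assumption made for brevity (real systems may use embeddings or curated error categories). The entries in `PAST_FIXES` are invented examples, and `retrieve_fixes` and `build_repair_prompt` are hypothetical helper names.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

# Hypothetical (error message, repair hint) pairs, not taken from any real tool.
PAST_FIXES: List[Tuple[str, str]] = [
    ("syntax error near 'endmodule'",
     "Check for a missing semicolon or 'end' before 'endmodule'."),
    ("signal 'q' is not declared",
     "Declare the signal before its first use."),
    ("procedural assignment to a non-register 'y'",
     "Declare 'y' as reg or use a continuous assign."),
]


def retrieve_fixes(error: str, k: int = 1) -> List[Tuple[float, str]]:
    """Return the k hints whose stored error text is most similar to `error`."""
    scored = [
        (SequenceMatcher(None, error.lower(), known.lower()).ratio(), hint)
        for known, hint in PAST_FIXES
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_repair_prompt(code: str, error: str, k: int = 2) -> str:
    """Assemble a repair prompt that includes the retrieved hints."""
    hints = "\n".join(f"- {hint}" for _, hint in retrieve_fixes(error, k))
    return (
        "Fix this Verilog.\n\n"
        f"Error: {error}\n\n"
        f"Hints from similar past fixes:\n{hints}\n\n"
        f"Code:\n{code}"
    )


print(build_repair_prompt("assign q = d;", "Signal 'q' is not declared"))
```

The prompt produced here would then be sent to the model, and the repaired code re-checked with the same tools that reported the error.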
HLS Code Refinement
Standard C/C++ often uses constructs that HLS tools restrict or do not support (e.g., recursion, pointers). LLMs identify and rewrite such constructs by:
- Detecting and rewriting non-HLS-friendly patterns using prompt-repair pipelines,
- Generating test harnesses and compiler hints (e.g., `#pragma HLS unroll`),
- Tools like GPT4AIGChip[17] convert ML kernels into synthesizable HLS by combining structural abstraction and loop pattern rewrites.
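Before any rewriting, a pipeline of this kind has to locate the offending constructs. The sketch below is a deliberately simple, regex-based heuristic for two patterns: direct recursion and calls to C dynamic-allocation functions (used here as a stand-in for pointer-related restrictions). It is not a C parser and not the detection method of any cited tool; `find_hls_issues` and the sample source are hypothetical.

```python
import re
from typing import List

ALLOC_CALL = re.compile(r"\b(malloc|calloc|realloc|free)\s*\(")
# Matches "name(args) {" where args contain no nested parentheses.
FUNC_HEADER = re.compile(r"\b([A-Za-z_]\w*)\s*\([^;{}()]*\)\s*\{")
CONTROL_KEYWORDS = {"if", "for", "while", "switch"}


def _body_after(src: str, open_brace: int) -> str:
    """Return the text between the brace at `open_brace` and its match."""
    depth = 0
    for i in range(open_brace, len(src)):
        if src[i] == "{":
            depth += 1
        elif src[i] == "}":
            depth -= 1
            if depth == 0:
                return src[open_brace + 1:i]
    return src[open_brace + 1:]


def find_hls_issues(src: str) -> List[str]:
    """List dynamic-allocation calls and directly recursive functions."""
    issues = [
        f"dynamic memory: call to '{m.group(1)}'"
        for m in ALLOC_CALL.finditer(src)
    ]
    for m in FUNC_HEADER.finditer(src):
        name = m.group(1)
        if name in CONTROL_KEYWORDS:
            continue
        body = _body_after(src, m.end() - 1)
        if re.search(rf"\b{re.escape(name)}\s*\(", body):
            issues.append(f"recursion: '{name}' calls itself")
    return issues


SAMPLE_C = """
int fact(int n) {
    if (n <= 1) { return 1; }
    return n * fact(n - 1);
}
void fill(int n) {
    int *buf = (int *)malloc(n * sizeof(int));
    free(buf);
}
"""

for issue in find_hls_issues(SAMPLE_C):
    print(issue)
```

The reported issues could be placed in a repair prompt so that the model rewrites only the flagged functions, for example by replacing recursion with a bounded loop.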
Constraint Generation
Constraint files are essential for synthesis, placement, and timing correctness. LLM-based tools such as ChatEDA[14] support this through:
- Instruction tuning, enabling fine-grained command generation (e.g., for SDC, XDC formats),
- Retrieval-Augmented Generation (RAG), which pulls prior constraints from similar designs or databases to ensure domain-consistent generation,
- Generating multi-domain timing, placement, and IO constraints with contextual accuracy.
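One way to keep generated constraints well-formed is to have the model emit a small structured description of the timing intent and render the SDC text deterministically. The sketch below assumes that division of labor, which is an illustrative design choice rather than the documented behavior of ChatEDA. `TimingIntent` and `to_sdc` are hypothetical names; the emitted commands (`create_clock`, `set_input_delay`, `set_output_delay`) are standard SDC.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TimingIntent:
    """Structured intent that a model might extract from a request such as
    'run clk at 100 MHz with 2 ns of input delay on din'."""
    clock_port: str
    freq_mhz: float
    input_delays_ns: Dict[str, float] = field(default_factory=dict)
    output_delays_ns: Dict[str, float] = field(default_factory=dict)


def to_sdc(intent: TimingIntent) -> List[str]:
    """Render the intent as SDC commands, one per line."""
    if intent.freq_mhz <= 0:
        raise ValueError("clock frequency must be positive")
    period_ns = 1000.0 / intent.freq_mhz  # 1 / MHz expressed in nanoseconds
    clk = intent.clock_port
    lines = [
        f"create_clock -name {clk} -period {period_ns:.3f} [get_ports {clk}]"
    ]
    for port, delay in intent.input_delays_ns.items():
        lines.append(
            f"set_input_delay -clock {clk} {delay:.3f} [get_ports {port}]"
        )
    for port, delay in intent.output_delays_ns.items():
        lines.append(
            f"set_output_delay -clock {clk} {delay:.3f} [get_ports {port}]"
        )
    return lines


intent = TimingIntent(
    clock_port="clk",
    freq_mhz=100.0,
    input_delays_ns={"din": 2.0},
    output_delays_ns={"dout": 1.5},
)
print("\n".join(to_sdc(intent)))
```

Rendering through a template means a malformed model response fails at the structured-intent stage, before any constraint file reaches the synthesis tool.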
Floorplan and Layout Synthesis
Physical design requires careful placement and routing. LLM-vision hybrid models such as LayoutCopilot[9] and ChatEDA[14] employ:
- Vision-language modeling to interpret and manipulate layout imagery (DEF/GDSII),
- TCL script generation, customized for tools like Innovus and ICC2,
- Automatic power grid and macro placement proposals, based on learned design intents.
Analog Circuit Synthesis
Analog design poses unique challenges due to its sensitivity and lack of digital abstraction. Tools like AnalogCoder[11] and LaMAGIC[10] use:
- Topology suggestion via LLMs, based on specification matching (gain, slew, bandwidth),
- Layout constraint prediction, such as symmetry, matching, and parasitic awareness,
- Bayesian optimization and tuning, informed by LLM predictions for transistor sizing and performance trade-offs.
These methodologies collectively depict LLMs as design agents capable of integrating with CAD flows, reasoning over heterogeneous inputs (text, code, specs, layout), and adapting to domain-specific constraints. As tools mature, the distinction between synthesis, verification, and optimization continues to blur, paving the way for closed-loop, autonomous hardware design.
Among these, HDL generation has emerged as one of the most deeply investigated tasks in LLM-aided EDA research, serving as a methodological testbed for broader design automation challenges. It captures the full interplay between natural language, symbolic code, feedback refinement, and tool integration. The following case study synthesizes key techniques employed in HDL generation workflows.
Methodological Classification of HDL Generation: A Case Study
The following table, constructed using detailed insights from recent papers, including the 2025 survey by Pan et al., highlights the methodologies underlying LLM-aided HDL generation:
Project Name | Model Used | Approach Type | Summary |
---|---|---|---|
RTLLM[4] | GPT-3.5 | Prompt Engineering | Multi-step planning-based prompt design with syntax and functional log feedback. |
Chip-Chat[18] | ChatGPT-4 | Conversational Co-design | Full pipeline HDL synthesis guided via interactive dialogue with GPT-4. |
VeriGen[5] | CodeGen-16B | Fine-tuning | Trained on textbook + GitHub Verilog, improved synthesis-valid output, syntax robustness. |
ChatEDA[14] | LLaMA-20B | QLoRA + Instruction Tuning | Trained on GPT-4-generated EDA instructions; interprets and executes user commands. |
RTLCoder[6] | Mistral-7B | Scored SFT | Uses synthesis scores to steer SFT toward functionally valid and resource-efficient HDL. |
BetterV[19] | CodeLlama + TinyLlama | Controlled Gen + SFT | Bayesian discriminator modifies token probability for valid HDL output. |
RTLFixer[7] | GPT-4 | RAG + Agent Framework | Uses ReAct prompting and error categorization DBs for debug-oriented HDL refinement. |
These methods highlight key trends and research frontiers:
- Prompting + Logs: RTLLM[4] shows that prompting alone, when combined with feedback from toolchains, can be sufficient for competitive HDL generation without model retraining.
- Fine-tuning on RTL: VeriGen[5] and RTLCoder[6] show that focused fine-tuning, especially with quality metrics (e.g., synthesis logs, functional correctness), significantly improves output robustness.
- Controlled Generation: BetterV[19] uses probabilistic controls in token sampling, pushing Verilog generation beyond maximum-likelihood decoding.
- Agent Architectures: RTLFixer[7] embodies an emerging paradigm where LLMs serve not just as code generators but as self-refining agents that read logs, trace waveforms, and perform symbolic analysis.
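The idea behind controlled generation can be shown with a toy reweighting step: each candidate token's language-model probability is multiplied by a discriminator's estimate that the output stays valid, and the result is renormalized, in the spirit of p(token | context, valid) ∝ p(token | context) · p(valid | context, token). This is a generic sketch of discriminator-guided decoding under that assumption, not BetterV's implementation; the function name and all probabilities below are invented for illustration.

```python
from typing import Dict


def reweight(
    lm_probs: Dict[str, float], disc_scores: Dict[str, float]
) -> Dict[str, float]:
    """Scale LM token probabilities by discriminator scores and renormalize."""
    # Tokens without a discriminator score are left unscaled (factor 1.0).
    weighted = {
        tok: p * disc_scores.get(tok, 1.0) for tok, p in lm_probs.items()
    }
    total = sum(weighted.values())
    if total == 0:
        # Discriminator rejected everything: fall back to the LM distribution.
        return dict(lm_probs)
    return {tok: w / total for tok, w in weighted.items()}


# Invented numbers: the LM slightly prefers "end", but the discriminator
# judges "endmodule" far more likely to yield valid Verilog at this point.
lm_probs = {"endmodule": 0.3, "end": 0.5, "begin": 0.2}
disc_scores = {"endmodule": 0.9, "end": 0.2, "begin": 0.1}

guided = reweight(lm_probs, disc_scores)
for token, prob in sorted(guided.items(), key=lambda kv: -kv[1]):
    print(f"{token}: {prob:.3f}")
```

With these numbers the most likely token changes from "end" under plain maximum-likelihood decoding to "endmodule" after guidance.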
The table also highlights the significance of multi-agent collaboration, retrieval-augmented generation (RAG), and tool-in-the-loop frameworks, which move beyond simple completion tasks into autonomous reasoning and repair. The performance advantages of fine-tuned and multi-modal frameworks over traditional prompting, as shown in benchmarks like VerilogEval[20] and PyHDL-Eval[21], confirm that tightly integrated model-tool co-evolution is needed for true engineering-grade HDL generation.
Datasets and Evaluation Infrastructure
Large language models in EDA are developed, tuned, and evaluated using robust datasets. These datasets come in a range of formats, from performance metrics and natural language requirements to tokenized Verilog corpora and annotated tool logs. They enable supervised fine-tuning, domain adaptation, and benchmarking for synthesis validity and generation quality.
In addition to increasing dataset volume, recent initiatives have improved granularity and diversity. Instruction-tuned datasets like ChatEDA[14] teach LLMs how to interact with toolchains; benchmark sets such as VerilogEval[20] assess model output quality; and design-level corpora like RTLCoder[6] and MG-Verilog offer structural annotations and synthesis metadata. MG-Verilog provides human-annotated multilingual Verilog pairs that facilitate abstraction and cross-language translation. The VeriGen[5] dataset uses textbook-derived Verilog tasks to facilitate fundamental pedagogical fine-tuning.
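Code-generation benchmarks of this kind commonly report pass@k: the probability that at least one of k sampled completions passes the functional tests. The sketch below implements the widely used unbiased estimator, 1 - C(n-c, k) / C(n, k), where n completions are sampled per problem and c of them pass. The function name and the sample counts are illustrative and do not reproduce any benchmark's published results.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: completions sampled, c: completions that passed, k: attempt budget.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failures exist, so any k samples include a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Illustrative counts: 10 samples for one problem, 3 of which pass.
print(f"pass@1 = {pass_at_k(10, 3, 1):.3f}")
print(f"pass@5 = {pass_at_k(10, 3, 5):.3f}")
```

A benchmark score is then the mean of this estimate over all problems in the set.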
Tooling and Infrastructure: Practical Deployments
Several practical tools now demonstrate that LLM-aided design is no longer theoretical:
- ChatEDA[14]: Serves as a natural language interface for controlling Vivado, Quartus, or Innovus workflows. It interprets user intent and translates it into tool-specific commands.
- RTLLM[4]-Editor: An IDE that integrates real-time HDL generation, compilation feedback, and syntax repair.
- LLM4DV[16] and AutoSVA[15]: Specialized for formal verification, these tools generate SystemVerilog assertions and support coverage-driven testbench synthesis.
These tools reflect an operational maturity and are being integrated into prototyping, verification, closure, and constraint generation workflows.
See Also
- Large language model
- Natural language processing
- Electronic system-level design and verification
- Systems design
- Hardware design
- Embedded system
- Design space exploration
- GPT-4
- Llama (language model)
- Hardware description language
- Formal verification
- Artificial intelligence
- Computer-aided design
- Verilog
- VHDL
- System on a chip
- Retrieval-augmented generation
- SystemVerilog
- Transformer (deep learning architecture)
- Fine-tuning (deep learning)
- Register-transfer level
References
1. Anthropic et al. The Claude 3 Model Family: Opus, Sonnet, Haiku. Anthropic Model Card (PDF), 2024.
2. Vaswani, Ashish; Shazeer, Noam; Parmar, Niki; Uszkoreit, Jakob; Jones, Llion; Gomez, Aidan N.; Kaiser, Lukasz; and Polosukhin, Illia. Attention Is All You Need. *Proceedings of the 31st International Conference on Neural Information Processing Systems* (NIPS '17), pp. 6000–6010. Curran Associates Inc. ISBN 9781510860964.
3. Sherstinsky, Alex. Fundamentals of Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) network. *Physica D: Nonlinear Phenomena*, vol. 404, 2020, p. 132306.
4. Lu, Yao; Liu, Shang; Zhang, Qijun; and Xie, Zhiyao. RTLLM: An Open-Source Benchmark for Design RTL Generation with Large Language Model. *Proceedings of the 29th Asia and South Pacific Design Automation Conference* (ASPDAC '24), pp. 722–727. IEEE Press, 2024.
5. Thakur, Shailja; Ahmad, Baleegh; Pearce, Hammond; Tan, Benjamin; Dolan-Gavitt, Brendan; Karri, Ramesh; and Garg, Siddharth. VeriGen: A Large Language Model for Verilog Code Generation. *ACM Transactions on Design Automation of Electronic Systems*, vol. 29, no. 3, article 46, 2024, pp. 1–31.
6. Liu, Shang; et al. RTLCoder: Fully Open-Source and Efficient LLM-Assisted RTL Code Generation Technique. *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 44, no. 4, pp. 1448–1461, April 2025. IEEE.
7. Tsai, Yunda; Liu, Mingjie; and Ren, Haoxing. RTLFixer: Automatically Fixing RTL Syntax Errors with Large Language Model. *Proceedings of the 61st ACM/IEEE Design Automation Conference* (DAC '24), Article 53, 6 pages. Association for Computing Machinery, 2024.
8. Xu, Ke; Sun, Jialin; Hu, Yuchen; Fang, Xinwei; Shan, Weiwei; Wang, Xi; and Jiang, Zhe. MEIC: Re-thinking RTL Debug Automation using LLMs. *Proceedings of the 43rd IEEE/ACM International Conference on Computer-Aided Design* (ICCAD '25), Article 100, 9 pages. Association for Computing Machinery, 2025.
9. Liu, B.; et al. LayoutCopilot: An LLM-Powered Multi-Agent Collaborative Framework for Interactive Analog Layout Design. *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, 2025. IEEE.
10. Chang, Chen-Chia; Shen, Yikang; Fan, Shaoze; Li, Jing; Zhang, Shun; Cao, Ningyuan; Chen, Yiran; and Zhang, Xin. LaMAGIC: Language-Model-Based Topology Generation for Analog Integrated Circuits. *Proceedings of the 41st International Conference on Machine Learning* (ICML '24), Article 241, 10 pages. JMLR.org, 2024.
11. Lai, Yao; Lee, Sungyoung; Chen, Guojin; Poddar, Souradip; Hu, Mengkang; Pan, David Z.; and Luo, Ping. AnalogCoder: Analog Circuit Design via Training-Free Code Generation. *Proceedings of the AAAI Conference on Artificial Intelligence*, vol. 39, no. 1, pp. 379–387, 2025.
12. Chang, Chen-Chia; Ho, Chia-Tung; Li, Yaguang; Chen, Yiran; and Ren, Haoxing. DRC-Coder: Automated DRC Checker Code Generation Using LLM Autonomous Agent. *Proceedings of the 2025 International Symposium on Physical Design* (ISPD '25), pp. 143–151. ACM, 2025.
13. Lai, Yao; Liu, Jinxin; Tang, Zhentao; Wang, Bin; Hao, Jianye; and Luo, Ping. ChiPFormer: Transferable Chip Placement via Offline Decision Transformer. *Proceedings of the 40th International Conference on Machine Learning* (ICML '23), Article 757, 19 pages. JMLR.org, 2023.
14. Wu, Haoyuan; He, Zhuolun; Zhang, Xinyun; Yao, Xufeng; Zheng, Su; Zheng, Haisheng; and Yu, Bei. ChatEDA: A Large Language Model Powered Autonomous Agent for EDA. *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 43, no. 10, 2024, pp. 3184–3197.
15. Orenes-Vera, Marcelo; Manocha, Aninda; Wentzlaff, David; and Martonosi, Margaret. AutoSVA: Democratizing Formal Verification of RTL Module Interactions. *Proceedings of the 58th Annual ACM/IEEE Design Automation Conference* (DAC '21), pp. 535–540. IEEE Press, 2022.
16. Zhang, Zixi; Chadwick, Greg; McNally, Hugo; Zhao, Yiren; and Mullins, Robert. LLM4DV: Using Large Language Models for Hardware Test Stimuli Generation. *Proceedings of the 33rd IEEE International Symposium on Field-Programmable Custom Computing Machines* (FCCM '25), pp. 1–5, 2025.
17. Fu, Y.; et al. GPT4AIGChip: Towards Next-Generation AI Accelerator Design Automation via Large Language Models. *Proceedings of the 2023 IEEE/ACM International Conference on Computer Aided Design* (ICCAD '23), San Francisco, CA, USA, pp. 1–9. IEEE, 2023.
18. Blocklove, Jason; Garg, Siddharth; Karri, Ramesh; and Pearce, Hammond. Chip-Chat: Challenges and Opportunities in Conversational Hardware Design. *Proceedings of the 2023 ACM/IEEE 5th Workshop on Machine Learning for CAD* (MLCAD), pp. 1–6. IEEE, Sept. 2023.
19. Pei, Zehua; Zhen, Hui-Ling; Yuan, Mingxuan; Huang, Yu; and Yu, Bei. BetterV: Controlled Verilog Generation with Discriminative Guidance. *Proceedings of the 41st International Conference on Machine Learning* (ICML '24), Article 1628, 9 pages. JMLR.org, 2024.
20. Pinckney, Nathaniel; Batten, Christopher; Liu, Mingjie; Ren, Haoxing; and Khailany, Brucek. Revisiting VerilogEval: A Year of Improvements in Large-Language Models for Hardware Code Generation. *ACM Transactions on Design Automation of Electronic Systems* (TODAES), Association for Computing Machinery, February 2025.
21. Batten, Christopher; Pinckney, Nathaniel; Liu, Mingjie; Ren, Haoxing; and Khailany, Brucek. PyHDL-Eval: An LLM Evaluation Framework for Hardware Design Using Python-Embedded DSLs. *Proceedings of the 2024 ACM/IEEE International Symposium on Machine Learning for CAD* (MLCAD '24), article 10, pp. 1–17. ACM, 2024.