LLM aided design

From Wikipedia, the free encyclopedia
'''LLM-aided design''' refers to the use of [[Large language model|large language models]] (LLMs) as intelligent agents throughout the end-to-end process of system design, including conceptualization, prototyping, verification, and optimization. This evolving interdisciplinary field integrates advances in [[natural language processing]] (NLP), [[program synthesis]], and automated reasoning to support tasks in domains such as [[electronic design automation]] (EDA), [[software engineering]], [[hardware design]], and [[cyber-physical systems]].


Unlike traditional automation tools, [[Large language model|LLM]]s - especially transformer-based architectures like [[GPT-4]], Claude<ref name="claude">Anthropic et al. ''The Claude 3 Model Family: Opus, Sonnet, Haiku''. Anthropic Model Card (PDF), 2024. [https://www-cdn.anthropic.com/de8ba9b01c9ab7cbabf5c33b80b7bbc618857627/Model_Card_Claude_3.pdf Available online]</ref>, [[Llama (language model)|LLaMA]], and domain-specialized variants such as [https://ollama.com/library/codellama CodeLlama] - are capable of interpreting, generating, and refining structured and unstructured data, including natural language specifications, HDL ([[Hardware description language|Hardware Description Language]]) and HDL-like code, constraint definitions, tool scripts, and design documentation. LLM-aided design thus represents a shift from tool-assisted engineering to a form of co-design in which machine intelligence participates actively in architectural exploration, [[logic synthesis]], [[Formal verification|formal verification]], and post-silicon validation. It is situated at the intersection of [[artificial intelligence]], [[computer-aided design]] (CAD), and [[systems engineering]].


== Introduction ==
Engineering workflows in hardware and software development have traditionally relied on manual translation of high-level design intents into machine-readable specifications. These processes, though robust, are time-consuming and often require significant domain expertise. The introduction of large language models into design workflows aims to streamline this process by enabling natural language interaction, synthesis of domain-specific artifacts, and integration with design toolchains.


In recent years, the field of engineering design has witnessed a rapid convergence of [[Artificial intelligence|artificial intelligence]] (AI) and domain-specific modeling. LLMs - such as [[GPT-4]], [[Claude (language model)|Claude]]<ref name="claude" />, and [[Llama (language model)|LLaMA]] - are capable of understanding and generating code, documents, and designs from natural language descriptions. This capability opens a new mode of work in which human designers collaborate with [[Artificial intelligence|AI]] systems to ensure design correctness and reduce time-to-market. The aim is to allow designers to express intent in natural language and rely on the model to output [[Verilog]], [[VHDL]], HLS C, or firmware code.
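The intent-to-HDL workflow described above can be sketched as a simple prompt-construction step. The function below is illustrative only; its name and template are assumptions, not taken from any cited tool:

```python
# Illustrative sketch: wrapping a plain-English hardware specification in a
# code-generation prompt that an LLM is expected to answer with Verilog.
def build_hdl_prompt(spec: str, module_name: str, language: str = "Verilog") -> str:
    """Compose a code-generation prompt from a natural-language spec."""
    return (
        f"You are a hardware engineer. Write synthesizable {language}.\n"
        f"Module name: {module_name}\n"
        f"Specification: {spec}\n"
        f"Return only the {language} source, no commentary."
    )

prompt = build_hdl_prompt(
    "8-bit synchronous up-counter with active-high reset", "counter8"
)
```

In practice such a prompt would be sent to a model API; the template alone determines how much design intent the model receives.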




===From Transformers to Circuit Reasoning===


The [[Transformer (deep learning architecture)|transformer]] architecture introduced by Vaswani et al. (2017)<ref name="attention">Vaswani, Ashish; Shazeer, Noam; Parmar, Niki; Uszkoreit, Jakob; Jones, Llion; Gomez, Aidan N.; Kaiser, Lukasz; and Polosukhin, Illia. ''Attention Is All You Need''. ''Proceedings of the 31st International Conference on Neural Information Processing Systems'' (NIPS'17), 6000–6010. Curran Associates Inc. ISBN 9781510860964. [https://arxiv.org/abs/1706.03762 Available online]</ref> serves as the foundation of LLM-aided design. This architecture replaced [[Recurrent neural network|RNNs]] and [[Long short-term memory|LSTMs]]<ref name="sherstinsky2020">Sherstinsky, Alex. ''Fundamentals of Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) network''. ''Physica D: Nonlinear Phenomena'', vol. 404, 2020, p. 132306. [https://doi.org/10.1016/j.physd.2019.132306 Available online]</ref> in [[natural language processing]] owing to its ability to model long-range dependencies with self-attention mechanisms. It underlies the GPT series, from [[GPT-2]] through [[GPT-4o]] and beyond, with each iteration showing significantly better capabilities in zero-shot reasoning, code generation, and language understanding.


By 2020, [[GPT-3]]'s ability to produce functional code - including basic [[HTML]], [[Python (programming language)|Python]], and even [[Verilog]] - had drawn the interest of the AI community. This inspired hardware design researchers to speculate that [[Large language model|LLM]]s could be used for [[Logic synthesis|logic design]] and verification activities by taking advantage of the structural similarities between programming languages and [[Hardware description language|hardware description languages]] (HDLs). Early experiments using [[GPT-3]] to write [[Verilog]] or assist in [[debugging]] demonstrated potential but also revealed critical limitations such as invalid [[Syntax (programming languages)|syntax]], hallucinations, and incompatibility with synthesis tools.


The attempt to address these limitations led to a new direction: the creation of domain-specific [[foundation models]] tailored to [[Electronic design automation|EDA]]. These models, referred to as circuit foundation models, are trained or fine-tuned on [[Hardware description language|HDL]] code, simulation traces, synthesis logs, and constraint files. By 2023, tools like RTLLM<ref name="lu2023">Lu, Yao; Liu, Shang; Zhang, Qijun; and Xie, Zhiyao. ''RTLLM: An Open-Source Benchmark for Design RTL Generation with Large Language Model''. ''Proceedings of the 29th Asia and South Pacific Design Automation Conference'' (ASPDAC '24), 722–727. IEEE Press, 2024. [https://doi.org/10.1109/ASP-DAC58780.2024.10473904 Available online]</ref> began to deliver on the vision of LLM-aided design through carefully engineered prompts, feedback loops, and domain-aligned datasets.


{| class="wikitable sortable"
| 2021 || Prompt-based Verilog code generation appears in exploratory tools.
|-
| 2022 || RTLLM<ref name="lu2023"/> pioneers structured, feedback-driven generation pipelines.
|-
| 2023 || Domain-specific [[Fine-tuning (deep learning)|finetuning]] (VeriGen<ref name="verigen2024">Thakur, Shailja; Ahmad, Baleegh; Pearce, Hammond; Tan, Benjamin; Dolan-Gavitt, Brendan; Karri, Ramesh; and Garg, Siddharth. ''VeriGen: A Large Language Model for Verilog Code Generation''. ''ACM Transactions on Design Automation of Electronic Systems'', vol. 29, no. 3, article 46, 2024, pp. 1–31. [https://doi.org/10.1145/3643681 Available online]</ref>, RTLCoder<ref name="liu2024">Liu, Shang; et al. ''RTLCoder: Fully Open-Source and Efficient LLM-Assisted RTL Code Generation Technique''. ''IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems'', vol. 44, no. 4, pp. 1448–1461, April 2025. IEEE. [https://doi.org/10.1109/TCAD.2024.3483089 Available online]</ref>); agent frameworks RTLFixer<ref name="tsai2024">Tsai, Yunda; Liu, Mingjie; and Ren, Haoxing. ''RTLFixer: Automatically Fixing RTL Syntax Errors with Large Language Model''. ''Proceedings of the 61st ACM/IEEE Design Automation Conference'' (DAC '24), Article 53, 6 pages. Association for Computing Machinery, 2024. [https://doi.org/10.1145/3649329.3657353 Available online]</ref>, MEIC<ref name="xu2024">Xu, Ke; Sun, Jialin; Hu, Yuchen; Fang, Xinwei; Shan, Weiwei; Wang, Xi; and Jiang, Zhe. ''MEIC: Re-thinking RTL Debug Automation using LLMs''. ''Proceedings of the 43rd IEEE/ACM International Conference on Computer-Aided Design'' (ICCAD '25), Article 100, 9 pages. Association for Computing Machinery, 2025. [https://doi.org/10.1145/3676536.3676801 Available online]</ref> become practical.
|-
| 2024 || Vision-language fusion (LayoutCopilot<ref name="layoutcopilot2025">Liu, B.; et al. ''LayoutCopilot: An LLM-Powered Multi-Agent Collaborative Framework for Interactive Analog Layout Design''. ''IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems'', 2025. IEEE. [https://doi.org/10.1109/TCAD.2025.3529805 Available online]</ref>), analog LLMs (LaMAGIC<ref name="chang2024">Chang, Chen-Chia; Shen, Yikang; Fan, Shaoze; Li, Jing; Zhang, Shun; Cao, Ningyuan; Chen, Yiran; and Zhang, Xin. ''LaMAGIC: Language-Model-Based Topology Generation for Analog Integrated Circuits''. ''Proceedings of the 41st International Conference on Machine Learning'' (ICML '24), Article 241, 10 pages. JMLR.org, 2024. [https://proceedings.mlr.press/v202/chang24a.html Available online]</ref>, AnalogCoder<ref name="lai2024">Lai, Yao; Lee, Sungyoung; Chen, Guojin; Poddar, Souradip; Hu, Mengkang; Pan, David Z.; and Luo, Ping. ''AnalogCoder: Analog Circuit Design via Training-Free Code Generation''. ''Proceedings of the AAAI Conference on Artificial Intelligence'', vol. 39, no. 1, pp. 379–387, 2025. [https://doi.org/10.1609/aaai.v39i1.32016 Available online]</ref>) expand to new design domains.
|-
|2025 || Multi-agent architectures and graph-text fusion (DRC-Coder<ref name="chang2025">Chang, Chen-Chia; Ho, Chia-Tung; Li, Yaguang; Chen, Yiran; and Ren, Haoxing. ''DRC-Coder: Automated DRC Checker Code Generation Using LLM Autonomous Agent''. In: ''Proceedings of the 2025 International Symposium on Physical Design (ISPD ’25)'', ACM, 2025, pp. 143–151. [https://doi.org/10.1145/3698364.3705347 Available online]</ref>) reshape design verification.
|}


===Decoder vs. Encoder Models in Co-Design===


1. '''Decoder-Based Autoregressive Models''': Based on architectures like [[GPT-1|GPT]] and [https://ollama.com/library/codellama CodeLlama], these models are used for generation tasks. They can translate natural language specifications into [[Hardware description language|HDL]], generate testbenches, and repair buggy RTL. Prompt chaining and few-shot learning are among the techniques that make these models effective in synthesis-aligned code generation.


2. '''Encoder-Based Graph Reasoning Models''': Inspired by models such as [[BERT (language model)|BERT]] and adapted into [[Graph neural network|graph neural networks]] (e.g., ChipFormer<ref name="ChiPFormer">Lai, Yao; Liu, Jinxin; Tang, Zhentao; Wang, Bin; Hao, Jianye; and Luo, Ping. ''ChiPFormer: Transferable Chip Placement via Offline Decision Transformer''. ''Proceedings of the 40th International Conference on Machine Learning'' (ICML '23), Article 757, 19 pages. JMLR.org, 2023. [https://proceedings.mlr.press/v202/lai23a.html Available online]</ref>), these models are optimized for inference tasks over structural representations like [[Netlist|netlists]] or IRs. They can estimate timing, identify bottlenecks, and perform logic equivalence checks.
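As a toy illustration of the kind of structural inference these encoder-style models target, timing estimation over a netlist can be framed as a longest-path computation on a directed acyclic graph. The gate delays and helper function below are hypothetical, not ChiPFormer's actual method:

```python
# Toy model: estimate the critical-path delay of a combinational netlist
# represented as a DAG of gates, using Kahn-style topological relaxation.
from collections import defaultdict

def longest_path_delay(gates: dict[str, float], wires: list[tuple[str, str]]) -> float:
    """Longest cumulative gate delay from any input to any output (DAG assumed)."""
    succ = defaultdict(list)
    indeg = defaultdict(int)
    for src, dst in wires:
        succ[src].append(dst)
        indeg[dst] += 1
    # Arrival time at each gate starts as its own delay.
    arrival = {g: gates[g] for g in gates}
    ready = [g for g in gates if indeg[g] == 0]
    while ready:
        g = ready.pop()
        for nxt in succ[g]:
            arrival[nxt] = max(arrival[nxt], arrival[g] + gates[nxt])
            indeg[nxt] -= 1
            if indeg[nxt] == 0:
                ready.append(nxt)
    return max(arrival.values())

# Two AND gates (1.0 ns each) feeding an XOR (1.5 ns): critical path = 2.5 ns.
delay = longest_path_delay(
    {"and1": 1.0, "and2": 1.0, "xor1": 1.5},
    [("and1", "xor1"), ("and2", "xor1")],
)
```

Real timing engines model wire delay, slew, and cell libraries; the point here is only that netlist analysis reduces to graph reasoning.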


The design ecosystem is increasingly adopting hybrid strategies, where decoder models generate artifacts and encoder models verify or optimize them, forming a closed co-design loop. This dual architecture mirrors human design workflows, where generation and validation are heavily co-dependent.
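A minimal sketch of such a closed loop, with stub functions standing in for the decoder (generator) and the verifier (all names and behaviors here are hypothetical, not any cited system):

```python
# Closed co-design loop sketch: generate, verify, retry until valid.
def generate(spec: str, attempt: int) -> str:
    # A real system would call a decoder LLM; here the first draft is "buggy"
    # on purpose so the loop has something to repair.
    code = f"module top; // {spec}\nendmodule"
    return code if attempt > 0 else code.replace("endmodule", "")

def verify(code: str) -> bool:
    # A real system would run lint/synthesis or an encoder-style checker.
    return code.rstrip().endswith("endmodule")

def co_design(spec: str, max_iters: int = 3) -> tuple[str, int]:
    """Return the first verified design and the attempt index that produced it."""
    for attempt in range(max_iters):
        code = generate(spec, attempt)
        if verify(code):
            return code, attempt
    raise RuntimeError("no valid design within iteration budget")

code, iters = co_design("2-to-1 mux")
```

The iteration budget mirrors the practical constraint that each verify step may invoke an expensive toolchain run.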
| RTLLM<ref name="lu2023"/>, VeriGen<ref name="verigen2024"/>, RTLFixer<ref name="tsai2024"/>|| [[Prompt engineering]], self-refinement, score-based SFT || Specification to HDL
|-
| ChatEDA<ref name="wu2024">Wu, Haoyuan; He, Zhuolun; Zhang, Xinyun; Yao, Xufeng; Zheng, Su; Zheng, Haisheng; and Yu, Bei. ''ChatEDA: A Large Language Model Powered Autonomous Agent for EDA''. ''IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems'', vol. 43, no. 10, 2024, pp. 3184–3197. [https://doi.org/10.1109/TCAD.2024.3383347 Available online]</ref> || Instruction tuning, [[retrieval-augmented generation]] || Constraint Generation
|-
| AutoSVA<ref name="orenesvera2021">Orenes-Vera, Marcelo; Manocha, Aninda; Wentzlaff, David; and Martonosi, Margaret. ''AutoSVA: Democratizing Formal Verification of RTL Module Interactions''. ''Proceedings of the 58th Annual ACM/IEEE Design Automation Conference'' (DAC '21), pp. 535–540. IEEE Press, 2022. [https://doi.org/10.1109/DAC18074.2021.9586118 Available online]</ref>, LLM4DV<ref name="zhang2025">Zhang, Zixi; Chadwick, Greg; McNally, Hugo; Zhao, Yiren; and Mullins, Robert. ''LLM4DV: Using Large Language Models for Hardware Test Stimuli Generation''. ''Proceedings of the 33rd IEEE International Symposium on Field-Programmable Custom Computing Machines'' (FCCM '25), pp. 1–5, 2025. [https://arxiv.org/abs/2310.04535 Available online]</ref> || Coverage-driven generation || Testbench & Assertions
|-
| LayoutCopilot<ref name="layoutcopilot2025"/>, ChatEDA<ref name="wu2024"/>|| Vision-Language models, TCL script generation || Floorplan/Layout Synthesis
|-
| AnalogCoder<ref name="lai2024"/>, LaMAGIC<ref name="chang2024"/>|| Topology suggestion, layout constraints, Bayesian tuning || Analog Circuit Synthesis
|}


* Detecting and rewriting non-HLS-friendly patterns using prompt-repair pipelines,
* Generating test harnesses and compiler hints (e.g., `#pragma HLS unroll`),
* Tools like GPT4AIGChip<ref name="fu2025">Fu, Y.; et al. ''GPT4AIGChip: Towards Next-Generation AI Accelerator Design Automation via Large Language Models''. ''Proceedings of the 2023 IEEE/ACM International Conference on Computer Aided Design'' (ICCAD '23), San Francisco, CA, USA, pp. 1–9. IEEE, 2023. [https://doi.org/10.1109/ICCAD57390.2023.10323953 Available online]</ref> convert ML kernels into synthesizable HLS by combining structural abstraction and loop pattern rewrites.
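The mechanical core of such a rewrite, inserting an unroll directive before each for-loop in a C kernel, can be sketched as follows. Real tools drive this transformation with an LLM; the regex-based helper here is an assumed stand-in that only illustrates the shape of the output:

```python
# Hypothetical sketch: mechanically prepend an HLS unroll directive to each
# for-loop in a C source string, preserving the loop's indentation.
import re

def insert_unroll_pragmas(c_src: str) -> str:
    return re.sub(
        r"^(\s*)for\s*\(",
        r"\1#pragma HLS unroll\n\1for (",
        c_src,
        flags=re.MULTILINE,
    )

kernel = "void scale(int *a) {\n  for (int i = 0; i < 8; i++)\n    a[i] *= 2;\n}\n"
hls_kernel = insert_unroll_pragmas(kernel)
```

An LLM-driven pipeline would additionally decide *which* loops to unroll and by what factor, which a blind textual rewrite cannot.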


====Constraint Generation====


Constraint files are essential for synthesis, placement, and timing correctness. LLMs like ChatEDA support this through:


* Instruction tuning, enabling fine-grained command generation (e.g., for SDC, XDC formats),
====Analog Circuit Synthesis====


Analog design poses unique challenges due to its sensitivity and lack of digital abstraction. Tools like AnalogCoder<ref name="lai2024"/> and LaMAGIC<ref name="chang2024"/> use:


* Topology suggestion via LLMs, based on specification matching (gain, slew, bandwidth),
|-
! Project Name !! Model Used !! Approach Type !! Summary
|-
| RTLLM<ref name="lu2023"/> || GPT-3.5 || Prompt Engineering || Multi-step planning-based prompt design with syntax and functional log feedback.
|-
|Chip-Chat<ref name="blocklove2023">Blocklove, Jason; Garg, Siddharth; Karri, Ramesh; and Pearce, Hammond. ''Chip-Chat: Challenges and Opportunities in Conversational Hardware Design''. In: ''Proceedings of the 2023 ACM/IEEE 5th Workshop on Machine Learning for CAD (MLCAD)'', IEEE, Sept. 2023, pp. 1–6. [https://doi.org/10.1109/MLCAD58807.2023.10299874 Available online]</ref> || ChatGPT-4 || Conversational Co-design || Full pipeline HDL synthesis guided via interactive dialogue with GPT-4.
|-
|ChatEDA<ref name="wu2024"/> || LLaMA-20B || QLoRA + Instruction Tuning || Trained on GPT-4-generated EDA instructions; interprets and executes user commands.
|-
|RTLCoder<ref name="liu2024"/>|| Mistral-7B || Scored SFT || Uses synthesis scores to steer SFT toward functionally valid and resource-efficient HDL
|-
|-
|BetterV<ref name="pei2024">Pei, Zehua; Zhen, Hui-Ling; Yuan, Mingxuan; Huang, Yu; and Yu, Bei. ''BetterV: Controlled Verilog Generation with Discriminative Guidance''. arXiv:2402.03375 [cs.AI], 2024. [https://arxiv.org/abs/2402.03375 Available online]</ref> || CodeLlama + TinyLlama || Controlled Gen + SFT || Bayesian discriminator modifies token probability for valid HDL output
|BetterV<ref name="pei2024">Pei, Zehua; Zhen, Hui-Ling; Yuan, Mingxuan; Huang, Yu; and Yu, Bei. ''BetterV: Controlled Verilog Generation with Discriminative Guidance''. *Proceedings of the 41st International Conference on Machine Learning* (ICML '24), Article 1628, 9 pages. JMLR.org, 2024. [https://proceedings.mlr.press/v202/pei24a.html Available online]</ref>
|| CodeLlama + TinyLlama || Controlled Gen + SFT || Bayesian discriminator modifies token probability for valid HDL output
|-
| RTLFixer<ref name="tsai2024"/> || GPT-4 || RAG + Agent Framework || Uses ReAct prompting and error categorization DBs for debug-oriented HDL refinement.
|-
|-
|VerilogCoder<ref name="ho2025"/> || GPT-4 / LLaMA3 || Multi-Agent + AST Trace || Uses waveform tracing and signal planning to backtrace functional errors in RTL.
| RTLFixer<ref name="tsai2024"/> || GPT-4 || RAG + Agent Framework || Uses ReAct prompting and error categorization DBs for debug-oriented HDL refinement.
|}
|}


These methods highlight key trends and research frontiers:

* Prompting + Logs: RTLLM<ref name="lu2023"/> shows that prompting alone, when combined with feedback from toolchains, can suffice for competitive HDL generation without model retraining.
* Fine-tuning on RTL: VeriGen<ref name="verigen2024"/> and RTLCoder<ref name="liu2024"/> show that focused fine-tuning, especially with quality metrics (e.g., synthesis logs, functional correctness), significantly improves output robustness.
* Controlled Generation: BetterV<ref name="pei2024"/> uses probabilistic controls in token sampling, pushing Verilog generation beyond maximum-likelihood decoding.
* Agent Architectures: RTLFixer<ref name="tsai2024"/> embodies an emerging paradigm where LLMs serve not just as code generators, but as self-refining agents—reading logs, tracing waveforms, and performing symbolic analysis.

The table also highlights the significance of multi-agent collaboration, [[Retrieval-augmented generation|retrieval-augmented generation]] (RAG), and tool-in-the-loop frameworks, which move beyond simple completion tasks into autonomous reasoning and repair. The performance advantages of fine-tuned and multi-modal frameworks over traditional prompting, as shown in benchmarks like VerilogEval<ref name="liu2023verilogeval">Pinckney, Nathaniel; Batten, Christopher; Liu, Mingjie; Ren, Haoxing; and Khailany, Brucek. ''Revisiting VerilogEval: A Year of Improvements in Large-Language Models for Hardware Code Generation''. ''ACM Transactions on Design Automation of Electronic Systems'' (TODAES), Association for Computing Machinery, February 2025. [https://doi.org/10.1145/3718088 Available online]</ref> and PyHDL-Eval<ref name="batten2024">Batten, Christopher; Pinckney, Nathaniel; Liu, Mingjie; Ren, Haoxing; and Khailany, Brucek. ''PyHDL-Eval: An LLM Evaluation Framework for Hardware Design Using Python-Embedded DSLs''. In: ''Proceedings of the 2024 ACM/IEEE International Symposium on Machine Learning for CAD (MLCAD '24)'', ACM, 2024, article 10, pp. 1–17. [https://doi.org/10.1145/3670474.3685948 Available online]</ref>, confirm that tightly integrated model-tool co-evolution is needed for true engineering-grade HDL generation.

==Datasets and Evaluation Infrastructure==

Large language models in EDA are developed, tuned, and evaluated using robust datasets. These datasets come in a range of formats, from performance metrics and natural language requirements to tokenized Verilog corpora and annotated tool logs. They enable supervised fine-tuning, domain adaptation, and benchmarking of synthesis validity and generation quality.

In addition to increasing dataset volume, recent initiatives have improved granularity and diversity. Instruction-tuned datasets like ChatEDA<ref name="wu2024"/> teach LLMs how to interact with toolchains; benchmark sets such as VerilogEval<ref name="liu2023verilogeval"/> assess model output quality; and design-level corpora like RTLCoder<ref name="liu2024"/> and [https://huggingface.co/datasets/GaTech-EIC/MG-Verilog MG-Verilog] offer structural annotations and synthesis metadata. [https://huggingface.co/datasets/GaTech-EIC/MG-Verilog MG-Verilog] also provides human-annotated multilingual Verilog pairs that facilitate abstraction and cross-language translation. The VeriGen<ref name="verigen2024"/> dataset uses textbook-derived Verilog tasks to facilitate fundamental pedagogical fine-tuning.

==Tooling and Infrastructure: Practical Deployments==

Several practical tools now demonstrate that LLM-aided design is no longer theoretical:

* ChatEDA<ref name="wu2024"/>: Serves as a natural language interface for controlling [[Vivado]], [https://www.intel.com/content/www/us/en/products/details/fpga/development-tools/quartus-prime.html Quartus], or [https://www.cadence.com/en_US/home/tools/digital-design-and-signoff/soc-implementation-and-floorplanning/innovus-implementation-system.html Innovus] workflows. It interprets user intent and translates it into tool-specific commands.
* RTLLM<ref name="lu2023"/>-Editor: An IDE that integrates real-time HDL generation, compilation feedback, and syntax repair.

Revision as of 20:20, 19 June 2025




LLM-aided design refers to the use of large language models (LLMs) as smart agents throughout the end-to-end process of system design, including conceptualization, prototyping, verification, and optimization. This evolving interdisciplinary model integrates advances in natural language processing (NLP), program synthesis, and automated reasoning to support tasks in domains such as electronic design automation (EDA), software engineering, hardware design, and cyber-physical systems.

Unlike traditional automation tools, LLMs - especially transformer-based architectures like GPT-4, Claude[1], LLaMA, and domain-specialized variants such as CodeLlama - are capable of interpreting, generating, and refining structured and unstructured data including natural language specifications, HDL (Hardware Description Language)/HDL-like code, constraint definitions, tool scripts, and design documentation. LLM-aided design thus represents a shift from tool-assisted engineering to a form of co-design in which machine intelligence participates actively in architectural exploration, logic synthesis, formal verification, and post-silicon validation. It is situated at the intersection of artificial intelligence, computer-aided design (CAD), and systems engineering.

Introduction

Engineering workflows in hardware and software development have traditionally relied on manual translation of high-level design intents into machine-readable specifications. These processes, though robust, are time-consuming and often require significant domain expertise. The introduction of large language models into design workflows aims to streamline this process by enabling natural language interaction, synthesis of domain-specific artifacts, and integration with design toolchains.

In recent years, the field of engineering design has witnessed a rapid convergence of artificial intelligence (AI) and domain-specific modeling. LLMs - such as GPT-4, Claude[1], and LLaMA - are capable of understanding and generating code, documents, and designs from natural language descriptions. This capacity opens a new space in which human designers collaborate with AI systems to ensure design correctness and reduce time-to-market. The aim is to let designers express intent in natural language and rely on the model to output Verilog, VHDL, HLS C, or firmware code.


LLM-aided design differs from earlier forms of automated design through its ability to generalize across tasks and contexts. Unlike rule-based or template-driven systems, large language models can encode domain-specific heuristics and adapt to various inputs—including design specifications, codebases, formal properties, and documentation—without requiring extensive retraining. This flexibility supports their use in diverse design settings such as system-on-chip development, embedded systems, robotic control, and cyber-physical system modeling.

LLM-aided design adds a new epistemic layer to the engineering process, in which models contribute to design reasoning rather than only carrying out commands. This enables flow-control automation, formal assertion generation, and template retrieval for HLS code repair. It has also given rise to domain-adapted LLMs known as circuit foundation models (CFMs), which are capable of reasoning and generating across the whole RTL-to-GDSII pipeline.

Background and Foundations of LLM-Aided Design

The integration of large language models (LLMs) into electronic design automation (EDA) represents a shift in how hardware systems are specified, verified, and developed. While EDA has conventionally been defined by predefined workflows, rule-based synthesis tools, and extensive manual intervention, the growth of LLMs has introduced a new design paradigm driven by reasoning, abstraction, and human-language interaction. This shift aligns with the broader trajectory of artificial intelligence, where general-purpose models have increasingly been specialized for domain-specific tasks, including those that traditionally required expert engineers.

From Transformers to Circuit Reasoning

The transformer architecture introduced by Vaswani et al. (2017)[2] serves as the foundation of LLM-aided design. This architecture displaced RNNs and LSTMs[3] in natural language processing thanks to its ability to model long-range dependencies with self-attention mechanisms. It underlies the GPT series, from GPT-2 through GPT-4o and beyond, with each iteration showing significantly better zero-shot reasoning, code generation, and language understanding.
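
The scaled dot-product attention at the heart of this architecture, as defined by Vaswani et al.[2], can be written as:

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
```

where Q, K, and V are the query, key, and value matrices and d_k is the key dimension; dividing by the square root of d_k keeps large dot products from saturating the softmax.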

By 2020, GPT-3's ability to produce functional code - including basic HTML, Python, and even Verilog - had drawn the interest of the AI community. This inspired hardware design researchers to speculate that LLMs could be used for logic design and verification activities by taking advantage of the structural similarities between programming languages and hardware description languages (HDLs). Early experiments using GPT-3 to write Verilog or assist in debugging demonstrated potential but also revealed critical limitations such as poor syntax, hallucinations, and incompatibility with synthesis tools.

The attempt to address these limitations led to a new direction: the creation of domain-specific foundation models tailored to EDA. These models - referred to as circuit foundation models - are trained or fine-tuned on HDL code, simulation traces, synthesis logs, and constraint files. By 2023, tools like RTLLM[4] began to realize the vision of LLM-aided design through carefully engineered prompts, feedback loops, and domain-aligned datasets.

Timeline of LLM Trends in EDA
Year | Milestone
2017 | Transformer introduced by Vaswani et al.[2]
2020 | GPT-3 exhibits rudimentary HDL generation capability.
2021 | Prompt-based Verilog code generation appears in exploratory tools.
2022 | RTLLM[4] pioneers structured, feedback-driven generation pipelines.
2023 | Domain-specific fine-tuning (VeriGen[5], RTLCoder[6]); agent frameworks RTLFixer[7] and MEIC[8] become practical.
2024 | Vision-language fusion (LayoutCopilot[9]) and analog LLMs (LaMAGIC[10], AnalogCoder[11]) expand to new design domains.
2025 | Multi-agent architectures and graph-text fusion (DRC-Coder[12]) reshape design verification.

Decoder vs. Encoder Models in Co-Design

1. Decoder-Based Autoregressive Models: Based on architectures like GPT and CodeLlama, these models are used for generation tasks. They can translate natural language specifications into HDL, generate testbenches, and repair buggy RTL. Prompt chaining and few-shot learning are among the techniques that make these models effective in synthesis-aligned code generation.

2. Encoder-Based Graph Reasoning Models: Inspired by models such as BERT and adapted into graph neural networks (e.g., ChipFormer[13]), these models are optimized for inference tasks over structural representations such as netlists or IRs. They can estimate timing, identify bottlenecks, and perform logic equivalence checks.

The design ecosystem is increasingly adopting hybrid strategies, where decoder models generate artifacts and encoder models verify or optimize them - forming a closed co-design loop. This dual architecture mirrors human design workflows, where generation and validation are heavily co-dependent.
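
The generate-then-verify loop can be sketched in a few lines of Python. Everything below is an illustrative toy: `generate_rtl` and `check_rtl` are hypothetical stand-ins for a decoder LLM and an encoder-style checker, not interfaces of any cited tool.

```python
# Minimal sketch of a decoder-generate / encoder-verify co-design loop.
# Both helpers are invented placeholders standing in for real models.

def generate_rtl(spec: str, feedback: str = "") -> str:
    """Placeholder 'decoder': emits candidate HDL for a toy spec."""
    body = "assign y = a & b;" if "AND" in spec else "assign y = a | b;"
    if "wrapper" in feedback:  # second attempt reacts to checker feedback
        return f"module top(input a, input b, output y);\n  {body}\nendmodule"
    return body  # first attempt deliberately omits the module wrapper

def check_rtl(rtl: str) -> str:
    """Placeholder 'encoder' checker: returns an error string, or '' if OK."""
    return "" if "module" in rtl else "error: missing module wrapper"

def co_design_loop(spec: str, max_iters: int = 3) -> str:
    feedback = ""
    for _ in range(max_iters):
        candidate = generate_rtl(spec, feedback)
        feedback = check_rtl(candidate)
        if not feedback:        # checker accepts: the loop closes
            return candidate
    raise RuntimeError("no valid design within the iteration budget")
```

In a real flow, the checker's role would be filled by synthesis, simulation, or an encoder-style analysis model, and its diagnostics would be fed back into the next generation round exactly as the `feedback` string is here.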

Methodological Landscape of LLM-Aided Design

LLM-aided design covers multiple stages of the hardware-software co-design pipeline, including natural language specification, HDL synthesis, analog circuit design, formal verification, and layout generation. While foundational techniques such as prompting, supervised fine-tuning (SFT), and retrieval-augmented generation (RAG) cover much of the field, their practical application varies with the nature of the task. To provide a comprehensive view, the following summary table classifies typical LLM methodologies by their corresponding EDA task domain for several recently published, domain-specific representative LLMs/tools:

Methodology by Task Domain in LLM-Aided Design
Representative LLMs/Tools | LLM Methodology Used | Task Domain
RTLLM[4], VeriGen[5], RTLFixer[7] | Prompt engineering, self-refinement, score-based SFT | Specification to HDL
ChatEDA[14] | Instruction tuning, retrieval-augmented generation | Constraint Generation
AutoSVA[15], LLM4DV[16] | Coverage-driven generation | Testbench & Assertions
LayoutCopilot[9], ChatEDA[14] | Vision-language models, TCL script generation | Floorplan/Layout Synthesis
AnalogCoder[11], LaMAGIC[10] | Topology suggestion, layout constraints, Bayesian tuning | Analog Circuit Synthesis

Core Methodologies

Below are a few core methodologies, with insights from recent tools and frameworks:

Specification to HDL Translation

LLMs can generate synthesizable RTL (Verilog, VHDL) directly from natural language specifications. This process is significantly enhanced using:

  • Prompt engineering and hierarchical prompting, for structured code generation,
  • Context window expansion, to provide multi-level module and signal context,
  • Self-refinement and feedback from compiler logs, allowing the LLM to repair and converge to synthesizable HDL,
  • Score-based supervised fine-tuning (SFT), as seen in tools like RTLLM[4], VeriGen[5], and RTLFixer[7], to improve alignment with design and functional correctness.
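
The compile-and-refine loop these bullets describe can be sketched as follows. The `compile_hdl` stub and prompt format are invented for illustration, and `call_llm` stands in for any code LLM; none of this is taken from the cited tools.

```python
# Sketch of log-driven self-refinement for HDL generation.
# A toy 'compiler' produces diagnostics that are folded into the next prompt.

def compile_hdl(code: str) -> list[str]:
    """Toy 'compiler': flags assign statements missing a semicolon."""
    errors = []
    for lineno, line in enumerate(code.splitlines(), 1):
        if line.strip().startswith("assign") and not line.rstrip().endswith(";"):
            errors.append(f"line {lineno}: missing ';' after assign")
    return errors

def build_repair_prompt(spec: str, code: str, errors: list[str]) -> str:
    """Fold tool diagnostics back into the next-round prompt."""
    return (f"Specification:\n{spec}\n\nPrevious attempt:\n{code}\n\n"
            "Compiler log:\n" + "\n".join(errors) +
            "\n\nReturn corrected Verilog only.")

def refine(spec: str, call_llm, max_rounds: int = 5) -> str:
    code = call_llm(f"Specification:\n{spec}\n\nReturn Verilog only.")
    for _ in range(max_rounds):
        errors = compile_hdl(code)
        if not errors:          # converged: the toolchain accepts the code
            return code
        code = call_llm(build_repair_prompt(spec, code, errors))
    raise RuntimeError("did not converge within the round budget")
```

Real pipelines replace `compile_hdl` with an actual lint/synthesis/simulation run and parse its log into the repair prompt in the same way.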

Testbench and Assertion Generation

LLMs can automatically synthesize verification environments, SystemVerilog assertions (SVA), property checks, and test stimuli from examples and coverage goals, using:

  • Coverage-driven generation, where LLMs aim to satisfy specific coverage goals and random seed diversity,
  • Tools such as AutoSVA[15] and LLM4DV[16] have shown higher assertion coverage and better bug exposure than traditional constrained-random verification methods.
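
Coverage-driven stimulus generation can be illustrated with a minimal sketch; the three-bin coverage model below is invented for illustration and is not taken from AutoSVA or LLM4DV.

```python
import random

# Toy coverage-driven stimulus loop: keep drawing stimuli until every
# coverage bin in the goal set has been hit (or the budget runs out).

def coverage_bin(value: int) -> str:
    """Invented coverage model for a 4-bit operand: zero / low / high."""
    if value == 0:
        return "zero"
    return "low" if value < 8 else "high"

def generate_stimuli(goals: set[str], budget: int = 1000, seed: int = 0) -> list[int]:
    rng = random.Random(seed)       # seeded for reproducible regressions
    hit: set[str] = set()
    stimuli: list[int] = []
    trials = 0
    while hit < goals and trials < budget:
        trials += 1
        v = rng.randrange(16)
        b = coverage_bin(v)
        if b in goals and b not in hit:  # keep only coverage-advancing stimuli
            hit.add(b)
            stimuli.append(v)
    return stimuli
```

An LLM-based generator would replace the random draw with model-proposed stimuli biased toward the still-unhit bins, which is what gives these tools their coverage advantage over purely constrained-random methods.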

HDL Debugging and Repair

LLMs assist in both syntactic repair (fixing compilation errors) and semantic repair (correcting logical/functional behavior), leveraging templates, similarity search, and error log analysis:

  • Template libraries and error log parsing,
  • Similarity search from past fixes,
  • Retrieval-Augmented Generation (RAG) pipelines such as RTLFixer[7] and MEIC[8], which iteratively improve code until it passes lint, synthesis, or formal checks.
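
The retrieval step of such a pipeline can be illustrated with simple string similarity. The error-to-fix knowledge base below is invented for illustration; real RAG pipelines index embeddings of actual tool logs and past repairs.

```python
import difflib

# Sketch of retrieval for repair: given a new tool error, fetch the fix
# recorded for the most similar past error.

FIX_DB = {
    "syntax error near 'endmodule'": "add missing ';' before endmodule",
    "port 'clk' is not declared": "declare the port in the module header",
    "width mismatch in assignment": "extend or truncate the RHS expression",
}

def retrieve_fix(error_log: str, cutoff: float = 0.4):
    """Return the stored fix for the closest known error, or None."""
    match = difflib.get_close_matches(error_log, FIX_DB.keys(), n=1, cutoff=cutoff)
    return FIX_DB[match[0]] if match else None
```

The retrieved fix is then inserted into the repair prompt as context, so the model imitates a known-good correction instead of guessing.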

HLS Code Refinement

Standard C/C++ is often incompatible with HLS constraints (e.g., recursion, pointers). LLMs identify and rewrite such constructs by:

  • Detecting and rewriting non-HLS-friendly patterns using prompt-repair pipelines,
  • Generating test harnesses and compiler hints (e.g., `#pragma HLS unroll`),
  • Tools like GPT4AIGChip[17] convert ML kernels into synthesizable HLS by combining structural abstraction and loop pattern rewrites.
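
A toy lint pass of the kind such pipelines might run before rewriting is sketched below. The pattern lists are illustrative only and far from a complete HLS compatibility check.

```python
import re

# Coarse textual detector for constructs that commonly break HLS synthesis:
# dynamic allocation and recursion. Patterns are illustrative, not exhaustive.

HLS_UNFRIENDLY = [
    (re.compile(r"\bmalloc\s*\("), "dynamic allocation: use fixed-size arrays"),
    (re.compile(r"\bfree\s*\("), "dynamic allocation: use fixed-size arrays"),
]

def find_recursion(c_source: str) -> list[str]:
    """Flag functions whose body calls themselves (coarse textual check)."""
    flagged = []
    for m in re.finditer(r"\b(\w+)\s*\([^)]*\)\s*\{(.*?)\}", c_source, re.S):
        name, body = m.group(1), m.group(2)
        if re.search(rf"\b{name}\s*\(", body):
            flagged.append(f"recursive call in '{name}': rewrite as a loop")
    return flagged

def lint_for_hls(c_source: str) -> list[str]:
    issues = [msg for pat, msg in HLS_UNFRIENDLY if pat.search(c_source)]
    return issues + find_recursion(c_source)
```

In an LLM-based flow, each flagged issue would become part of a repair prompt asking the model to rewrite the offending construct (e.g., turning recursion into an iterative loop with a fixed bound).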

Constraint Generation

Constraint files are essential for synthesis, placement, and timing correctness. LLMs like ChatEDA support this through:

  • Instruction tuning, enabling fine-grained command generation (e.g., for SDC, XDC formats),
  • Retrieval-Augmented Generation (RAG), which pulls prior constraints from similar designs or databases to ensure domain-consistent generation,
  • Generating multi-domain timing, placement, and IO constraints with contextual accuracy.
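
The back end of such a flow can be sketched as template-based emission from a structured "intent" that an LLM front end might extract from natural language. The intent schema here is invented; the emitted commands follow standard SDC syntax.

```python
# Sketch of template-based SDC emission from a structured intent dict.
# The schema (clock, freq_mhz, input_delays, output_delays) is illustrative.

def emit_sdc(intent: dict) -> str:
    lines = []
    clk = intent["clock"]
    period_ns = 1000.0 / intent["freq_mhz"]  # MHz -> clock period in ns
    lines.append(f"create_clock -name {clk} -period {period_ns:.3f} [get_ports {clk}]")
    for port, delay in intent.get("input_delays", {}).items():
        lines.append(f"set_input_delay -clock {clk} {delay} [get_ports {port}]")
    for port, delay in intent.get("output_delays", {}).items():
        lines.append(f"set_output_delay -clock {clk} {delay} [get_ports {port}]")
    return "\n".join(lines)
```

Keeping the LLM on the natural-language-to-intent side and using deterministic templates for the final commands is one way these tools reduce the risk of hallucinated constraint syntax.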

Floorplan and Layout Synthesis

Physical design requires careful placement and routing. LLM-vision hybrid models such as LayoutCopilot[9] and ChatEDA[14] employ:

  • Vision-language modeling to interpret and manipulate layout imagery (DEF/GDSII),
  • TCL script generation, customized for tools like Innovus and ICC2,
  • Automatic power grid and macro placement proposals, based on learned design intents.

Analog Circuit Synthesis

Analog design poses unique challenges due to its sensitivity and lack of digital abstraction. Tools like AnalogCoder[11] and LaMAGIC[10] use:

  • Topology suggestion via LLMs, based on specification matching (gain, slew, bandwidth),
  • Layout constraint prediction, such as symmetry, matching, and parasitic awareness,
  • Bayesian optimization and tuning, informed by LLM predictions for transistor sizing and performance trade-offs.
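
The tuning loop can be illustrated with a toy figure of merit and random search standing in for Bayesian optimization; all values and the objective itself are invented for illustration.

```python
import random

# Toy sizing loop: optimize a fictitious figure of merit over a transistor
# width/length pair. Random search stands in for Bayesian optimization; in
# the cited flows, an LLM would propose the starting region and constraints.

def toy_objective(w_um: float, l_um: float) -> float:
    """Invented figure of merit: penalize distance from a fictitious optimum."""
    return -((w_um - 4.0) ** 2 + (l_um - 0.5) ** 2)

def tune(bounds: dict, n_trials: int = 500, seed: int = 1):
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(n_trials):
        w = rng.uniform(*bounds["w_um"])
        l = rng.uniform(*bounds["l_um"])
        score = toy_objective(w, l)
        if score > best_score:
            best, best_score = (w, l), score
    return best, best_score
```

A real analog flow would replace `toy_objective` with SPICE simulation results (gain, bandwidth, slew) and the random sampler with a surrogate-model optimizer.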

These methodologies collectively depict LLMs as design agents capable of integrating with CAD flows, reasoning over heterogeneous inputs (text, code, specs, layout), and adapting to domain-specific constraints. As tools mature, the distinction between synthesis, verification, and optimization continues to blur—paving the way for closed-loop, autonomous hardware design.

Among these, HDL generation has emerged as one of the most deeply investigated tasks in LLM-aided EDA research, serving as a methodological testbed for broader design automation challenges. It captures the full interplay between natural language, symbolic code, feedback refinement, and tool integration. The following case study synthesizes key techniques employed in HDL generation workflows.

Methodological Classification of HDL Generation: A Case Study

The following table, constructed using detailed insights from recent papers, including the 2025 survey by Pan et al., highlights the methodologies underlying LLM-aided HDL generation.

HDL Generation Methodologies Using LLMs
Project Name | Model Used | Approach Type | Summary
RTLLM[4] | GPT-3.5 | Prompt Engineering | Multi-step planning-based prompt design with syntax and functional log feedback.
Chip-Chat[18] | ChatGPT-4 | Conversational Co-design | Full pipeline HDL synthesis guided via interactive dialogue with GPT-4.
VeriGen[5] | CodeGen-16B | Fine-tuning | Trained on textbook + GitHub Verilog; improved synthesis-valid output and syntax robustness.
ChatEDA[14] | LLaMA-20B | QLoRA + Instruction Tuning | Trained on GPT-4-generated EDA instructions; interprets and executes user commands.
RTLCoder[6] | Mistral-7B | Scored SFT | Uses synthesis scores to steer SFT toward functionally valid and resource-efficient HDL.
BetterV[19] | CodeLlama + TinyLlama | Controlled Gen + SFT | Bayesian discriminator modifies token probabilities for valid HDL output.
RTLFixer[7] | GPT-4 | RAG + Agent Framework | Uses ReAct prompting and error-categorization DBs for debug-oriented HDL refinement.

These methods highlight key trends and research frontiers:

  • Prompting + Logs: RTLLM[4] shows that prompting alone, when combined with feedback from toolchains, can suffice for competitive HDL generation without model retraining.
  • Fine-tuning on RTL: VeriGen[5] and RTLCoder[6] show that focused fine-tuning, especially with quality metrics (e.g., synthesis logs, functional correctness), significantly improves output robustness.
  • Controlled Generation: BetterV[19] uses probabilistic controls in token sampling, pushing Verilog generation beyond maximum-likelihood decoding.
  • Agent Architectures: RTLFixer[7] embodies an emerging paradigm where LLMs serve not just as code generators, but as self-refining agents—reading logs, tracing waveforms, and performing symbolic analysis.

The table also highlights the significance of multi-agent collaboration, retrieval-augmented generation (RAG), and tool-in-the-loop frameworks, which move beyond simple completion tasks into autonomous reasoning and repair. The performance advantages of fine-tuned and multi-modal frameworks over traditional prompting, as shown in benchmarks like VerilogEval[20] and PyHDL-Eval[21], confirm that tightly integrated model-tool co-evolution is needed for true engineering-grade HDL generation.

Datasets and Evaluation Infrastructure

Large language models in EDA are developed, tuned, and evaluated using robust datasets. These datasets come in a range of formats, from performance metrics and natural language requirements to tokenized Verilog corpora and annotated tool logs. They enable supervised fine-tuning, domain adaptation, and benchmarking of synthesis validity and generation quality.

In addition to increasing dataset volume, recent initiatives have improved granularity and diversity. Instruction-tuned datasets like ChatEDA[14] teach LLMs how to interact with toolchains; benchmark sets such as VerilogEval[20] assess model output quality; and design-level corpora like RTLCoder[6] and MG-Verilog offer structural annotations and synthesis metadata. MG-Verilog also provides human-annotated multilingual Verilog pairs that facilitate abstraction and cross-language translation. The VeriGen[5] dataset uses textbook-derived Verilog tasks to facilitate fundamental pedagogical fine-tuning.

Tooling and Infrastructure: Practical Deployments

Several practical tools now demonstrate that LLM-aided design is no longer theoretical:

  • ChatEDA[14]: Serves as a natural language interface for controlling Vivado, Quartus, or Innovus workflows. It interprets user intent and translates it into tool-specific commands.
  • RTLLM[4]-Editor: An IDE that integrates real-time HDL generation, compilation feedback, and syntax repair.
  • LLM4DV[16] and AutoSVA[15]: Specialized for formal verification, these tools generate SystemVerilog assertions and support coverage-driven testbench synthesis.

These tools reflect an operational maturity and are being integrated into prototyping, verification, closure, and constraint generation workflows.

See Also


References

  1. Anthropic et al. The Claude 3 Model Family: Opus, Sonnet, Haiku. Anthropic Model Card (PDF), 2024. Available online
  2. Vaswani, Ashish; Shazeer, Noam; Parmar, Niki; Uszkoreit, Jakob; Jones, Llion; Gomez, Aidan N.; Kaiser, Lukasz; and Polosukhin, Illia. Attention Is All You Need. Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS'17), 6000–6010. Curran Associates Inc. ISBN 9781510860964. Available online
  3. Sherstinsky, Alex. Fundamentals of Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) network. Physica D: Nonlinear Phenomena, vol. 404, 2020, p. 132306. Available online
  4. Lu, Yao; Liu, Shang; Zhang, Qijun; and Xie, Zhiyao. RTLLM: An Open-Source Benchmark for Design RTL Generation with Large Language Model. Proceedings of the 29th Asia and South Pacific Design Automation Conference (ASPDAC '24), 722–727. IEEE Press, 2024. Available online
  5. Thakur, Shailja; Ahmad, Baleegh; Pearce, Hammond; Tan, Benjamin; Dolan-Gavitt, Brendan; Karri, Ramesh; and Garg, Siddharth. VeriGen: A Large Language Model for Verilog Code Generation. ACM Transactions on Design Automation of Electronic Systems, vol. 29, no. 3, article 46, 2024, pp. 1–31. Available online
  6. Liu, Shang; et al. RTLCoder: Fully Open-Source and Efficient LLM-Assisted RTL Code Generation Technique. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 44, no. 4, pp. 1448–1461, April 2025. IEEE. Available online
  7. Tsai, Yunda; Liu, Mingjie; and Ren, Haoxing. RTLFixer: Automatically Fixing RTL Syntax Errors with Large Language Model. Proceedings of the 61st ACM/IEEE Design Automation Conference (DAC '24), Article 53, 6 pages. Association for Computing Machinery, 2024. Available online
  8. Xu, Ke; Sun, Jialin; Hu, Yuchen; Fang, Xinwei; Shan, Weiwei; Wang, Xi; and Jiang, Zhe. MEIC: Re-thinking RTL Debug Automation using LLMs. Proceedings of the 43rd IEEE/ACM International Conference on Computer-Aided Design (ICCAD '25), Article 100, 9 pages. Association for Computing Machinery, 2025. Available online
  9. Liu, B.; et al. LayoutCopilot: An LLM-Powered Multi-Agent Collaborative Framework for Interactive Analog Layout Design. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2025. IEEE. Available online
  10. Chang, Chen-Chia; Shen, Yikang; Fan, Shaoze; Li, Jing; Zhang, Shun; Cao, Ningyuan; Chen, Yiran; and Zhang, Xin. LaMAGIC: Language-Model-Based Topology Generation for Analog Integrated Circuits. Proceedings of the 41st International Conference on Machine Learning (ICML '24), Article 241, 10 pages. JMLR.org, 2024. Available online
  11. Lai, Yao; Lee, Sungyoung; Chen, Guojin; Poddar, Souradip; Hu, Mengkang; Pan, David Z.; and Luo, Ping. AnalogCoder: Analog Circuit Design via Training-Free Code Generation. Proceedings of the AAAI Conference on Artificial Intelligence, vol. 39, no. 1, pp. 379–387, 2025. Available online
  12. Chang, Chen-Chia; Ho, Chia-Tung; Li, Yaguang; Chen, Yiran; and Ren, Haoxing. DRC-Coder: Automated DRC Checker Code Generation Using LLM Autonomous Agent. In: Proceedings of the 2025 International Symposium on Physical Design (ISPD '25), ACM, 2025, pp. 143–151. Available online
  13. Lai, Yao; Liu, Jinxin; Tang, Zhentao; Wang, Bin; Hao, Jianye; and Luo, Ping. ChiPFormer: Transferable Chip Placement via Offline Decision Transformer. Proceedings of the 40th International Conference on Machine Learning (ICML '23), Article 757, 19 pages. JMLR.org, 2023. Available online
  14. Wu, Haoyuan; He, Zhuolun; Zhang, Xinyun; Yao, Xufeng; Zheng, Su; Zheng, Haisheng; and Yu, Bei. ChatEDA: A Large Language Model Powered Autonomous Agent for EDA. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 43, no. 10, 2024, pp. 3184–3197. Available online
  15. Orenes-Vera, Marcelo; Manocha, Aninda; Wentzlaff, David; and Martonosi, Margaret. AutoSVA: Democratizing Formal Verification of RTL Module Interactions. Proceedings of the 58th Annual ACM/IEEE Design Automation Conference (DAC '21), pp. 535–540. IEEE Press, 2022. Available online
  16. Zhang, Zixi; Chadwick, Greg; McNally, Hugo; Zhao, Yiren; and Mullins, Robert. LLM4DV: Using Large Language Models for Hardware Test Stimuli Generation. Proceedings of the 33rd IEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM '25), pp. 1–5, 2025. Available online
  17. Fu, Y.; et al. GPT4AIGChip: Towards Next-Generation AI Accelerator Design Automation via Large Language Models. Proceedings of the 2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD '23), San Francisco, CA, USA, pp. 1–9. IEEE, 2023. Available online
  18. Blocklove, Jason; Garg, Siddharth; Karri, Ramesh; and Pearce, Hammond. Chip-Chat: Challenges and Opportunities in Conversational Hardware Design. In: Proceedings of the 2023 ACM/IEEE 5th Workshop on Machine Learning for CAD (MLCAD), IEEE, Sept. 2023, pp. 1–6. Available online
  19. Pei, Zehua; Zhen, Hui-Ling; Yuan, Mingxuan; Huang, Yu; and Yu, Bei. BetterV: Controlled Verilog Generation with Discriminative Guidance. Proceedings of the 41st International Conference on Machine Learning (ICML '24), Article 1628, 9 pages. JMLR.org, 2024. Available online
  20. Pinckney, Nathaniel; Batten, Christopher; Liu, Mingjie; Ren, Haoxing; and Khailany, Brucek. Revisiting VerilogEval: A Year of Improvements in Large-Language Models for Hardware Code Generation. ACM Transactions on Design Automation of Electronic Systems (TODAES), Association for Computing Machinery, February 2025. Available online
  21. Batten, Christopher; Pinckney, Nathaniel; Liu, Mingjie; Ren, Haoxing; and Khailany, Brucek. PyHDL-Eval: An LLM Evaluation Framework for Hardware Design Using Python-Embedded DSLs. In: Proceedings of the 2024 ACM/IEEE International Symposium on Machine Learning for CAD (MLCAD '24), ACM, 2024, article 10, pp. 1–17. Available online