
Journal of Integrated Circuits and Systems, vol. 19, n. 2, 2024

Machine Learning in VLSI Design: A Comprehensive Review


Yassine Attaoui 1, Mohamed Chentouf 1, Zine El Abidine Alaoui Ismaili 2, and Aimad El Mourabit 3

1 Yassine Attaoui and Mohamed Chentouf, Siemens EDA/CSD Calypto - Synthesis Solutions, Rabat, Morocco
2 Zine El Abidine Alaoui Ismaili, Information, Communication and Embedded Systems, Mohammed V University, Rabat, Morocco
3 Aimad El Mourabit, System & Data Engineering Team, ENSA of Tangier, Abdelmalek Essaâdi University, Tangier, Morocco
e-mail: yassine.attaoui@siemens.com, mohamed.chentouf@siemens.com, z.alaoui@um5s.net.ma, aimad@lenac.univ-lyon1.fr

Abstract— This paper reviews state-of-the-art ML applications and aspects of AI in VLSI/CAD EDA. As VLSI chip complexity increases, driven by shrinking chip sizes and higher performance demands, manufacturing high-performance chips poses significant challenges. Recent studies have investigated the use of Artificial Intelligence and Machine Learning in VLSI CAD/EDA (Computer-Aided Design/Electronic Design Automation). Machine Learning is experiencing rapid growth in VLSI design and is being increasingly integrated into EDA tool development due to its capacity for achieving higher accuracy in reduced runtime. This paper offers a comprehensive review of the latest state-of-the-art ML applications in VLSI/CAD EDA.

Index Terms— Machine Learning, VLSI, EDA, CAD

Digital Object Identifier 10.29292/jics.v19i2.826

I. INTRODUCTION

As ASIC development grows in complexity, manufacturing constraints emerge, influencing both cost and time-to-market. The escalating number of physical features in VLSI circuits presents a serious challenge. This growth results in a massive flux of data processed by EDA tools' engines [1]. Fig 1 illustrates the data growth across technology advancements, complicating the task of gathering and identifying underlying data correlations and patterns. Consequently, this surge in data directly impacts costs and extends the development runtime.

Fig. 1 EDA Data capacity vs. chip technology node

Recently, Machine Learning (ML) became the focal point and is extensively applied in VLSI design. It has empowered designers with the capability to discover patterns and functions within complex unexplored data, and provides the opportunity to gain more knowledge on circuit behavior. As we advance toward physical layout implementation, more detailed and accurate physical information becomes accessible. However, there is a point where metrics settle, and EDA heuristics face limitations due to excessive pessimism. This presents a challenge for EDA companies, urging them to explore new methodologies, particularly in handling significant amounts of data. ML emerges as a solution to surpass these predictability limitations. Through training on massive datasets, ML-based models may offer high-accuracy predictions for Quality of Results (QoR), as in Fig 2, which depicts the potential impact of ML predictions on modifying the trade-off curve between cost and accuracy [2].

Fig. 2 ML Accuracy improvement versus Cost/Runtime

This paper provides a comprehensive exploration of ML in EDA and VLSI design. Our focus involved reviewing several state-of-the-art ML applications and outlining the key challenges and limitations that researchers encounter. The scope of this review spans from high-level synthesis to post-layout verification of the ASIC flow. We close the review by identifying potential promises and future directives. The remainder of this paper is structured as follows. Section 2 provides an overview of the fundamentals of ML and Neural Networks. Section 3 offers a literature review, divided into multiple subsections based on different parts of the ASIC flow. Section 4 presents a results discussion and offers insights into future perspectives.

II. BACKGROUND AND ML FUNDAMENTALS

A. AI/ML Role

AI frameworks consist of data-driven models able to process, perceive, and interpret the environment, engage in reasoning, and make informed decisions. AI comprises several sub-fields, among which are Machine Learning and Deep Learning. ML focuses on creating models that improve their performance by learning from data using predefined algorithms, aiming to discern patterns and correlations within datasets. The model acquires knowledge through training and can subsequently generate predictions for new unseen data. ML techniques are classified into three primary subsets: Supervised, Unsupervised, and Reinforcement Learning.


B. Dataset and Feature Engineering

Each ML model builds knowledge by learning from a training dataset. Datasets typically include input and output variables, often referred to as features (independent variables) and targets (dependent variables). Consequently, each line in the dataset represents one sample along with the corresponding target value. We often represent the independent variables as a matrix X with m rows of data, each with n features, while Y is a vector containing the m target dependent variables [3]:

X = \begin{bmatrix} X_{11} & X_{12} & \dots & X_{1n} \\ X_{21} & X_{22} & \dots & X_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ X_{m1} & X_{m2} & \dots & X_{mn} \end{bmatrix}, \qquad Y = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_m \end{bmatrix}

Real-world data requires preprocessing before training. In the following list, we detail various methods used for data preprocessing [4] [5] [6]; a short Python sketch follows the list.

• Data cleaning addresses missing and duplicate values using the imputation technique, which refers to replacing missing values with estimations.
• Handling outliers aims to filter noise and irregular data points that may significantly deviate from the rest of the data, using methods such as Z-Score and Interquartile Range (IQR).
• Data scaling brings features to a common scale and makes sure that all features have the same range, preventing features with large values from disproportionately dominating the model. Common scaling techniques are Min-Max scaling, normal scaling, Z-score standardization, and Robust Scaling.
• Feature engineering aims to create new features from the existing features to form an informative dataset.
• Handling categorical data involves employing techniques such as one-hot encoding, which transforms categorical variables into binary vectors.
• Feature selection involves choosing the most relevant features to train the model and improve model performance, as not all features have equal weight and contribution.
• Dataset splitting balances between training, testing, and evaluation sets. The training set is the portion that enables the model to learn the patterns and feature relationships; the model adjusts its internal parameters to minimize prediction error. The validation set is used to fine-tune the model's hyperparameters; a common method is k-fold cross-validation, where the dataset is divided into k subsets/folds. The model is trained and validated k times, with each fold serving once as the validation set while the remaining k-1 folds form the training set. If the performance degrades on the validation set, it indicates overfitting. The testing set evaluates the model's performance on unseen data, offering an estimate of the model's accuracy when applied to real-world new data.
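As an illustration of the steps above (not taken from any of the reviewed works), the following minimal Python sketch uses scikit-learn [6]; the design-style feature names and values are hypothetical and only serve to show imputation, scaling, one-hot encoding, and train/validation/test splitting.

import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import KFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical dataset: numeric and categorical design features plus a target.
df = pd.DataFrame({
    "pin_density": [0.2, 0.5, np.nan, 0.7, 0.3, 0.9, 0.4, 0.6],
    "cell_count":  [120, 300, 250, 500, 180, 640, 210, 330],
    "cell_type":   ["buf", "inv", "buf", "and", "inv", "and", "buf", "inv"],
    "hpwl":        [1.1, 2.3, 1.9, 3.5, 1.4, 4.1, 1.6, 2.5],
})
X, y = df.drop(columns="hpwl"), df["hpwl"]

# Imputation (data cleaning), scaling, and one-hot encoding in one transformer.
prep = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="mean")),
                      ("scale", MinMaxScaler())]), ["pin_density", "cell_count"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["cell_type"]),
])

# Dataset splitting: a held-out test set plus k-fold splits of the training set.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
X_tr_p = prep.fit_transform(X_tr)   # fit the preprocessing on training data only
X_te_p = prep.transform(X_te)
for fold, (i_tr, i_val) in enumerate(KFold(n_splits=3, shuffle=True,
                                           random_state=0).split(X_tr_p)):
    print(f"fold {fold}: {len(i_tr)} train / {len(i_val)} validation samples")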
C. Supervised Learning: Labeled Data

Supervised learning algorithms learn from labeled datasets, focusing on adjusting the model's parameters and creating an inferred function that maps inputs to outputs with a minimized prediction error. Supervised models learn from pairs of input vectors and corresponding target values. Two primary types of supervised learning exist: Classification and Regression. Classification algorithms allocate the input vector to a predefined category or class; the classification is either binary (two target categories) or multi-class (multiple categories). Regression algorithms focus on predicting continuous numeric values.

Various regression algorithms exist, each suited to different needs. Linear Regression (LR) presumes a linear association between features and the target. Polynomial Regression (PR) captures non-linear relationships through polynomial functions. Decision Trees (DT) recursively split the dataset into subsets based on the most significant attributes, thus creating a tree structure that leads to an average prediction. Random Forest (RF) is an ensemble method that combines multiple decision trees to improve prediction accuracy. Extra-Trees, or Extremely Randomized Trees, is another ensemble method that constructs decision trees with randomized feature splits. The Support Vector Regressor (SVR) aims to find a hyperplane that minimizes the prediction error while allowing a tolerance margin. k-Nearest Neighbors (KNN) is a non-parametric algorithm that predicts the target value by averaging the values of its k nearest neighbors, while the Naive Bayes Regressor (NBR) relies on probabilistic principles. Gradient Boosting (GB) constructs a model by combining multiple weak decision tree models and gradually reducing the prediction error by fitting each tree to the residual errors of the previous trees. The list is still extensive; other methods and neural network algorithms exist that have not been included. [7] [8] [9] [3] [10] [6]

A model exhibits good generalization capabilities when it provides accurate predictions for unseen data. If the inferred model is too simplistic and predicts inaccurate values even for the training set, it risks Underfitting the training data. On the other hand, when the model is too complex or the training data is insufficient, we risk Overfitting, where the model produces good predictions on the training set but fails when facing new data; the model then has a low generalization capability. As a result, it is crucial to balance model complexity and find a well-balanced spot between underfitting and overfitting, as depicted in Fig 3. A small sketch contrasting the two regimes follows.

Fig. 3 Underfitting and Overfitting
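To make the trade-off concrete, the short sketch below (an illustration, not drawn from the reviewed works) fits several of the regressors named above on a small synthetic dataset and compares the training score with a cross-validated score: a large gap between the two is the classic symptom of overfitting, while low values for both indicate underfitting.

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)   # noisy non-linear target

models = {
    "deep tree (overfit-prone)": DecisionTreeRegressor(max_depth=None),
    "stump (underfit-prone)":    DecisionTreeRegressor(max_depth=1),
    "random forest":             RandomForestRegressor(n_estimators=100, random_state=0),
    "gradient boosting":         GradientBoostingRegressor(random_state=0),
    "SVR (rbf)":                 SVR(kernel="rbf"),
}
for name, model in models.items():
    train_r2 = model.fit(X, y).score(X, y)             # score on the training set
    cv_r2 = cross_val_score(model, X, y, cv=5).mean()  # generalization estimate
    print(f"{name:26s} train R2 = {train_r2:.2f}   cv R2 = {cv_r2:.2f}")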
D. Unsupervised Learning: Discovering Patterns in Data

In unsupervised learning, we provide input data without the guidance of labeled outputs. The purpose is to extract the underlying knowledge and common structure within the given inputs without target values. One of the most common unsupervised learning techniques is clustering, a method that groups input values into clusters based on their similarities and resemblances. Another known technique is Transformation or Dimensionality Reduction, which transforms high-dimensional data to low dimensions while preserving essential features.

Unsupervised learning is frequently applicable in the visualization of representative data. Among the clustering algorithms, K-means divides data into k clusters by minimizing the sum of squared distances between data points and cluster centroids. Meanwhile, DBSCAN (Density-Based Spatial Clustering of Applications with Noise) identifies clusters grounded in their density. Principal Component Analysis (PCA) is also a dimensionality reduction technique, which discovers a set of orthogonal axes, referred to as principal components, that capture the data variance. [7] [8] [9] [3] [10] [6]
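The following minimal scikit-learn sketch illustrates these techniques; the data are synthetic stand-ins for, e.g., per-cell or per-region feature vectors, and the parameter values are arbitrary.

import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Three synthetic groups of 10-dimensional samples.
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 10)) for c in (0.0, 3.0, 6.0)])

X2 = PCA(n_components=2).fit_transform(X)                  # dimensionality reduction
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X2)
db = DBSCAN(eps=0.8, min_samples=5).fit_predict(X2)        # density-based clustering
n_db = len(set(db)) - (1 if -1 in db else 0)               # -1 marks noise points
print("k-means cluster sizes:", np.bincount(km), "| DBSCAN clusters:", n_db)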

E. Reinforcement: Learning from Experience

Reinforcement Learning (RL) stands apart from supervised and unsupervised learning. It is an interactive learning paradigm where an agent interacts with the environment and makes decisions to achieve specific predictions. The agent receives feedback in the form of rewards or punishments based on its prediction choices, and it learns to optimize its strategy over time to maximize cumulative rewards. Several prominent reinforcement algorithms include Q-Learning, Policy Gradient Methods, Markov Decision Processes (MDP), and Monte Carlo Tree Search (MCTS). [11][10]
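As a toy illustration of the reward-driven update behind Q-Learning (the state/action counts, hyperparameters, and the sample transition below are hypothetical; a real agent would collect such tuples by interacting with its environment):

import numpy as np

n_states, n_actions = 16, 4
Q = np.zeros((n_states, n_actions))     # action-value table
alpha, gamma = 0.1, 0.95                # learning rate and discount factor

def q_update(s, a, r, s_next):
    # Move Q(s, a) toward the observed reward plus the discounted value
    # of the best action in the next state (the Bellman target).
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])

q_update(s=0, a=2, r=1.0, s_next=5)     # one hypothetical experience tuple
print(Q[0])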
F. Deep Learning and Neural Networks

Neural Networks (NN) can be adapted for supervised, unsupervised, and reinforcement learning. NNs have shown remarkable performance in handling complex data structures. In this section, we introduce the basic building block of a single neuron and the architecture of a vanilla feed-forward neural network. We provide an overview of the concept of Backpropagation and review commonly used Deep Neural Networks (DNNs).

F.1 Perceptron

A neuron, or perceptron, is the fundamental component in neural networks that processes input and makes decisions. It assigns weights to its multiple input values based on their importance, then linearly combines the weighted inputs and introduces a bias term. The result undergoes a non-linearity through an activation function. In Fig 4, multiple inputs represent features, denoted as x1, x2, ..., xn. Each input has an associated weight, or importance, to the output (eq. 1) [12]:

Y_{in} = \sum_{i=1}^{n} (x_i \cdot w_i) + B_o \qquad (1)

Fig. 4 Perceptron

Non-linearity is introduced through an activation function; otherwise the network would operate as a linear function. As a result, when numerous neurons with non-linear activation functions are stacked in a large network, it becomes capable of approximating highly complex functions. Some commonly used activation functions are presented below with their respective equations (eq. 2). Fig 5 illustrates the graphical representations of these activation functions. [12]

• The Step function produces binary outputs determined by a predefined threshold. It operates as a binary function, yielding an output of '1' when the input exceeds '0' and '-1' otherwise.
• The Sigmoid function maps the weighted sum to the range [0, 1]. It is often used in binary classification. The sigmoid smoothly transforms input values into a probability-like output, mapping large negative inputs to '0' and large positive inputs to '1'.
• The Hyperbolic Tangent (tanh) is similar to the sigmoid; however, the output ranges from '-1' to '1'. It maps the weighted sum to [-1, 1], offering a centered output.
• The Rectified Linear Unit (ReLU) activation function outputs the weighted sum if the result is positive and '0' otherwise.

\text{Step: } f(x) = \begin{cases} 1 & \text{if } x > 0 \\ -1 & \text{if } x \le 0 \end{cases}
\text{Sigmoid: } f(x) = \frac{1}{1 + e^{-x}}
\text{tanh: } f(x) = \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}
\text{ReLU: } f(x) = \max(0, x) \qquad (2)

Equation eq. 3 represents the output Y after being passed through the activation function, where Y_{in} is the result from eq. 1 (Fig 4). The Sigmoid function then constrains the result between '0' and '1':

Y = \frac{1}{1 + e^{-Y_{in}}} \qquad (3)
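A compact numpy rendering of eq. 1 to eq. 3 is given below (the input values and weights are arbitrary):

import numpy as np

def step(x):     return np.where(x > 0, 1.0, -1.0)
def sigmoid(x):  return 1.0 / (1.0 + np.exp(-x))
def tanh(x):     return np.tanh(x)
def relu(x):     return np.maximum(0.0, x)

def perceptron(x, w, b, activation=sigmoid):
    # Weighted sum of the inputs plus a bias (eq. 1), passed through
    # an activation function (eq. 3).
    y_in = np.dot(x, w) + b
    return activation(y_in)

x = np.array([0.5, -1.2, 3.0])   # features x1..xn
w = np.array([0.4, 0.1, -0.2])   # weights w1..wn
print(perceptron(x, w, b=0.1))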
Fig. 5 Common activation functions

F.2 Multilayer Perceptron

A Multilayer Perceptron (MLP) is the fundamental Artificial Neural Network (ANN). It comprises multiple layers of interconnected neurons. In Fig 6, the network is a Feedforward Neural Network (FNN) with three hidden layers. The network adjusts its weights during training, and each neuron in a hidden layer receives the results of all previous-layer neurons, processes the linear weighted combination, and passes the result through the activation function. [12]

Fig. 6 Feedforward Neural Network

Eventually, the output layer represents the predicted target variables, either values for regression tasks or classes for classification tasks. The output layer may comprise multiple neurons, each corresponding to a specific class. The matrix equation in eq. 4 represents the output values Y for m hidden neurons:

Y = \sigma(X \cdot W + b) \qquad (4)

\begin{bmatrix} Y_1 & Y_2 & \dots & Y_m \end{bmatrix} = \sigma\left( \begin{bmatrix} x_1 & x_2 & \dots & x_n \end{bmatrix} \begin{bmatrix} w_{11} & w_{12} & \dots & w_{1m} \\ w_{21} & w_{22} & \dots & w_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ w_{n1} & w_{n2} & \dots & w_{nm} \end{bmatrix} + \begin{bmatrix} b_1 & b_2 & \dots & b_m \end{bmatrix} \right)

• Y: output values of the first hidden layer, for the m neurons.
• X: the n input features.
• W: weight matrix, with W_{ij} for input i and neuron j.
• b: bias vector for the m neurons.
• σ: the activation function applied element-wise.
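The layer computation of eq. 4 maps directly onto a matrix product; a minimal numpy sketch with arbitrary sizes and random values:

import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 3                      # n input features, m neurons in the layer
X = rng.normal(size=(1, n))      # one input sample (row vector)
W = rng.normal(size=(n, m))      # weights w_ij: input i, neuron j
b = np.zeros(m)                  # bias vector

sigma = lambda z: 1.0 / (1.0 + np.exp(-z))   # element-wise activation
Y = sigma(X @ W + b)             # eq. 4: Y = sigma(X . W + b)
print(Y)                         # one output per neuron of the layer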
F.3 Cost function

During the forward propagation, each neuron computes its weighted sum and applies a non-linearity. The Cost function, or Loss function, refers to the error at the output layer obtained by comparing the predicted versus target values. The cost function estimates the error for the entire network, which is then propagated backward through gradient descent on the cost. This process is Backpropagation, and it allows the adjustment of the weights and biases of all neurons to minimize the final cost. Some common loss functions are Mean Squared Error (MSE) for regression and Cross-entropy for classification.

F.4 Backpropagation

Backpropagation is fundamental for weight and bias updates. The network calculates the cost gradient for each weight and bias in the output layer and backpropagates it to adjust all weights and biases. The gradient reflects how small or large weight and bias changes affect the cost function, and points the direction for weight adjustments that minimize the error. The weight update is the subtraction of the learning rate multiplied by the gradient (eq. 5). The learning rate controls the step size of the weight updates and influences convergence speed and training stability. [12]

\Delta w_{ij} = -\eta \, \frac{\partial E}{\partial w_{ij}} \qquad (5)
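A minimal sketch of the gradient-descent rule of eq. 5, applied to a single linear neuron trained with MSE on synthetic data (the data and the learning rate are arbitrary):

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.01 * rng.normal(size=100)

w = np.zeros(3)
eta = 0.1                                      # learning rate
for _ in range(200):
    y_pred = X @ w
    grad = 2.0 / len(X) * X.T @ (y_pred - y)   # dE/dw for the MSE cost
    w -= eta * grad                            # eq. 5 applied to every weight
print(w)                                       # approaches true_w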
F.5 Deep Learning

A Deep Neural Network (DNN) is a feed-forward network with multiple hidden layers trained using backpropagation. Convolutional Neural Networks (CNNs) are tailored for grid-like data, such as image classification and object detection, to detect local patterns. Recurrent Neural Networks (RNNs) are specialized for sequential data, where the element order is significant. They utilize recurrent connections to manage and update hidden states and are valuable in Natural Language Processing (NLP) and speech recognition. Long Short-Term Memory networks (LSTMs) are a specialized type of RNN designed to overcome the vanishing gradient problem and capture long-range dependencies in sequential data. Gated Recurrent Unit networks (GRUs) are also RNNs, with a simpler architecture, offering efficiency for tasks like natural language understanding and speech synthesis. Lastly, Autoencoders are networks used for unsupervised learning and dimensionality reduction, mapping input data to a lower-dimensional representation, with a decoder network reconstructing the input. [12]

Even though deep learning has been around for decades, the hardware support for neural networks has only recently come to realization. The development of accelerated parallel computing CPU-GPU architectures has made deep learning achievable. As a result, model training times have been reduced from weeks to hours.

III. LITERATURE REVIEW: ML IN VLSI DESIGN AND CAD EDA

The field of Computer-Aided Design (CAD) is rapidly evolving to address the increasing complexities of modern VLSI chips [1]. AI integration into design automation tools represents an approach to stay at the forefront of technological advancements. Extensive research has been conducted focusing on reducing design runtime and improving QoR through AI/ML integration [13][14][15][16][17][18]. A similar review has been provided in [13], offering an extensive examination of the AI/ML methodologies suggested in existing literature.
This review primarily encompasses all stages of VLSI abstraction, including architectural considerations, physical design, circuit simulation, manufacturing, and VLSI testing. In this section, we provide an overview of the state-of-the-art, survey recent research, draw some key limitations, and highlight possible enhancements.

A. ML in VLSI Functional Verification

The studies in [64] and [43] demonstrate the importance of data mining in EDA. In [19], data mining and pattern extraction have exhibited the potential of the Support Vector Machine (SVM) in reducing runtime and enhancing simulation coverage during functional verification. As illustrated in Fig 7, a standard set of unit tests typically attains maximum functional coverage upon 6,000 tests. By employing the SVM-based model, a subset of merely 310 tests accomplished equivalent coverage, leading to a notable 95% reduction in simulation runtime. The model is capable of predicting coverage overlap and captures test similarities, as depicted in Fig 8.

Fig. 7 Runtime saving using the SVM-based model

Fig. 8 Increasing coverage using the SVM-based model

In [20], the authors proposed a classification learning method to enhance functional coverage based on assertions. The model identifies how frequently an assertion is triggered, highlighting the significance of each unit test. The objective is to extract knowledge that can activate assertions with lower coverage. To achieve this, the authors employed a feature-based analysis, utilizing supervised classification and unsupervised association rules.
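A minimal sketch in the spirit of such learned test selection is shown below; it is not the actual flow of [19], and the per-test feature vectors, labels, and subset size are hypothetical.

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Hypothetical data: one feature vector per unit test (e.g., stimulus knobs or
# constraint settings) and a label saying whether the test added new coverage
# when it was actually simulated.
X = rng.normal(size=(600, 12))
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.5, size=600) > 0).astype(int)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
clf.fit(X[:400], y[:400])                       # train on already-simulated tests

# Rank the remaining (unsimulated) tests and keep only the most promising ones,
# which is how a learned filter can shrink a regression suite.
scores = clf.predict_proba(X[400:])[:, 1]
selected = np.argsort(scores)[::-1][:50]
print(f"selected {len(selected)} of {len(scores)} candidate tests")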
B. ML at High Level Synthesis

ML application to High-Level Synthesis (HLS) has raised the challenge of design space exploration (DSE). In [21], the author presents a learning-based model for DSE that speeds up the convergence toward the optimal RTL design architecture. The results from Random Forest and randomized selection algorithms yielded the highest accuracy for the optimal Pareto set of RTL architectures. Similarly, [22] harnesses the power of RF and Extra-Trees to guide DSE in discovering Pareto-optimal combinations of area and performance. [23] implemented a Simulated Annealing (SA) probabilistic algorithm; the author introduces a faster SA method based on decision trees (DT), resulting in performance similar to standard SA but with a gain of up to 43% in average runtime. Lastly, [24] delves into the challenges posed by DSE on systems with multicore processors. The author leverages reinforcement techniques such as Imitation Learning (IL) to enhance the computational efficiency of these manycore systems.
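A common way to use tree ensembles for DSE is as a cheap surrogate of the synthesis tool; the sketch below illustrates the idea with an invented knob space and latency model, and is not taken from [21]-[24].

import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
# Hypothetical design-space points: each row is a knob configuration (e.g.,
# unroll factor, pipeline II, partitioning, clock target) with a latency
# measured from a small number of real HLS runs.
knobs = rng.integers(1, 8, size=(40, 4)).astype(float)
latency = 1000.0 / knobs[:, 0] + 50.0 * knobs[:, 1] + rng.normal(scale=5, size=40)

surrogate = RandomForestRegressor(n_estimators=200, random_state=0).fit(knobs, latency)

# Score a large set of unexplored configurations cheaply with the surrogate and
# keep the most promising ones for the next (expensive) HLS runs.
candidates = rng.integers(1, 8, size=(2000, 4)).astype(float)
best = candidates[np.argsort(surrogate.predict(candidates))[:5]]
print(best)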
C. ML in Physical Design

ML also finds application in the physical design stages, where data continues to grow alongside technological progress. In this section, we explore cutting-edge applications within backend design.

C.1 ML for Floorplanning and Placement Optimization

Traditional Place-and-Route (PnR) tools typically generate a floorplan without exploring multiple alternatives, regardless of timing, wire length, congestion, power, routability, and other QoR metrics. In [25], the author introduces a deep-learning neural network that explores various floorplan alternatives, considering different aspect ratios and placement styles. The model automatically generates an optimal floorplan for subsequent PnR stages based on dataflow and DSE information. In [26], the author presents a reinforcement learning agent that undergoes training across multiple chip blocks to produce optimized chip placements. This approach involves the sequential placement of macros and standard cells on the chip canvas. The model's rewards are based on the cost associated with wirelength and routing congestion.

In [27] and [28], two deep learning-based models are introduced to enhance design for testability (DFT). The models use a Graph Convolutional Neural Network (CNN) for control and observation point (CP-OP) insertion. The graph CNN aims to minimize the number of CP-OPs while maximizing fault coverage. The work in [29] employs a DNN framework to accelerate cell placement. The results significantly improved QoR and routing congestion, while the model still achieves placement quality comparable to state-of-the-art placers and has accelerated the global placement runtime by a factor of 30.

Typically, an effective placement aims to minimize the half-perimeter wirelength (HPWL). Nevertheless, handling datapaths can yield varying QoR, thus offering different placements. In both [30] and [31], the authors proposed a model that combines SVM and ANN. The models classify datapaths by their order of importance and guide a wirelength-driven placement strategy, focusing on the highly weighted datapaths. The results led to a reduction of 7% in HPWL and 12% in Steiner Wire Length (StWL).
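For reference, the HPWL objective mentioned above is simply half the perimeter of each net's pin bounding box, summed over all nets; a small self-contained sketch (with made-up pin coordinates):

def hpwl(net_pins):
    # Half-perimeter wirelength of one net: width plus height of the
    # bounding box enclosing all of the net's pin coordinates.
    xs = [x for x, _ in net_pins]
    ys = [y for _, y in net_pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

nets = {
    "n1": [(0.0, 0.0), (4.0, 1.0), (2.0, 5.0)],
    "n2": [(1.0, 1.0), (1.5, 2.0)],
}
total = sum(hpwl(pins) for pins in nets.values())
print(f"total HPWL = {total:.1f}")   # the quantity a wirelength-driven placer minimizes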

Table I. State-of-the-art works of Machine Learning application in VLSI Design.

Columns: Ref. | Year | Tech. | Stage (Verif., HLS DSE, Floorplan/Place, CTS, Route, Phy. Verif., Power Grid) | ML algorithm (Supervised, Unsupervised, Reinforcement)
[19] 2012 28nm ✓ SVM
[20] 2013 28nm ✓ Classification Association
[21] 2013 45nm ✓ RF Random Selection
[22] 2008 - ✓ RF Extra-Tree
[23] 2014 45nm ✓ Decision Tree
[24] 2018 - ✓ Imitation
Learning
[25] 2020 - ✓ -
[26] 2020 - ✓ MDP
[27] 2020 - ✓ ✓ Graph CNN
[28] 2019
[29] 2019
[30] 2012 - ✓ SVM ANN
[31] 2015
[32] 2013 22nm ✓ ✓ Decision Tree
[33] 2019 - ✓ ✓ CNN
[34] 2016 28nm ✓ ✓ SVM MARS
45nm
65nm
[35] 2016 28nm ✓ ✓ ✓ GPR
[36] 2017 14nm ✓ ✓ ✓ LR LogR SVM classifier
[37] 2018 - ✓ ✓ ANN
[38] 2014 45nm ✓ ✓ MARS
[39] 2015
[40] 2019 - ✓ ✓ ✓ CNN
[41] 2015 28nm ✓ ✓ ANN SVM
[42] 2019 - ✓ ✓ Autoencoders
[43] 2016 45nm ✓ SVM Classifier
[44] 2011
[45] 2012 - ✓ CNN
[46] 2018 - ✓ XGB CNN
[47] 2012 90nm ✓ LR
[48] 2018 16nm ✓ ANN
45nm
[49] 2017 180nm ✓ K-means
Mean-Shift
DBSCAN
[50] 2014 45nm ✓ SVM
[51] 2020 - ✓ ANN
[52] 2021 - ✓ ANN LogR
[53] 2020 15nm ✓ ✓ PR SVR ANN
[54] 2007 90nm ✓ ✓ SVM classifier
[55] 2008 90nm ✓ ✓ ϵ-SVR
[56] 2014 28nm ✓ ANN SVR RF
45nm
[57] 2019 45nm ✓ ANN RF
[58] 2013 - ✓ Regressor classifier
[59] 2018 28nm ✓ ✓ RF
[60] 2016 28nm ✓ ✓ Lasso SVM GB ANN
[61] 2021 7nm ✓ LR RF DT
[62] 2022 130nm ✓ GNN
[63] 2020 7nm ✓ LR ANN RF XGBoost

LR: Linear Regression, LogR: Logistic Regression, PR: Polynomial Regression, DT: Decision Tree, RF: Random Forest, GB: Gradient Boosting, XGBoost: eXtreme Gradient Boosting, SVM: Support Vector Machine, SVR: Support Vector Regressor, MDP: Markov Decision Processes, ANN: Artificial Neural Network, CNN: Convolutional Neural Network, MARS: Multivariate Adaptive Regression Splines, GPR: Gaussian Process Regression, GNN: Graph Neural Network
C.2 ML for Clock Network Optimization

In synchronous circuits, the primary challenges concern the clock network, as it is one of the most critical networks. Achieving a zero-skew clock network has always been a challenge. A common strategy to optimize clock skew, minimize clock-tree length, and mitigate clock network power consumption involves placing latches in proximity to local clock buffers, a technique discussed in [32] and [30]. The authors introduce a DT model to reduce latch redundancies and propose an optimized latch placement solution. The approach significantly reduces clock skew and has a positive impact on placement, indirectly benefiting power consumption.
C.3 ML for Congestion

Routing congestion is a critical factor that significantly affects timing behavior and routability. However, congestion is not always accurately predicted from early placement stages, which misleads the router and results in longer wires and routing detours. Tools can restructure the logic and adjust functionality to mitigate routing congestion hotspots. In [33], the author introduces a deep learning approach based on CNN to predict routing congestion hotspots on a pre-placed netlist. The model utilizes various features for training, including the netlist graph, cell type, function, pin count, geometry, and other cell characteristics. The ground truth during training was the routed congestion map. The model employs Graph Attention Networks (GAT) [65] and identifies common patterns in gate-level netlists, helping to pinpoint the logic elements contributing to congestion. The model achieves 75% accuracy in predicting congestion at lower metal layers, compared to the baseline congestion map's accuracy of 29%.
C.4 ML for Routing Optimization

Advanced technology nodes have raised new challenges in routability. Numerous factors, such as placement quality, timing constraints, and aspect ratio, significantly influence the design routability. Poor routability results in excessive runtime, stretching to weeks for large designs, and sometimes the design ends up unrouteable. Although the congestion map can aid in predicting routability, it may still prove insufficient or mislead the router. New research has focused on ML to predict the routability of a placement solution without fully performing global or detailed routing.

In [34], the author developed SVM-based and Multivariate Adaptive Regression Splines (MARS) models to predict routability from the placement stage. The models were trained using designs at 28nm and 45nm technology nodes to predict the Pareto frontiers of utilization. After dividing the layout into grids, various grid features were extracted, including pin density per grid area, pin proximity, cell count, net count, and edge count. The classification model achieved a prediction accuracy of 85.9% and 90.4% for 45nm and 28nm, respectively, thus surpassing the standard prediction based on the congestion map, which barely achieved 61.7% and 73.5%.

In [35], the authors focus on predicting wirelength based on the circuit's power distribution network (PDN). An optimized PDN reduces wirelength, while an unoptimized PDN can lead to inefficient placement of power rails and vias, resulting in suboptimal wire routing. Inefficient routing increases wirelength as signals take longer paths to avoid congested areas and routing obstacles caused by inefficient power networks. To mitigate this, the authors employ a Gaussian Process Regression (GPR) model, which considers relevant PDN attributes and placement features to reduce the total wirelength.

Congestion maps identify potential design rule violations (DRV) at the routing stage. These congestion maps aid in optimizing placement by adjusting cell positions and reducing detailed-route DRV. However, at advanced sub-micron nodes, congestion map-based placement may leave significant design rule check (DRC) violations to be addressed manually or through iteration, making it a less reliable predictor that can potentially mislead the global router. Fig 9 illustrates the mismatch between actual DRC violations and congestion map hotspots.

Fig. 9 Actual DRC vs. Congestion map DRC violations

In [36], the authors employ multiple learning models, including Linear Regression, Logistic Regression, and SVM classifiers, to reduce DRC violations during detailed routing. Binary classifiers categorize globally routed cells based on whether they contribute to a DRC violation. The models were trained using features such as fan-in, fan-out, connectivity parameters, pin proximity, local pin density, and local overflow. The SVM model successfully reduced DRC violations by an average of around 20%, with some cases achieving up to a remarkable 76%, all without impacting design timing. Fig 10 illustrates a DRC hotspot detection using the SVM model.

Fig. 10 Actual DRC vs. model's DRC hotspot predictions

In a similar context, a study conducted in [37] predicts detailed routing violations from an early placement stage by estimating congested regions based on StWL estimations and pin density, thus avoiding the need for a global router. The authors developed a binary classification model whose output indicates the presence or absence of violations. The model was trained on features extracted from the placement stage, targeting detailed routing shorts of already routed designs. The implemented neural network model consists of 20 nodes in one hidden layer. It achieved an average shorts prediction accuracy of 90%, within a reduced runtime compared to the standard congestion map method.

Similar efforts to estimate routability and routing congestion from an early placement stage using supervised learning have been conducted in [38] and [39] with the use of multivariate adaptive regression splines models. The learning framework aims to detect routing violations directly from the placement stage without relying on a global router, resulting in reliably accurate results and shorter runtime.

Routability may also be impacted from the macro placement stage, particularly in large complex designs with high macro and IP counts that occupy significant chip areas.
In [40], the authors propose a routability-driven macro placement prediction using CNN to find the optimal macro placement with minimal DRC violations. The model forecasts design routability for the optimal macro placement by exploring different configurations and evaluating wirelength, power, and timing constraints. The CNN model is trained using extracted features such as the macro density map, pin density map, and connectivity density map. The CNN model reduced the DRV count and lowered the average total wirelength, and it was then integrated into the original macro placement engine. Simulated annealing optimization was then applied to assess whether the resulting macro placement was near-optimal.
Signal integrity (SI) may also impact delays as it influences the propagation of signals and overall timing performance. SI effects create coupling capacitance due to the switching activity in neighboring nets, which alters wire delays and transition times (slew) in adjacent nets. Most EDA tools include a Static Timing Analysis (STA) engine with an SI mode, which introduces additional pessimism to the total delay based on aggressor and victim dependencies. However, timing analysis with SI mode enabled can be time-consuming, especially for large designs.

In [41], the authors developed a model to predict transition time, incremental delays, and path delays in SI mode. Fig 11 illustrates the incremental delay divergence in SI mode between a commercial tool and a signoff SI tool, with an inaccuracy reaching 60ps. The training parameters include diverse design features, such as clock period, toggle rate, coupling capacitance, resistance, aggressor count, differences in max-min arrival times, transition time, and incremental delay in non-SI mode. The authors trained ANN and SVM models using a 28nm technology library and combined the predictions to obtain final values for incremental transition time, incremental delay, and path delay in SI mode. The prediction accuracy reduced the absolute error by 15.7%. Fig 12 shows the actual versus predicted incremental delays considering SI, with a worst-case absolute error of 5.2ps.

Fig. 11 Delay inaccuracy between SI and non-SI mode

Fig. 12 Model's predictions of delay in SI mode

[42] presents an approach to estimating SI effects. The framework uses autoencoder and anomaly detection (AD) methods to uncover relevant features from the circuit outputs and to detect anomalies. The authors proposed a semi-supervised LSAnomaly algorithm to identify anomalies in the output signal waveform based on time-domain waveform signals.
C.5 ML for Physical Verification

Machine Learning is also employed in detecting layout hotspots, a topic discussed in depth in both [43] and [44]. Conventionally, hotspots are detected using commercial lithography simulation tools; however, ML has remarkably enhanced hotspot detection accuracy while preserving a short runtime. A lithography simulator determines the good and bad layout samples. These samples are used to train an SVM binary classifier to highlight the boundary overlaps, as in Fig 13. Fig 14 compares the model-predicted hotspots and the original lithography simulation.

Fig. 13 ML flow for layout hotspot detection through lithography simulation

Fig. 14 Layout hotspot detection using a lithography simulator vs. model prediction

Lithography layout distortions may occur even if the design passes design rule checks (DRC); the design may still contain layout hotspots that are sensitive to the lithographic process. In a similar approach presented in [45], the authors propose an accurate CNN-based hotspot detection framework that applies a two-stage filter to identify these layout hotspots. Compared to a state-of-the-art DRC tool, the model's result shows nearly 100% accuracy in a short runtime.
C.6 ML for Power Optimization

One of the challenges in power and timing performance is IR drop. IR drop may slow down circuit timing behavior and lead to timing violations. Conventionally, IR drop signoff analysis is performed at the end of the IC design flow, especially during engineering change orders (ECO), before tape-out. The challenge is that IR drop cannot be fully fixed during the design phase; resolving IR drop during the early design phase might be challenging as the primary focus is
meeting timing, area, constraints, and routability. However, IR drop fixing is addressed mostly during ECO, using power integrity analysis tools and simulations that help identify potential IR drop hotspots. In [46], the authors employed XGBoost and CNN models to predict ground-bounce and dynamic cell IR-drop regions where IR-drop violations may occur. Various features were extracted, including total path resistance from the power pad to the cell, total cell power, peak and average current, toggle rate, load cell capacitance, cell type, and cell timing windows.

Similar research has explored static IR-drop prediction using Linear Regression in [47] and ANN in [48]. In [49], the author introduced a clustering method that partitioned the layout into multiple clusters, identifying areas with high power density. This approach helps to prevent IR drop and Electromigration (EM) noise at early design stages. The authors applied K-means, Mean Shift, and DBSCAN to predict power-critical areas and prepare countermeasures for power hotspots.

Voltage droop is a phenomenon that occurs momentarily when a high current demand occurs in a portion of the logic gates, leading to timing violations as it impacts the rise and fall delays of cells. Voltage droop can potentially result in setup and hold time violations. It is a common practice to include a static pessimism based on worst-case conditions to mitigate voltage droop, which often leads to unnecessary pessimism. In [50], an SVM model was developed to predict real-time voltage droop.
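For context, the "ground truth" that such IR-drop models learn to approximate comes from conventional power-grid analysis; the toy sketch below solves a static IR drop on a one-dimensional power rail (the rail length, segment resistance, and cell currents are made up and purely illustrative).

import numpy as np

n = 10                       # nodes along the rail; node 0 is the power pad
r_seg = 0.05                 # resistance of each rail segment (ohm)
i_cell = np.full(n, 2e-3)    # current drawn at each node (A)
vdd = 0.75

# Nodal analysis of the resistor chain: G * V = injected currents, with the
# pad node pinned to VDD through a very large conductance.
G = np.zeros((n, n))
g = 1.0 / r_seg
for k in range(n - 1):
    G[k, k] += g; G[k + 1, k + 1] += g
    G[k, k + 1] -= g; G[k + 1, k] -= g
G[0, 0] += 1e9
rhs = -i_cell.copy()
rhs[0] += 1e9 * vdd

V = np.linalg.solve(G, rhs)
print("worst-case static IR drop (mV):", round((vdd - V.min()) * 1e3, 2))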
IR-drop and Electromigration effects also need to be addressed during the power grid (PG) design phase to prevent impacting chip interconnects and causing power grid network failure. Aiming to minimize the impact of IR drop and EM on the chip's power grid, the authors of [51] developed an ML approach for predicting the chip power grid design and creating an EM-aware grid network. The model considers the current source coordinates (x and y) and the metal line width, and employs a neural network to predict the optimal widths of the metal lines in the power network. Once the optimal widths are determined, the model generates an IR drop map. The predicted IR drop map closely matches the map obtained from the conventional approach, as illustrated in Fig 15. The model also achieved a significantly faster runtime, up to 6 times faster than the conventional approach.

Fig. 15 Actual IR drop vs. model prediction

A similar approach in [52] predicts on-chip power grid network aging using an EM-aware model. The authors involved a neural network-based regression and a logistic regression classifier to identify potential EM-affected metal segments within the PG network.


C.7 ML for Timing Optimization

Data mining has also been employed to identify timing discrepancies. The research detailed in [66] uses the Support Vector Classifier (SVC) to uncover features that reveal timing inconsistencies between observed silicon timing and estimated timing provided by commercial analysis tools. The research findings revealed that a portion of the predicted critical timing paths were found to be non-critical in silicon. Conversely, numerous critical paths in silicon were not flagged as critical during prediction. The learning model revealed significant rules and identified paths that exhibit slower performance than estimated.

The study in [53] addresses the discrepancy in gate delay timing between single-input switching (SIS) and multi-input switching (MIS) gates. The influence of MIS on gate delay calculations could lead to either delay overestimations or underestimations. To mitigate the gate delay discrepancies, polynomial regression, SVR, and ANN models were employed. The models were trained using gate load and slew attributes, then integrated into the timing library to facilitate timing adjustments, and tested on benchmark designs. The results obtained using the ANN model demonstrated a significant improvement in delay prediction accuracy, as it reduced misestimations from 120% in traditional SIS to less than 3%. In Fig 16, the approach involved extracting SIS and MIS for each standard cell using SPICE simulation. The SIS referred to commonly used delay computation models like NLDM and CCS, while MIS represented the golden multi-input switching models. The flow in Fig 16 feeds SIS, MIS, and the MIS-SIS difference (MSD) to the ANN model. The ANN model aims to make SIS delay predictions based on the gate's input transitions (trA, trB), the load capacitance (CL), and the temporal distance, or skew, between both input transitions (SA-B). This work led to a significant enhancement in timing accuracy at early design stages.

Fig. 16 MIS and SIS delay convergence using ANN

In [54], the focus was on pre-silicon and post-silicon timing correlation. The author extracted post-silicon path delay testing (PDT) results and their corresponding estimated pre-silicon delays from a static timing analyzer. Eq. 6 describes path-based cell and net timing differences, where \sum c_i and \sum n_i express the total cell and net delay, and \alpha_c and \alpha_n are the correlation coefficients. This study encompassed 240 different designs and revealed a noticeable over-pessimism in
the pre-silicon STA estimations, as depicted in Fig 17. To address this issue, the author proposed a binary SVM classifier with a linear kernel to rank delays based on their deviations. In Fig 17, the histogram illustrates the distribution of path delay differences and the model's predictions for cell uncertainties.

\alpha_c \cdot \sum_i c_i = \sum_i c'_i ; \qquad \alpha_n \cdot \sum_i n_i = \sum_i n'_i \qquad (6)

• \alpha_c, \alpha_n: cell/net correlation factors on the critical path.
• c_i, n_i: post-silicon (PDT) measured cell/net delays.
• c'_i, n'_i: pre-silicon (STA) estimated cell/net delays.

Fig. 17 Discrepancies in path delays between post-silicon and pre-silicon estimations
Multiple factors can cause timing discrepancies, including circuit implementation, process and environment variations, and tool uncertainties. It becomes critical to pinpoint the factors contributing to the timing differences. In [55], the focus was on identifying timing path variations between estimated and observed values. To accomplish this, the author employed a statistical regression approach using support vector ε-insensitive regression (ε-SVR) to rank the features' importance in terms of their impact on path timing deviations. The model considered a wide range of features, including cell and net attributes, as well as layout-related features such as the number of vias in the path, transistor types, location-related parameters, and dynamic environment characteristics such as temperature, power grid behavior, voltage droop, and switching activity. This analysis aimed to shed light on the root causes of timing mismatches and improve the overall understanding of circuit performance.

The study in [56] aimed to improve the timing correlation in setup time, cell delays, wire delays, stage delays, and endpoint slack between an implementation tool and a signoff tool, in 28nm and 45nm technology designs. The approach employed ANN, SVM Regressor, and RF. Fig 18 depicts timing discrepancies that reach 110ps between two commercial signoff timing tools, T1 and T2, and path slack discrepancies of 100ps between a signoff timing tool T1 and a commercial design implementation tool D1.

Fig. 18 Slack discrepancies between signoff tools T1 and T2, and commercial design implementation tool D1 vs. T1

The study in [57] presents a pre-routing timing convergence model, employing Neural Networks and Random Forest to reduce net delay pessimism at the pre-routing phase. Fig 19 illustrates a pre-routing slack estimation for a design operating at a 2ns clock period. The red line indicates an ideal estimation of pre-routing slack delays using a signoff timing analysis. The figure highlights a significant worst-case scenario where slack surpasses the clock period, leading to over-pessimism and over-design.

Fig. 19 Slack discrepancies at the pre-routing stage for a 2ns clock period design

In [58], the focus was on aligning net delays and transition times (slew) using learning models. The research aimed to converge estimated wire and slew delays to the wire and slew delays obtained from the signoff STA tool. Consequently, the proposed methodology reduces accumulated timing errors and maintains the correlation between wire and slew delays. [59] investigated Graph-based timing analysis (GBA) and Path-based timing analysis (PBA) convergence. Employing the RF algorithm, the study aimed to align PBA-estimated arrival times and slack to GBA values. The model implementation has effectively decreased PBA-GBA timing pessimism within a brief inference time.

In [60], a learning-based method identifies memory timing errors from the pre-placement stage. The research uses Lasso, SVM, Gradient Boosting, and ANN models to predict post-layout slack from early design phases. The author in [61] tries to improve circuit switching power accuracy using LR, RF, and DT models. The methodology trains models using multiple runs with back-annotated RC parasitics extracted from Standard Parasitic Exchange Format (SPEF) files. The study developed a "Spefless flow" capable of predicting a cell's switching power regardless of the SPEF file. [62] improves pre-routing net delay and pin arrival times without relying on the routing engine. The author presents a Graph Neural Network (GNN) model to rectify endpoint arrival times and slack at the pre-route stage. This study is conducted on 21 benchmark designs at 130nm. In [63], the author investigates LR, ANN, RF, and XGBoost to predict crosstalk-prone nets, aiming to enhance design routability and timing accuracy during the routing phase. The study encompasses 12 benchmark designs at the 7nm scale, ranging from 3,500 to 74,000 instances. For model training, 20 net features were employed, covering aspects such as net topology, wire delay, output slew, and both electrical
and logical characteristics, including source and sink capacitance, fan-in, fan-out, wirelength, congestion, as well as neighboring nets' drive strength. Among the models assessed, XGBoost exhibited the highest accuracy, achieving 91.12% accuracy in predicting coupling capacitance, 84.63% in crosstalk noise, and 87.56% in incremental delay.
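The sketch below illustrates this kind of feature-based net classification and the feature-contribution analysis discussed later in Section IV; a scikit-learn GradientBoostingClassifier stands in for XGBoost, and the feature values and labelling rule are synthetic.

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
names = ["wirelength", "fanout", "sink_cap", "neighbor_drive", "congestion"]
n = 3000
X = np.column_stack([
    rng.uniform(1, 500, n),     # wirelength (um)
    rng.integers(1, 20, n),     # fan-out
    rng.uniform(0.5, 30, n),    # total sink capacitance (fF)
    rng.uniform(0.1, 2.0, n),   # neighboring nets' drive strength (normalized)
    rng.uniform(0.0, 1.0, n),   # local congestion
])
crosstalk_prone = (0.002 * X[:, 0] * X[:, 3] + X[:, 4]
                   + rng.normal(scale=0.3, size=n) > 1.2).astype(int)

model = GradientBoostingClassifier().fit(X[:2400], crosstalk_prone[:2400])
print("held-out accuracy:", round(model.score(X[2400:], crosstalk_prone[2400:]), 3))
for name, imp in zip(names, model.feature_importances_):
    print(f"{name:15s} importance = {imp:.2f}")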
IV. RESULTS ANALYSIS AND DISCUSSION

In Table I, we summarize the state-of-the-art works on ML applications in VLSI and CAD. We extracted relevant information such as the implemented models, the related PnR stage, and the design technology node employed during the training process. Our goal is to highlight the main findings from the literature review, identify gaps, limitations, or weaknesses in the existing literature, and propose future directives.

Most studies have implemented supervised learning models due to their reliance on labeled data. Regression and classification algorithms stand out as extensively applied, especially tree-based models such as Random Forest and Decision Trees, along with Support Vector Regressors and gradient-based models. However, there is an evident surge in deep learning and Neural Network applications. Most studies tend to adopt Artificial Neural Networks (ANNs), Convolutional Neural Networks (CNNs), and Graph Neural Networks (GNNs).

Although significant progress has been made to enhance design placement, verification, timing, and power accuracy at early PnR stages, the majority of prior studies have focused on small and medium-sized designs, mainly within higher technology nodes like 28nm, 45nm, and above. There have been limited works, such as those referenced in [36][48][53][63], and [61], that use 16nm, 15nm, and 7nm design technologies, thus representing a minority.

Previous studies have consistently demonstrated the superior predictive accuracy of tree-based methodologies and neural networks. Nevertheless, these investigations often encountered limitations by relying on a limited set of features, typically around 4 to 20 input parameters. This restriction in feature selection might have repercussions on the overall accuracy.

In this review, we evaluated the quality of data used for the training process. The findings revealed that previous studies tend to use a limited selection of design architectures. This limitation restricts the scope of the training data, which fails to achieve robust generalization capabilities, while models need exposure to varied design architectures during the training phase. Consequently, when subjected to new, unseen data, these models exhibit a lower ability to generalize, resulting in reduced overall accuracy.

Another crucial aspect lies in tuning hyperparameters, which is often overlooked in many studies yet essential to further optimize a model's performance. By neglecting hyperparameter fine-tuning, models may not achieve their maximum potential. Most studies have either omitted it or not mentioned it, possibly due to space limitations. In addition, various studies use a reduced set of features to train models. The inclusion of multiple features enhances the prediction accuracy, depending on each feature's importance and contribution to the predictions.

As future directives and key contributions, we suggest:

• Involving diverse technology nodes in the same training set: this emphasizes the importance of developing models capable of accommodating different cutting-edge technology nodes within the same learning framework. This approach intends to enhance the models' adaptability and accuracy.
• Incorporating large-scale industrial designs with diverse internal architectures: during training, models have to learn and adapt to various design architectures to reflect real-world scenarios. This exposure to diverse implementations enhances the models' adaptability and improves generalization.
• Multi-frequency runs: one crucial aspect often overlooked, which could significantly enhance models' accuracy and generalization, is conducting design runs at various clock frequencies. Many studies have neglected to run designs across a spectrum of clock frequencies. By including frequency as a feature in the training dataset, the model gains exposure to the behavior at different speeds. This offers an opportunity to enrich the model's learning dataset.
• Use of relevant features: a varied training feature set captures various aspects of circuit behavior and characteristics. This broad coverage enhances prediction accuracy.
• Hyperparameter tuning: tuning hyperparameters involves meticulous adjustments to find the most effective configuration, ensuring the model operates at its peak performance. This procedure identifies the best hyperparameters that enhance the overall accuracy.

V. CONCLUSION

Machine learning presents promising solutions to enhance VLSI design and EDA. As technology progresses, following Moore's law and the reduction in transistor sizes, the increase in design complexity continues. The integration of learning frameworks has advanced EDA tools, resulting in considerable enhancements across various applications. These advancements have contributed to improved chip performance, detection of power leaks, optimization of chip area, reduction in power consumption, identification of potential fabrication defects, and improved QoR accuracy.

After conducting our review, we have identified certain limitations. Earlier studies were constrained by reliance on small to medium-sized designs within higher technology nodes like 28nm and 45nm. The incorporation of large-scale industrial designs enriches model learning, improving generalization. Additionally, we suggest conducting design runs at varied clock frequencies to exploit data at different operational speeds, with a rich set of input features and hyperparameter tuning.

Our ongoing endeavor involves an NN application leveraging an extensive dataset of multiple cutting-edge 7nm and 16nm technology designs with around 100 extracted features. This initiative aims to enhance timing prediction accuracy at the pre-placement stage. This forthcoming project is built upon
our previous work referenced in [67]. Addressing these con- [19] W. Chen, N. Sumikawa, L.-C. Wang, J. Bhadra, X. Feng, and M. S.
tributions in future research endeavors will potentially en- Abadir, “Novel test detection to improve simulation efficiency: a com-
mercial experiment,” in Proceedings of the International Conference
hance model accuracy and adaptability.
on Computer-Aided Design, 2012, pp. 101–108.
REFERENCES

[1] M. Pandey, “Machine learning and systems for building the next generation of eda tools,” in 2018 23rd Asia and South Pacific Design Automation Conference (ASP-DAC). IEEE, 2018, pp. 411–415.
[2] A. B. Kahng, “Machine learning for cad/eda: the road ahead,” IEEE Design & Test, vol. 40, no. 1, pp. 8–16, 2022.
[3] A. C. Müller and S. Guido, Introduction to machine learning with Python: a guide for data scientists. O’Reilly Media, Inc., 2016.
[4] G. Dong and H. Liu, Feature engineering for machine learning and data analytics. CRC Press, 2018.
[5] A. Zheng and A. Casari, Feature engineering for machine learning: principles and techniques for data scientists. O’Reilly Media, Inc., 2018.
[6] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg et al., “Scikit-learn: Machine learning in Python,” Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.
[7] L. Buitinck, G. Louppe, M. Blondel, F. Pedregosa, A. Mueller, O. Grisel, V. Niculae, P. Prettenhofer, A. Gramfort, J. Grobler, R. Layton, J. VanderPlas, A. Joly, B. Holt, and G. Varoquaux, “API design for machine learning software: experiences from the scikit-learn project,” in ECML PKDD Workshop: Languages for Data Mining and Machine Learning, 2013, pp. 108–122.
[8] T. Hastie, R. Tibshirani, J. H. Friedman, and J. H. Friedman, The elements of statistical learning: data mining, inference, and prediction. Springer, 2009, vol. 2.
[9] A. Géron, Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow. O’Reilly Media, Inc., 2022.
[10] S. Rogers and M. Girolami, A first course in machine learning. Chapman and Hall/CRC, 2016.
[11] R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction. MIT Press, 2018.
[12] A. M. Wichert and L. Sa-Couto, Machine Learning - A Journey to Deep Learning: With Exercises and Answers. World Scientific, 2021.
[13] D. Amuru, A. Zahra, H. V. Vudumula, P. K. Cherupally, S. R. Gurram, A. Ahmad, and Z. Abbas, “Ai/ml algorithms and applications in vlsi design and technology,” Integration, 2023.
[14] S. Saini, K. Lata, and G. Sinha, VLSI and Hardware Implementations Using Modern Machine Learning Methods. CRC Press, 2021.
[15] P. A. Beerel and M. Pedram, “Opportunities for machine learning in electronic design automation,” in 2018 IEEE International Symposium on Circuits and Systems (ISCAS). IEEE, 2018, pp. 1–5.
[16] A. Malhotra and A. Singh, “Implementation of ai in the field of vlsi: A review,” in 2022 Second International Conference on Power, Control and Computing Technologies (ICPC2T). IEEE, 2022, pp. 1–5.
[17] L. Wang and M. Luo, “Machine learning applications and opportunities in ic design flow,” in 2019 International Symposium on VLSI Design, Automation and Test (VLSI-DAT). IEEE, 2019, pp. 1–3.
[18] A. B. Kahng, “Machine learning applications in physical design: Recent results and directions,” in Proceedings of the 2018 International Symposium on Physical Design, 2018, pp. 68–73.
[19] W. Chen, N. Sumikawa, L.-C. Wang, J. Bhadra, X. Feng, and M. S. Abadir, “Novel test detection to improve simulation efficiency: a commercial experiment,” in Proceedings of the International Conference on Computer-Aided Design, 2012, pp. 101–108.
[20] W. Chen, L.-C. Wang, J. Bhadra, and M. Abadir, “Simulation knowledge extraction and reuse in constrained random processor verification,” in Proceedings of the 50th Annual Design Automation Conference, 2013, pp. 1–6.
[21] H.-Y. Liu and L. P. Carloni, “On learning-based methods for design-space exploration with high-level synthesis,” in Proceedings of the 50th Annual Design Automation Conference, 2013, pp. 1–7.
[22] B. Ozisikyilmaz, G. Memik, and A. Choudhary, “Efficient system design space exploration using machine learning techniques,” in Proceedings of the 45th Annual Design Automation Conference, 2008, pp. 966–969.
[23] A. Mahapatra and B. C. Schafer, “Machine-learning based simulated annealer method for high level synthesis design space exploration,” in Proceedings of the 2014 Electronic System Level Synthesis Conference (ESLsyn). IEEE, 2014, pp. 1–6.
[24] R. G. Kim, J. R. Doppa, and P. P. Pande, “Machine learning for design space exploration and optimization of manycore systems,” in 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD). IEEE, 2018, pp. 1–6.
[25] T.-C. Chen, P.-Y. Lee, and T.-C. Chen, “Automatic floorplanning for ai socs,” in 2020 International Symposium on VLSI Design, Automation and Test (VLSI-DAT). IEEE, 2020, pp. 1–2.
[26] A. Mirhoseini, A. Goldie, M. Yazgan, J. Jiang, E. Songhori, S. Wang, Y.-J. Lee, E. Johnson, O. Pathak, S. Bae et al., “Chip placement with deep reinforcement learning,” arXiv preprint arXiv:2004.10746, 2020.
[27] C.-K. Lee, “Deep learning creativity in eda,” in 2020 International Symposium on VLSI Design, Automation and Test (VLSI-DAT). IEEE, 2020, pp. 1–1.
[28] Y. Ma, H. Ren, B. Khailany, H. Sikka, L. Luo, K. Natarajan, and B. Yu, “High performance graph convolutional networks with applications in testability analysis,” in Proceedings of the 56th Annual Design Automation Conference 2019, 2019, pp. 1–6.
[29] Y. Lin, S. Dhar, W. Li, H. Ren, B. Khailany, and D. Z. Pan, “Dreamplace: Deep learning toolkit-enabled gpu acceleration for modern vlsi placement,” in Proceedings of the 56th Annual Design Automation Conference 2019, 2019, pp. 1–6.
[30] B. Yu, D. Z. Pan, T. Matsunawa, and X. Zeng, “Machine learning and pattern matching in physical design,” in The 20th Asia and South Pacific Design Automation Conference. IEEE, 2015, pp. 286–293.
[31] S. Ward, D. Ding, and D. Z. Pan, “Pade: A high-performance placer with automatic datapath extraction and evaluation through high dimensional data learning,” in Proceedings of the 49th Annual Design Automation Conference, 2012, pp. 756–761.
[32] S. I. Ward, N. Viswanathan, N. Y. Zhou, C. C. Sze, Z. Li, C. J. Alpert, and D. Z. Pan, “Clock power minimization using structured latch templates and decision tree induction,” in 2013 IEEE/ACM International Conference on Computer-Aided Design (ICCAD). IEEE, 2013, pp. 599–606.
[33] R. Kirby, S. Godil, R. Roy, and B. Catanzaro, “Congestionnet: Routing congestion prediction using deep graph neural networks,” in 2019 IFIP/IEEE 27th International Conference on Very Large Scale Integration (VLSI-SoC). IEEE, 2019, pp. 217–222.
[34] W.-T. J. Chan, Y. Du, A. B. Kahng, S. Nath, and K. Samadi, “Beol stack-aware routability prediction from placement using data mining techniques,” in 2016 IEEE 34th International Conference on Computer Design (ICCD). IEEE, 2016, pp. 41–48.
[35] W.-H. Chang, L.-D. Chen, C.-H. Lin, S.-P. Mu, M. C.-T. Chao, C.-H. Tsai, and Y.-C. Chiu, “Generating routing-driven power distribution networks with machine-learning technique,” in Proceedings of the 2016 on International Symposium on Physical Design, 2016, pp. 145–152.
[36] W.-T. J. Chan, P.-H. Ho, A. B. Kahng, and P. Saxena, “Routability optimization for industrial designs at sub-14nm process nodes using machine learning,” in Proceedings of the 2017 ACM on International Symposium on Physical Design, 2017, pp. 15–21.
[37] A. F. Tabrizi, N. K. Darav, S. Xu, L. Rakai, I. Bustany, A. Kennings, and L. Behjat, “A machine learning framework to identify detailed routing short violations from a placed netlist,” in Proceedings of the 55th Annual Design Automation Conference, 2018, pp. 1–6.
[38] Z. Qi, Y. Cai, and Q. Zhou, “Accurate prediction of detailed routing congestion using supervised data learning,” in 2014 IEEE 32nd International Conference on Computer Design (ICCD). IEEE, 2014, pp. 97–103.
[39] Q. Zhou, X. Wang, Z. Qi, Z. Chen, Q. Zhou, and Y. Cai, “An accurate detailed routing routability prediction model in placement,” in 2015 6th Asia Symposium on Quality Electronic Design (ASQED). IEEE, 2015, pp. 119–122.
[40] Y.-H. Huang, Z. Xie, G.-Q. Fang, T.-C. Yu, H. Ren, S.-Y. Fang, Y. Chen, and J. Hu, “Routability-driven macro placement with embedded cnn-based prediction model,” in 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE, 2019, pp. 180–185.
[41] A. B. Kahng, M. Luo, and S. Nath, “Si for free: machine learning of interconnect coupling delay and transition effects,” in 2015 ACM/IEEE International Workshop on System Level Interconnect Prediction (SLIP). IEEE, 2015, pp. 1–8.
[42] R. Medico, D. Spina, D. V. Ginste, D. Deschrijver, and T. Dhaene, “Machine-learning-based error detection and design optimization in signal integrity applications,” IEEE Transactions on Components, Packaging and Manufacturing Technology, vol. 9, no. 9, pp. 1712–1720, 2019.
[43] L.-C. Wang, “Experience of data analytics in eda and test—principles, promises, and challenges,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 36, no. 6, pp. 885–898, 2016.
[44] D. Ding, J. A. Torres, and D. Z. Pan, “High performance lithography hotspot detection with successively refined pattern identifications and machine learning,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 30, no. 11, pp. 1621–1634, 2011.
[45] Y.-T. Yu, Y.-C. Chan, S. Sinha, I. H.-R. Jiang, and C. Chiang, “Accurate process-hotspot detection using critical design rule extraction,” in Proceedings of the 49th Annual Design Automation Conference, 2012, pp. 1167–1172.
[46] Y.-C. Fang, H.-Y. Lin, M.-Y. Sui, C.-M. Li, and E. J.-W. Fang, “Machine-learning-based dynamic ir drop prediction for eco,” in 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD). IEEE, 2018, pp. 1–7.
[47] Y. Yamato, T. Yoneda, K. Hatayama, and M. Inoue, “A fast and accurate per-cell dynamic ir-drop estimation method for at-speed scan test pattern validation,” in 2012 IEEE International Test Conference. IEEE, 2012, pp. 1–8.
[48] S.-Y. Lin, Y.-C. Fang, Y.-C. Li, Y.-C. Liu, T.-S. Yang, S.-C. Lin, C.-M. Li, and E. J.-W. Fang, “Ir drop prediction of eco-revised circuits using machine learning,” in 2018 IEEE 36th VLSI Test Symposium (VTS). IEEE, 2018, pp. 1–6.
[49] H. Dhotre, S. Eggersglüß, and R. Drechsler, “Identification of efficient clustering techniques for test power activity on the layout,” in 2017 IEEE 26th Asian Test Symposium (ATS). IEEE, 2017, pp. 108–113.
[50] F. Ye, F. Firouzi, Y. Yang, K. Chakrabarty, and M. B. Tahoori, “On-chip voltage-droop prediction using support-vector machines,” in 2014 IEEE 32nd VLSI Test Symposium (VTS). IEEE, 2014, pp. 1–6.
[51] S. Dey, S. Nandi, and G. Trivedi, “Powerplanningdl: Reliability-aware framework for on-chip power grid design using deep learning,” in 2020 Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE, 2020, pp. 1520–1525.
[52] ——, “Machine learning for vlsi cad: A case study in on-chip power grid design,” in 2021 IEEE Computer Society Annual Symposium on VLSI (ISVLSI). IEEE, 2021, pp. 378–383.
[53] O. S. Ram and S. Saurabh, “Modeling multiple-input switching in timing analysis using machine learning,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 40, no. 4, pp. 723–734, 2020.
[54] L.-C. Wang, P. Bastani, and M. S. Abadir, “Design-silicon timing correlation: A data mining perspective,” in Proceedings of the 44th Annual Design Automation Conference, 2007, pp. 384–389.
[55] P. Bastani, N. Callegari, L.-C. Wang, and M. S. Abadir, “Statistical diagnosis of unmodeled systematic timing effects,” in Proceedings of the 45th Annual Design Automation Conference, 2008, pp. 355–360.
[56] S.-S. Han, A. B. Kahng, S. Nath, and A. S. Vydyanathan, “A deep learning methodology to proliferate golden signoff timing,” in 2014 Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE, 2014, pp. 1–6.
[57] E. C. Barboza, N. Shukla, Y. Chen, and J. Hu, “Machine learning-based pre-routing timing prediction with reduced pessimism,” in 2019 56th ACM/IEEE Design Automation Conference (DAC). IEEE, 2019, pp. 1–6.
[58] A. B. Kahng, S. Kang, H. Lee, S. Nath, and J. Wadhwani, “Learning-based approximation of interconnect delay and slew in signoff timing tools,” in 2013 ACM/IEEE International Workshop on System Level Interconnect Prediction (SLIP). IEEE, 2013, pp. 1–8.
[59] A. B. Kahng, U. Mallappa, and L. Saul, “Using machine learning to predict path-based slack from graph-based timing analysis,” in 2018 IEEE 36th International Conference on Computer Design (ICCD). IEEE, 2018, pp. 603–612.
[60] W.-T. J. Chan, K. Y. Chung, A. B. Kahng, N. D. MacDonald, and S. Nath, “Learning-based prediction of embedded memory timing failures during initial floorplan design,” in 2016 21st Asia and South Pacific Design Automation Conference (ASP-DAC). IEEE, 2016, pp. 178–185.
[61] M. Chentouf, C. Naimy, and Z. E. A. A. Ismaili, “Machine learning application for early power analysis accuracy improvement: A case study for cells switching power,” in 2021 International Conference on Microelectronics (ICM). IEEE, 2021, pp. 17–20.
[62] Z. Guo, M. Liu, J. Gu, S. Zhang, D. Z. Pan, and Y. Lin, “A timing engine inspired graph neural network model for pre-routing slack prediction,” 2022.
[63] R. Liang, Z. Xie, J. Jung, V. Chauha, Y. Chen, J. Hu, H. Xiang, and G.-J. Nam, “Routing-free crosstalk prediction,” in 2020 IEEE/ACM International Conference on Computer Aided Design (ICCAD). IEEE, 2020, pp. 1–9.
[64] L.-C. Wang and M. S. Abadir, “Data mining in eda-basic principles, promises, and constraints,” in Proceedings of the 51st Annual Design Automation Conference, 2014, pp. 1–6.
[65] P. Velickovic, G. Cucurull, A. Casanova, A. Romero, P. Lio, Y. Bengio et al., “Graph attention networks,” stat, vol. 1050, no. 20, pp. 10–48550, 2017.
[66] J. Chen, B. Bolin, L.-C. Wang, J. Zeng, D. Drmanac, and M. Mateja, “Mining ac delay measurements for understanding speed-limiting paths,” in 2010 IEEE International Test Conference. IEEE, 2010, pp. 1–10.
[67] Y. Attaoui, M. Chentouf, Z. E. A. A. Ismaili, and A. El Mourabit, “Machine learning application for cell delay accuracy improvement at post-placement stage: A case study for combinational cells,” Integration, 2023.
