Machine Learning in VLSI Design: A Comprehensive Review

Abstract—This paper reviews state-of-the-art ML applications and aspects of AI in VLSI CAD/EDA. As VLSI chip complexity increases, driven by shrinking feature sizes and higher performance demands, manufacturing high-performance chips poses significant challenges. Recent studies have investigated the use of Artificial Intelligence and Machine Learning in VLSI CAD/EDA (Computer-Aided Design/Electronic Design Automation). Machine Learning is experiencing rapid growth in VLSI design and is being increasingly integrated into EDA tool development due to its capacity for achieving higher accuracy in reduced runtime. This paper offers a comprehensive review of the latest state-of-the-art ML applications in VLSI CAD/EDA.

I. INTRODUCTION

As ASIC development grows in complexity, manufacturing constraints emerge, influencing both cost and time-to-market. The escalating number of physical features in VLSI circuits presents a serious challenge: this growth results in a massive flux of data processed by the engines of EDA tools [1]. Fig 1 illustrates the data growth across technology advancements, which complicates the task of gathering data and identifying its underlying correlations and patterns. Consequently, this surge in data directly increases costs and extends development runtime.

Fig. 1 EDA Data capacity vs. chip technology node

Recently, Machine Learning (ML) has become a focal point and is extensively applied in VLSI design. It has empowered designers with the capability to discover patterns and functions within complex, unexplored data, and it provides the opportunity to gain more knowledge of circuit behavior. As we advance toward physical layout implementation, more detailed and accurate physical information becomes accessible. However, there is a point where metrics settle and EDA heuristics face limitations due to excessive pessimism. This presents a challenge for EDA companies, urging them to explore new methodologies, particularly in handling significant amounts of data. ML emerges as a solution to surpass these predictability limitations. Through training on massive datasets, ML-based models may offer high-accuracy predictions of Quality of Results (QoR), as in Fig 2, which depicts the potential impact of ML predictions on the trade-off curve between cost and accuracy [2].

Fig. 2 ML Accuracy improvement versus Cost/Runtime

This paper provides a comprehensive exploration of ML in EDA and VLSI design. Our focus involves reviewing several state-of-the-art ML applications and outlining the key challenges and limitations that researchers encounter. The scope of this review spans from high-level synthesis to post-layout verification of the ASIC flow. We close the review by identifying promising directions for future work. The remainder of this paper is structured as follows. Section 2 provides an overview of the fundamentals of ML and neural networks. Section 3 offers a literature review, divided into multiple subsections covering the different parts of the ASIC flow. Section 4 presents a discussion of results and offers insights into future perspectives.

A. AI/ML Role

AI frameworks consist of data-driven models able to process, perceive, and interpret the environment, engage in reasoning, and make informed decisions. AI comprises several sub-fields, among which are Machine Learning and Deep Learning. ML focuses on creating models that improve their performance by learning from data using predefined algorithms, aiming to discern patterns and correlations within datasets. The model acquires knowledge through training and can subsequently generate predictions for new, unseen data. ML techniques are classified into three primary subsets: Supervised, Unsupervised, and Reinforcement Learning.
E. Reinforcement: Learning from Experience

Reinforcement Learning (RL) stands apart from supervised and unsupervised learning. It is an interactive learning paradigm in which an agent interacts with the environment and makes decisions to achieve specific goals. The agent receives feedback in the form of rewards or punishments based on its choices, and it learns to optimize its strategy over time to maximize the cumulative reward. Prominent reinforcement learning algorithms include Q-Learning, Policy Gradient Methods, Markov Decision Processes (MDP), and Monte Carlo Tree Search (MCTS) [10][11].
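To make the reward-driven update concrete, the following minimal sketch (ours, not from the paper) implements the tabular Q-Learning rule on a hypothetical toy environment; alpha is the learning rate and gamma the discount factor:

import numpy as np

n_states, n_actions = 16, 4
Q = np.zeros((n_states, n_actions))    # tabular action-value estimates
alpha, gamma, epsilon = 0.1, 0.9, 0.1  # learning rate, discount, exploration

def step(state, action):
    """Hypothetical environment: returns (next_state, reward). Placeholder only."""
    next_state = (state + action + 1) % n_states
    reward = 1.0 if next_state == n_states - 1 else 0.0
    return next_state, reward

rng = np.random.default_rng(0)
state = 0
for _ in range(1000):
    # epsilon-greedy action selection
    if rng.random() < epsilon:
        action = int(rng.integers(n_actions))
    else:
        action = int(np.argmax(Q[state]))
    next_state, reward = step(state, action)
    # Q-Learning update: move Q(s,a) toward reward + discounted best future value
    Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
    state = next_state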
F. Deep Learning and Neural Networks

Neural Networks (NN) can be adapted for supervised, unsupervised, and reinforcement learning. NNs have shown remarkable performance in handling complex data structures. In this section, we introduce the single neuron, the basic building block, and the architecture of a vanilla feed-forward neural network. We provide an overview of the concept of Backpropagation and review commonly used Deep Neural Networks (DNNs).

F..1 Perceptron

A neuron, or perceptron, is the fundamental component in neural networks that processes inputs and makes decisions. It assigns a weight to each input value according to its importance, then linearly combines the weighted inputs and adds a bias term. The result undergoes a non-linearity through an activation function. In Fig 4, multiple inputs represent features, denoted as x1, x2, ..., xn. Each input has an associated weight reflecting its importance to the output (eq. 1) [12].

$$Y_{in} = \sum_{i=1}^{n} (x_i \cdot w_i) + B_0 \qquad (1)$$

Common activation functions (eq. 2, Fig 5) include:

• The Step function produces binary outputs determined by a predefined threshold. It operates as a binary function, yielding an output of '1' when the input exceeds '0' and '-1' otherwise.

• The Sigmoid function maps the weighted sum to the range (0, 1) and is often used in binary classification. The sigmoid smoothly transforms input values into a probability-like output, mapping large negative inputs toward '0' and large positive inputs toward '1'.

• The Hyperbolic Tangent (tanh) is similar to the sigmoid, but its output ranges from '-1' to '1'. It maps the weighted sum to (-1, 1), offering a zero-centered output.

• The Rectified Linear Unit (ReLU) activation function outputs the weighted sum if the result is positive and '0' otherwise.

$$\text{Step: } f(x) = \begin{cases} 1 & \text{if } x > 0 \\ -1 & \text{if } x \leq 0 \end{cases} \qquad \text{Sigmoid: } f(x) = \frac{1}{1 + e^{-x}} \qquad \text{tanh: } f(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \qquad \text{ReLU: } f(x) = \max(0, x) \qquad (2)$$

Equation (3) represents the output Y after being passed through the activation function, where Y_in is the result of eq. (1) (Fig 4). The Sigmoid function then constrains the result between '0' and '1'.

$$Y = \frac{1}{1 + e^{-Y_{in}}} \qquad (3)$$
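As a concrete illustration of eqs. (1)-(3), here is a minimal sketch (ours) of a single neuron and the four activation functions in Python:

import numpy as np

def step(x):    return np.where(x > 0, 1, -1)    # eq. (2), Step
def sigmoid(x): return 1.0 / (1.0 + np.exp(-x))  # eq. (2), Sigmoid
def tanh(x):    return np.tanh(x)                # eq. (2), tanh
def relu(x):    return np.maximum(0.0, x)        # eq. (2), ReLU

def neuron(x, w, b, activation=sigmoid):
    """Single perceptron: weighted sum plus bias (eq. 1), then activation (eq. 3)."""
    y_in = np.dot(x, w) + b
    return activation(y_in)

x = np.array([0.5, -1.2, 3.0])  # input features x1..xn
w = np.array([0.4, 0.1, -0.6])  # associated weights
print(neuron(x, w, b=0.2))      # sigmoid-activated output in (0, 1)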
Fig. 5 Common activation functions

F..2 Multilayer

A Multilayer Perceptron (MLP) is the fundamental Artificial Neural Network (ANN). It comprises multiple layers of interconnected neurons. In Fig 6, the network is a Feedforward Neural Network (FNN) with three hidden layers. The network adjusts its weights during training, and each neuron in a hidden layer receives the outputs of all neurons in the previous layer, computes the linear weighted combination, and passes the result through the activation function [12].

Fig. 6 Feedforward Neural Network

Eventually, the output layer represents the predicted targets: continuous variables for regression tasks or classes for classification tasks. The output layer may comprise multiple neurons, each corresponding to a specific class. The matrix form in eq. (4) gives the output values Y for m hidden neurons.

$$Y = \sigma(X \cdot W + b) \qquad (4)$$

$$\begin{bmatrix} Y_1 & Y_2 & \cdots & Y_m \end{bmatrix} = \sigma\left( \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix} \begin{bmatrix} w_{11} & w_{12} & \cdots & w_{1m} \\ w_{21} & w_{22} & \cdots & w_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ w_{n1} & w_{n2} & \cdots & w_{nm} \end{bmatrix} + \begin{bmatrix} b_1 & b_2 & \cdots & b_m \end{bmatrix} \right)$$

• Y: output values of the first hidden layer, for m neurons.
• X: n input features.
• W: weight matrix, where w_ij connects input i to neuron j.
• b: bias vector for the m neurons.
• σ: the activation function, applied element-wise.

F..3 Cost function

During forward propagation, each neuron computes its weighted sum and applies a non-linearity. The cost function (or loss function) refers to the error at the output layer, obtained by comparing predicted versus target values. The cost function estimates the error for the entire network; this error is then propagated backward through gradient descent of the cost. This process is Backpropagation, and it allows the adjustment of the weights and biases of all neurons to minimize the final cost. Common loss functions are Mean Squared Error (MSE) for regression and Cross-entropy for classification.

F..4 Backpropagation

Backpropagation is fundamental for weight and bias updates. The network calculates the cost gradient for each weight and bias in the output layer and backpropagates it to adjust all weights and biases. The gradient reflects how strongly changes in each weight and bias affect the cost function, and it points in the direction of weight adjustments that minimize the error. The weight update subtracts the learning rate multiplied by the gradient (eq. 5). The learning rate controls the step size of the weight updates and influences convergence speed and training stability [12].

$$\Delta w_{ij} = -\eta \frac{\partial E}{\partial w_{ij}} \qquad (5)$$
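As a toy illustration of the forward pass of eq. (4) and the update rule of eq. (5) (our sketch, assuming a single hidden layer and an MSE cost), one gradient-descent step can be written as:

import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Assumed toy dimensions: n=4 input features, m=8 hidden neurons, 1 output.
X = rng.normal(size=(32, 4))             # batch of inputs
t = rng.normal(size=(32, 1))             # regression targets
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)
eta = 0.01                               # learning rate

# Forward pass: eq. (4) applied layer by layer
H = sigmoid(X @ W1 + b1)                 # hidden activations
Y = H @ W2 + b2                          # linear output for regression

# MSE cost and its gradients, obtained via the chain rule
E = np.mean((Y - t) ** 2)
dY = 2 * (Y - t) / len(X)                # dE/dY
dW2 = H.T @ dY                           # dE/dW2
dH = (dY @ W2.T) * H * (1 - H)           # backpropagate through the sigmoid
dW1 = X.T @ dH                           # dE/dW1

# Weight update: eq. (5), delta_w = -eta * dE/dw
W2 -= eta * dW2;  b2 -= eta * dY.sum(axis=0)
W1 -= eta * dW1;  b1 -= eta * dH.sum(axis=0)
E_new = np.mean((sigmoid(X @ W1 + b1) @ W2 + b2 - t) ** 2)
print(f"cost after one step: {E_new:.4f} (was {E:.4f})")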
F..5 Deep Learning

A Deep Neural Network (DNN) is a feed-forward network with multiple hidden layers trained using backpropagation. Convolutional Neural Networks (CNNs) are tailored for grid-like data, such as image classification and object detection, to detect local patterns. Recurrent Neural Networks (RNNs) are specialized for sequential data, where the element order is significant; they utilize recurrent connections to manage and update hidden states and are valuable in Natural Language Processing (NLP) and speech recognition. Long Short-Term Memory networks (LSTMs) are a specialized type of RNN designed to overcome the vanishing gradient problem and capture long-range dependencies in sequential data. Gated Recurrent Unit networks (GRUs) are also RNNs, with a simpler architecture, offering efficiency for tasks like natural language understanding and speech synthesis. Lastly, Autoencoders are networks used for unsupervised learning and dimensionality reduction, mapping input data to a lower-dimensional representation, with a decoder network reconstructing the input [12].

Even though deep learning has been around for decades, hardware support for neural networks has only recently come to fruition. The development of accelerated parallel-computing CPU-GPU architectures has made deep learning achievable; as a result, model training times have been reduced from weeks to hours.

III. LITERATURE REVIEW: ML IN VLSI DESIGN AND CAD EDA

The field of Computer-Aided Design (CAD) is rapidly evolving to address the increasing complexities of modern VLSI chips [1]. AI integration into design automation tools represents an approach to stay at the forefront of technological advancements. Extensive research has been conducted focusing on reducing design runtime and improving QoR through AI/ML integration [13][14][15][16][17][18]. A similar review has been provided in [13], offering an extensive examination of the AI/ML methodologies suggested in the existing literature.
This review primarily encompasses all stages of VLSI abstraction, including architectural considerations, physical design, circuit simulation, manufacturing, and VLSI testing. In this section, we provide an overview of the state of the art, survey recent research, draw out key limitations, and highlight possible enhancements.

A. ML in VLSI functional Verification

The studies in [64] and [43] demonstrate the importance of data mining in EDA. In [19], data mining and pattern extraction exhibited the potential of the Support Vector Machine (SVM) to reduce runtime and enhance simulation coverage during functional verification. As illustrated in Fig 7, a standard set of unit tests typically attains maximum functional coverage upon reaching 6,000 tests. By employing the SVM-based model, a subset of merely 310 tests accomplished equivalent coverage, leading to a notable 95% reduction in simulation runtime. The model is capable of predicting coverage overlap and captures test similarities, as depicted in Fig 8. In [20], the authors proposed a classification learning method to enhance functional coverage based on assertions. The model identifies how frequently an assertion is triggered, highlighting the significance of each unit test. The objective is to extract knowledge that can activate assertions with lower coverage. To achieve this, the authors employed a feature-based analysis, utilizing supervised classification and unsupervised association rules.

Fig. 7 Runtime saving using SVM-based model

Fig. 8 Increasing Coverage using SVM-based model
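A minimal sketch of the idea behind such coverage models (our illustration, assuming each unit test is described by a binary vector of the coverage groups it hits and labeled by whether it contributes new coverage):

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Assumed toy data: one row per unit test, columns are coverage groups hit (0/1).
X = rng.integers(0, 2, size=(500, 40))
# Hypothetical label: 1 if the test adds coverage not captured by similar tests.
y = (X[:, :5].sum(axis=1) > 2).astype(int)  # placeholder labeling rule

clf = SVC(kernel="rbf").fit(X[:400], y[:400])

# Keep only candidate tests predicted to add coverage, simulating the
# reduced regression suite reported in [19].
candidates = X[400:]
keep = candidates[clf.predict(candidates) == 1]
print(f"selected {len(keep)} of {len(candidates)} tests")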
B. ML at High Level Synthesis

Applying ML to High-Level Synthesis (HLS) has raised the challenge of design space exploration (DSE). In [21], the author presents a learning-based model for DSE that speeds up convergence toward the optimal RTL design architecture; Random Forest and randomized selection algorithms yielded the highest accuracy for the optimal RTL-architecture Pareto front. Similarly, [22] harnesses the power of RF and Extra-Trees to guide DSE in discovering Pareto-optimal combinations of area and performance. [23] implemented a Simulated Annealing (SA) probabilistic algorithm: the author introduces a faster SA method based on decision trees (DT), achieving performance similar to a standard SA but with a gain of up to 43% in average runtime. Lastly, [24] delves into the challenges posed by DSE for systems with multicore processors; the author leverages reinforcement techniques such as Imitation Learning (IL) to enhance the computational efficiency of these manycore systems.

C. ML in Physical Design

ML also finds application in the physical design stages, where data continues to grow alongside technological progress. In this section, we explore cutting-edge applications within backend design.

C..1 ML for Floorplanning and Placement Optimization

Traditional Place-and-Route (PnR) tools typically generate a floorplan without exploring multiple alternatives, regardless of timing, wire length, congestion, power, routability, and other QoR metrics. In [25], the author introduces a deep-learning neural network that explores various floorplan alternatives, considering different aspect ratios and placement styles. The model automatically generates an optimal floorplan for subsequent PnR stages based on dataflow and DSE information. In [26], the author presents a reinforcement learning agent trained across multiple chip blocks to produce optimized chip placements. This approach involves placing macros and standard cells sequentially on the chip canvas, with the model's rewards based on the cost associated with wirelength and routing congestion.

In [27] and [28], two deep learning-based models are introduced to enhance design for testability (DFT). The model uses a Graph Convolutional Neural Network for control and observation point (CP-OP) insertion; the graph CNN aims to minimize the number of CP-OPs while maximizing fault coverage. [29] employs a DNN framework to accelerate cell placement. The results significantly improved QoR and routing congestion: the model achieves placement quality comparable to state-of-the-art placers while accelerating global placement runtime by a factor of 30.

Typically, an effective placement aims to minimize the half-perimeter wirelength (HPWL), defined in the sketch below. Nevertheless, handling datapaths can yield varying QoR and thus different placements. In both [30] and [31], the authors proposed a model that combines SVM and ANN. The models classify datapaths by their order of importance and guide a wirelength-driven placement strategy focusing on the most heavily weighted datapaths. The results led to a reduction of 7% in HPWL and 12% in Steiner Wire Length (StWL).
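For reference, the HPWL of a net is the half-perimeter of the bounding box of its pins; a minimal sketch (ours) of the metric these placers minimize:

def hpwl(pins):
    """Half-perimeter wirelength of one net; pins is a list of (x, y) coordinates."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

# The total HPWL of a placement is the sum over all nets, e.g.:
nets = [[(0, 0), (3, 4)], [(1, 1), (2, 5), (4, 2)]]
print(sum(hpwl(net) for net in nets))  # 7 + 7 = 14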
[Table: ML techniques referenced throughout this review. Abbreviations: LR: Linear Regression, LogR: Logistic Regression, PR: Polynomial Regression, DT: Decision Tree, RF: Random Forest, GB: Gradient Boosting, XGBoost: eXtreme Gradient Boosting, SVM: Support Vector Machine, SVR: Support Vector Regressor, MDP: Markov Decision Processes, ANN: Artificial Neural Networks, CNN: Convolutional Neural Network, MARS: Multivariate Adaptive Regression Splines, GPR: Gaussian Process Regression, GNN: Graph Neural Network]

C..2 ML for Clock Network Optimization

In synchronous circuits, the primary challenges concern the clock network, as it is one of the most critical networks. Achieving a zero-skew clock network has always been a challenge. A common strategy to optimize clock skew, minimize clock-tree length, and mitigate clock network power consumption involves placing latches in proximity to local clock buffers, a technique discussed in [32] and [30].
The authors introduce a DT model to reduce latch redundancies and propose an optimized latch placement solution. The approach significantly reduces clock skew and has a positive impact on placement, indirectly benefiting power consumption.

C..3 ML for Congestion

Routing congestion is a critical factor that significantly affects timing behavior and routability. However, congestion is not always accurately predicted from early placement stages, which misleads the router and results in longer wires and routing detours. Tools can restructure logic and adjust functionality to mitigate routing congestion hotspots. In [33], the author introduces a deep learning approach based on CNNs to predict routing congestion hotspots on a pre-placed netlist. The model is trained on various features, including the netlist graph, cell type, function, pin count, geometry, and other cell characteristics, with the routed congestion map as ground truth. The model employs Graph Attention Networks (GAT) [65] and identifies common patterns in gate-level netlists, helping to pinpoint the logic elements contributing to congestion. It achieves 75% accuracy in predicting congestion at the lower metal layers, compared to the baseline congestion map's accuracy of 29%.

C..4 ML for Routing Optimization

Advanced technology nodes have raised new challenges in routability. Numerous factors, such as placement quality, timing constraints, and aspect ratio, significantly influence design routability. Poor routability results in excessive runtime, stretching to weeks for large designs, and sometimes the design ends up unroutable. Although the congestion map can aid in predicting routability, it may still prove insufficient or mislead the router. Recent research has therefore focused on ML to predict the routability of a placement solution without fully performing global or detailed routing.

In [34], the author developed SVM-based and Multivariate Adaptive Regression Splines (MARS) models to predict routability from the placement stage. The models were trained using designs at the 28nm and 45nm technology nodes to predict the Pareto frontiers of utilization. After dividing the layout into grids, various per-grid features were extracted, including pin density per grid area, pin proximity, cell count, net count, and edge count. The classification model achieved a prediction accuracy of 85.9% and 90.4% for 45nm and 28nm, respectively, surpassing the standard congestion-map-based prediction, which achieved only 61.7% and 73.5%.

In [35], the authors focus on predicting wirelength based on the circuit's power distribution network (PDN). An optimized PDN reduces wirelength, while an unoptimized PDN can lead to inefficient placement of power rails and vias, resulting in suboptimal wire routing: wirelength increases as signals take longer paths to avoid the congested areas and routing obstacles caused by inefficient power networks. To mitigate this, the authors employ a Gaussian process regression (GPR) model, which considers relevant PDN attributes and placement features to reduce the total wirelength.

Congestion maps identify potential Design Rule Violations (DRV) at the routing stage and aid in optimizing placement by adjusting cell positions to reduce detailed-route DRV. However, at advanced sub-micron nodes, congestion map-based placement may leave significant design rule check (DRC) violations to be addressed manually or through iteration, making it a less reliable predictor that can mislead the global router. Fig 9 illustrates the mismatch between actual DRC violations and congestion map hotspots.

Fig. 9 Actual DRC vs. Congestion map DRC violations

In [36], the authors employ multiple learning models, including Linear Regression, Logistic Regression, and SVM classifiers, to reduce DRC violations during detailed routing. Binary classifiers categorize globally routed cells based on whether they contribute to a DRC violation. The models were trained using features such as fan-in, fan-out, connectivity parameters, pin proximity, local pin density, and local overflow. The SVM model successfully reduced DRC violations by an average of around 20%, with some cases achieving up to a remarkable 76%, all without impacting design timing. Fig 10 illustrates DRC hotspot detection using the SVM model.

Fig. 10 Actual DRC vs. model's DRC hotspot predictions
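A simplified sketch of such a violation classifier (our illustration; the feature names follow the list above, and the labels are assumed to come from a previously routed design):

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Assumed toy features per globally routed cell:
# [fan_in, fan_out, pin_proximity, local_pin_density, local_overflow]
X = rng.random((2000, 5))
# Hypothetical ground truth from a routed design: 1 = cell involved in a DRC violation.
y = (0.8 * X[:, 3] + 0.6 * X[:, 4] + 0.1 * rng.random(2000) > 0.9).astype(int)

clf = LogisticRegression().fit(X[:1500], y[:1500])
pred = clf.predict(X[1500:])
print("flagged cells:", int(pred.sum()), "of", len(pred))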
In a similar context, a study conducted in [37] predicts detailed routing violations from an early placement stage by estimating congested regions based on StWL estimations and pin density, thus avoiding the need for a global router. The authors developed a binary classification model whose output indicates the presence or absence of violations. The model was trained on features extracted from the placement stage, targeting detailed routing shorts of already routed designs. The implemented neural network consists of one hidden layer with 20 nodes. It achieved an average shorts-prediction accuracy of 90%, within a reduced runtime compared to the standard congestion map method.

Similar efforts to estimate routability and routing congestion from an early placement stage using supervised learning have been conducted in [38] and [39], using multivariate adaptive regression splines models. The learning framework aims to detect routing violations directly from the placement stage without relying on a global router, yielding reliably accurate results in shorter runtime.

Routability may also be impacted from the macro placement stage, particularly in large complex designs with high macro and IP counts that occupy significant chip areas.
In [40], the authors propose routability-driven macro placement prediction using a CNN to find the optimal macro placement with minimal DRC violations. The model forecasts design routability for optimal macro placement by exploring different configurations and evaluating wirelength, power, and timing constraints. The CNN model is trained using extracted features such as the macro density map, pin density map, and connectivity density map. The CNN model reduced the DRV count and lowered the average total wirelength, and it was then integrated into the original macro placement engine. Simulated annealing optimization was subsequently applied to assess whether the resulting macro placement was near-optimal.

Signal integrity (SI) may also impact delays, as it influences signal propagation and overall timing performance. SI effects create coupling capacitance due to switching activity in neighboring nets, which alters wire delays and transition times (slew) in adjacent nets. Most EDA tools include a Static Timing Analysis (STA) engine with an SI mode, which introduces additional pessimism to the total delay based on aggressor and victim dependencies. However, timing analysis with SI mode enabled can be time-consuming, especially for large designs.

In [41], the authors developed a model to predict transition time, incremental delays, and path delays in SI mode. Fig 11 illustrates the incremental delay divergence in SI mode between a commercial tool and a signoff SI tool, with the inaccuracy reaching 60ps. The training parameters include diverse design features, such as clock period, toggle rate, coupling capacitance, resistance, aggressor count, differences in max-min arrival times, transition time, and incremental delay in non-SI mode. The authors trained ANN and SVM models using a 28nm technology library and combined the predictions to obtain final values for incremental transition time, incremental delay, and path delay in SI mode. The prediction reduced the absolute error by 15.7%. Fig 12 shows the actual versus predicted incremental delays considering SI, with a worst-case absolute error of 5.2ps.

Fig. 12 Model's predictions of Delay in SI mode

C..5 ML for Physical Verification

Machine Learning is also employed in detecting layout hotspots, a topic discussed in depth in both [43] and [44]. Conventionally, hotspots are detected using commercial lithography simulation tools. However, ML has remarkably enhanced hotspot detection accuracy while preserving short runtime. A lithography simulator determines the good and bad layout samples; these samples are then used to train an SVM binary classifier to highlight the boundary overlaps, as in Fig 13. Fig 14 compares the model-predicted hotspots with the original lithography simulation.

Fig. 13: ML flow for Layout hotspots detection through lithography simulation

Fig. 14: Layout hotspots detection using lithography simulator vs. model prediction
For design-silicon timing correlation, correlation factors scale the pre-silicon delay estimates so that they match post-silicon measurements on the critical path (eq. 6):

$$\alpha_c \cdot \sum_i c_i = \sum_i c'_i \; ; \qquad \alpha_n \cdot \sum_i n_i = \sum_i n'_i \qquad (6)$$

• α_c, α_n: cell/net correlation factors on the critical path.
• c_i, n_i: post-silicon (PDT) measured cell/net delays.
• c'_i, n'_i: pre-silicon (STA) estimated cell/net delays.
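In other words, each correlation factor is simply the ratio of the summed pre-silicon estimates to the summed silicon measurements; a minimal sketch (ours, with hypothetical delay values) under eq. (6):

# Hypothetical measured (silicon) and estimated (STA) cell delays, in ns.
measured_cell = [0.12, 0.34, 0.08, 0.21]   # c_i, post-silicon
estimated_cell = [0.15, 0.40, 0.10, 0.25]  # c'_i, pre-silicon

# eq. (6): alpha_c * sum(c_i) = sum(c'_i)  =>  alpha_c = sum(c'_i) / sum(c_i)
alpha_c = sum(estimated_cell) / sum(measured_cell)
print(f"cell correlation factor alpha_c = {alpha_c:.3f}")  # > 1 indicates STA pessimism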
                                                                             Fig. 19: Slack discrepancies at pre-pouting stage for a 2ns Clock Period
                                                                             Design
[2] A. B. Kahng, "Machine learning for CAD/EDA: The road ahead," IEEE Design & Test, vol. 40, no. 1, pp. 8–16, 2022.

[3] A. C. Müller and S. Guido, Introduction to Machine Learning with Python: A Guide for Data Scientists. O'Reilly Media, Inc., 2016.

[4] G. Dong and H. Liu, Feature Engineering for Machine Learning and Data Analytics. CRC Press, 2018.

[5] A. Zheng and A. Casari, Feature Engineering for Machine Learning: Principles and Techniques for Data Scientists. O'Reilly Media, Inc., 2018.

[6] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg et al., "Scikit-learn: Machine learning in Python," Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.

[7] L. Buitinck, G. Louppe, M. Blondel, F. Pedregosa, A. Mueller, O. Grisel, V. Niculae, P. Prettenhofer, A. Gramfort, J. Grobler, R. Layton, J. VanderPlas, A. Joly, B. Holt, and G. Varoquaux, "API design for machine learning software: Experiences from the scikit-learn project," in ECML PKDD Workshop: Languages for Data Mining and Machine Learning, 2013, pp. 108–122.

[8] T. Hastie, R. Tibshirani, and J. H. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2009, vol. 2.

[9] A. Géron, Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow. O'Reilly Media, Inc., 2022.

[10] S. Rogers and M. Girolami, A First Course in Machine Learning. Chapman and Hall/CRC, 2016.

[11] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. MIT Press, 2018.

[12] A. M. Wichert and L. Sa-Couto, Machine Learning: A Journey to Deep Learning, with Exercises and Answers. World Scientific, 2021.

[13] D. Amuru, A. Zahra, H. V. Vudumula, P. K. Cherupally, S. R. Gurram, A. Ahmad, and Z. Abbas, "AI/ML algorithms and applications in VLSI design and technology," Integration, 2023.

[14] S. Saini, K. Lata, and G. Sinha, VLSI and Hardware Implementations Using Modern Machine Learning Methods. CRC Press, 2021.

[15] P. A. Beerel and M. Pedram, "Opportunities for machine learning in electronic design automation," in 2018 IEEE International Symposium on Circuits and Systems (ISCAS). IEEE, 2018, pp. 1–5.

[16] A. Malhotra and A. Singh, "Implementation of AI in the field of VLSI: A review," in 2022 Second International Conference on Power, Control and Computing Technologies (ICPC2T). IEEE, 2022, pp. 1–5.

[17] L. Wang and M. Luo, "Machine learning applications and opportunities in IC design flow," in 2019 International Symposium on VLSI Design, Automation and Test (VLSI-DAT). IEEE, 2019, pp. 1–3.

[18] A. B. Kahng, "Machine learning applications in physical design: Recent results and directions," in Proceedings of the 2018 International Symposium on Physical Design, 2018, pp. 68–73.

[21] H.-Y. Liu and L. P. Carloni, "On learning-based methods for design-space exploration with high-level synthesis," in Proceedings of the 50th Annual Design Automation Conference, 2013, pp. 1–7.

[22] B. Ozisikyilmaz, G. Memik, and A. Choudhary, "Efficient system design space exploration using machine learning techniques," in Proceedings of the 45th Annual Design Automation Conference, 2008, pp. 966–969.

[23] A. Mahapatra and B. C. Schafer, "Machine-learning based simulated annealer method for high level synthesis design space exploration," in Proceedings of the 2014 Electronic System Level Synthesis Conference (ESLsyn). IEEE, 2014, pp. 1–6.

[24] R. G. Kim, J. R. Doppa, and P. P. Pande, "Machine learning for design space exploration and optimization of manycore systems," in 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD). IEEE, 2018, pp. 1–6.

[25] T.-C. Chen, P.-Y. Lee, and T.-C. Chen, "Automatic floorplanning for AI SoCs," in 2020 International Symposium on VLSI Design, Automation and Test (VLSI-DAT). IEEE, 2020, pp. 1–2.

[26] A. Mirhoseini, A. Goldie, M. Yazgan, J. Jiang, E. Songhori, S. Wang, Y.-J. Lee, E. Johnson, O. Pathak, S. Bae et al., "Chip placement with deep reinforcement learning," arXiv preprint arXiv:2004.10746, 2020.

[27] C.-K. Lee, "Deep learning creativity in EDA," in 2020 International Symposium on VLSI Design, Automation and Test (VLSI-DAT). IEEE, 2020, pp. 1–1.

[28] Y. Ma, H. Ren, B. Khailany, H. Sikka, L. Luo, K. Natarajan, and B. Yu, "High performance graph convolutional networks with applications in testability analysis," in Proceedings of the 56th Annual Design Automation Conference, 2019, pp. 1–6.

[29] Y. Lin, S. Dhar, W. Li, H. Ren, B. Khailany, and D. Z. Pan, "DREAMPlace: Deep learning toolkit-enabled GPU acceleration for modern VLSI placement," in Proceedings of the 56th Annual Design Automation Conference, 2019, pp. 1–6.

[30] B. Yu, D. Z. Pan, T. Matsunawa, and X. Zeng, "Machine learning and pattern matching in physical design," in The 20th Asia and South Pacific Design Automation Conference. IEEE, 2015, pp. 286–293.

[31] S. Ward, D. Ding, and D. Z. Pan, "PADE: A high-performance placer with automatic datapath extraction and evaluation through high dimensional data learning," in Proceedings of the 49th Annual Design Automation Conference, 2012, pp. 756–761.

[32] S. I. Ward, N. Viswanathan, N. Y. Zhou, C. C. Sze, Z. Li, C. J. Alpert, and D. Z. Pan, "Clock power minimization using structured latch templates and decision tree induction," in 2013 IEEE/ACM International Conference on Computer-Aided Design (ICCAD). IEEE, 2013, pp. 599–606.

[33] R. Kirby, S. Godil, R. Roy, and B. Catanzaro, "CongestionNet: Routing congestion prediction using deep graph neural networks," in 2019 IFIP/IEEE 27th International Conference on Very Large Scale Integration (VLSI-SoC). IEEE, 2019, pp. 217–222.

[34] W.-T. J. Chan, Y. Du, A. B. Kahng, S. Nath, and K. Samadi, "BEOL stack-aware routability prediction from placement using data mining techniques," in 2016 IEEE 34th International Conference on Computer Design (ICCD). IEEE, 2016, pp. 41–48.

[35] W.-H. Chang, L.-D. Chen, C.-H. Lin, S.-P. Mu, M. C.-T. Chao, C.-H. Tsai, and Y.-C. Chiu, "Generating routing-driven power distribution networks with machine-learning technique," in Proceedings of the 2016 International Symposium on Physical Design, 2016, pp. 145–152.

[36] W.-T. J. Chan, P.-H. Ho, A. B. Kahng, and P. Saxena, "Routability optimization for industrial designs at sub-14nm process nodes using machine learning," in Proceedings of the 2017 ACM International Symposium on Physical Design, 2017, pp. 15–21.

[37] A. F. Tabrizi, N. K. Darav, S. Xu, L. Rakai, I. Bustany, A. Kennings, and L. Behjat, "A machine learning framework to identify detailed routing short violations from a placed netlist," in Proceedings of the 55th Annual Design Automation Conference, 2018, pp. 1–6.

[38] Z. Qi, Y. Cai, and Q. Zhou, "Accurate prediction of detailed routing congestion using supervised data learning," in 2014 IEEE 32nd International Conference on Computer Design (ICCD). IEEE, 2014, pp. 97–103.

[39] Q. Zhou, X. Wang, Z. Qi, Z. Chen, Q. Zhou, and Y. Cai, "An accurate detailed routing routability prediction model in placement," in 2015 6th Asia Symposium on Quality Electronic Design (ASQED). IEEE, 2015, pp. 119–122.

[49] H. Dhotre, S. Eggersglüß, and R. Drechsler, "Identification of efficient clustering techniques for test power activity on the layout," in 2017 IEEE 26th Asian Test Symposium (ATS). IEEE, 2017, pp. 108–113.

[50] F. Ye, F. Firouzi, Y. Yang, K. Chakrabarty, and M. B. Tahoori, "On-chip voltage-droop prediction using support-vector machines," in 2014 IEEE 32nd VLSI Test Symposium (VTS). IEEE, 2014, pp. 1–6.

[51] S. Dey, S. Nandi, and G. Trivedi, "PowerPlanningDL: Reliability-aware framework for on-chip power grid design using deep learning," in 2020 Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE, 2020, pp. 1520–1525.

[52] S. Dey, S. Nandi, and G. Trivedi, "Machine learning for VLSI CAD: A case study in on-chip power grid design," in 2021 IEEE Computer Society Annual Symposium on VLSI (ISVLSI). IEEE, 2021, pp. 378–383.

[53] O. S. Ram and S. Saurabh, "Modeling multiple-input switching in timing analysis using machine learning," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 40, no. 4, pp. 723–734, 2020.

[54] L.-C. Wang, P. Bastani, and M. S. Abadir, "Design-silicon timing correlation: A data mining perspective," in Proceedings of the 44th Annual Design Automation Conference, 2007, pp. 384–389.