Abstract—In this paper, we propose a complete framework named GAN-CTS which utilizes a conditional generative adversarial network (GAN) and reinforcement learning to predict and optimize clock tree synthesis (CTS) outcomes. To precisely characterize different netlists, we leverage transfer learning to extract design features directly from placement images. Based on the proposed framework, we further quantitatively interpret the importance of each CTS input parameter subject to various design objectives. Finally, to prove the generality of our framework, we conduct experiments on unseen netlists which are not utilized in the training process. Experimental results on industrial designs demonstrate that our framework (1) achieves an average prediction error of 3%, (2) improves the commercial tool's auto-generated clock tree by 51.5% in clock power, 18.5% in clock wirelength, and 5.3% in maximum skew, and (3) reaches an F1-score of 0.952 in the classification task of determining successful and failed CTS processes.

I. INTRODUCTION

Clock tree synthesis (CTS) is a critical stage of physical design, since the clock network often constitutes a high percentage of the overall power in the final full-chip design. An optimized clock tree helps to avoid serious design issues such as excessive power consumption, high routing congestion, and elongated timing closure [3]. However, due to the high complexity and run time of electronic design automation (EDA) tools, designers struggle to synthesize high-quality clock trees that optimize key metrics such as clock power, skew, and clock wirelength. To find the input parameters that achieve the desired CTS outcomes, designers have to search a wide range of candidate parameters, which is usually done through manual, highly time-consuming calibration.

To automate this task and alleviate the burden on designers, several machine learning (ML) techniques have been proposed to predict clock network metrics. Previous work [10] utilizes data mining tools to estimate the achieved skew and insertion delay. The authors of [8] employ statistical learning and meta-modeling methods to predict more essential metrics such as clock power and clock wirelength. The authors of [9] consider the effect of non-uniform sinks and different aspect ratios. A recent work [12] utilizes artificial neural networks (ANNs) to predict the transient clock power consumption based on an estimation of the clock tree components. However, all of the previous works focus only on enhancing prediction performance. There are four highly crucial aspects that all of them neglect, namely:

• Generality of Model: No previous work conducts prediction experiments on unseen netlists. They utilize the same netlists during the training and testing stages.
• Interpretability of Parameters: No previous work interprets the importance of each CTS input parameter subject to distinct CTS outcomes.
• Optimization of Metrics: No previous work demonstrates the capability to optimize CTS outcomes with respect to various design objectives.
• Suggestion of Inputs: Given a design, no previous work exhibits the ability to suggest CTS input parameter sets that achieve optimized clock trees. They merely adopt discriminative approaches to model the CTS stage.

These aspects, which previous works leave under-addressed, are the most crucial factors that determine whether the proposed modeling methods are practical or not. In this paper, we solve all the issues presented above. Furthermore, a unique aspect of our work is that we directly extract design features from placement layout images to perform practical CTS outcome predictions with ultra-high precision. We demonstrate that the features extracted from the layouts are indeed highly useful for improving the accuracy of our models.

The goal of this work is to construct a general and practical CTS framework that has the capability to predict CTS outcomes with high precision as well as the facility to recommend to designers the input parameter sets that lead to optimized clock trees, even for unseen netlists. Our proposed framework consists of a regression model and a variant of the generative adversarial network (GAN) [5] named conditional GAN [14]. Apart from the conventional GAN training process, in this work, reinforcement learning is introduced to the generator, and multi-task learning is introduced to the discriminator. The reinforcement learning is conducted by the regression model, which is trained prior to the conditional GAN. It serves as a supervisor to guide the generator to produce CTS input parameter sets that lead to optimized clock trees. As for the discriminator, it not only predicts whether the input is generated or real, as in the conventional approach, but also predicts whether the input results in a successful CTS run. In this paper, a successful CTS run indicates that the achieved clock tree outperforms the one auto-generated by the commercial tool.

The rest of the paper is organized as follows. Section II presents an overview of our framework. Section III describes the problem formulations and database generation, and identifies the difficulties of the problems. Section IV illustrates the algorithms and the detailed structures of the models. Experimental results are demonstrated in Section V, and we conclude our work in Section VI.
Fig. 1: A high-level view of this work and the three objectives we have achieved. The first objective is to predict the CTS outcomes with high precision. The second objective is to recommend designers good CTS input settings. The third objective is to determine whether the input settings outperform the commercial tool's auto-setting.
II. OVERVIEW

It has been widely acknowledged that GAN is a promising model that learns complicated real-world data through a generative approach [17]. A vanilla GAN contains a generator and a discriminator, both of which are neural networks. The goal of the generator is to generate meaningful data from a given distribution (e.g., random noise), while the objective of the discriminator is to distinguish the generated samples from the real samples in the database. On top of the vanilla GAN structure, a conditional GAN further introduces an external conditional input to both the generator and the discriminator. This conditional input enables the model to direct the data generation process, which allows us to generate suitable CTS input parameter sets for different benchmarks, even ones that are not utilized in the training process.

A high-level view of our framework, named GAN-CTS, is shown in Figure 1. The framework comprises three training stages. The first stage extracts key design features from placement images that represent the flip-flop distribution, the clock net distribution, and the trial routing result. Note that trial routing is performed at the end of the placement stage and provides an estimate of the routing congestion in a given placement. To extract features from the images, we apply transfer learning to a pre-trained convolutional neural network (CNN) named ResNet-50 [6], a 50-layer residual network.

Following the feature extraction process, we utilize the extracted features as well as the clock trees in the database to train the regression model and predict CTS outcomes. The regression model is a multi-output deep neural network which predicts multiple CTS metrics simultaneously. In this work, we target three CTS metrics: clock power, clock wirelength, and the achieved maximum skew. As mentioned in Section I, we do not treat our model as a black box as previous works do. To further interpret the predictions made by the regression model, we leverage a gradient-based attribution method [1] to quantitatively interpret the importance of each CTS input parameter subject to different target outcomes.

The last training stage of the framework involves reinforcement learning and generative adversarial learning. We leverage a conditional GAN to perform the CTS optimization and classification tasks. The regression model trained in the previous stage now acts as a supervisor that guides the generator to generate CTS input parameter sets which lead to optimized clock trees. The guidance is conducted through a reinforcement learning technique named policy gradient [16].

In the conditional GAN, we take the extracted placement features as the conditional input, while the original inputs of the vanilla GAN model are considered regular inputs. The great advantage of having the conditional input is that it allows the model to control the modes of the generated data. In this work, we consider different benchmarks as different modes. Therefore, with the aid of the conditional approach, our framework is able to optimize unseen benchmarks which are not utilized during the training process.

Finally, a highlight of our framework is the multi-task learning conducted in the discriminator. In addition to the conventional task of distinguishing between generated and real samples, we introduce a new task of classifying successful and failed CTS runs. In this paper, we strictly define a CTS run as successful if two out of the three achieved target CTS metrics mentioned earlier are better than the ones achieved automatically by the commercial tool.

At the end of the training process, we acquire four models:

• A placement feature extractor E which precisely characterizes different designs from placement images.
• A regression model R which performs high-precision predictions of the target CTS outcomes.
• A generator G which generates CTS input parameter sets that lead to optimized clock trees.
• A discriminator D which predicts the success or failure of CTS runs.
III. DESIGNING EXPERIMENTS

We formally define the clock tree synthesis (CTS) prediction and optimization problems as follows:

Problem 1 (CTS Outcomes Prediction). Given a pre-CTS placement P and a CTS input parameters set X, predict the post-CTS outcomes Y without performing any actual CTS.

Problem 2 (CTS Outcomes Optimization). Given a pre-CTS placement P, generate a CTS input parameters set X̂ that leads to optimized CTS outcomes Ŷ.
TABLE I: Our benchmarks and their attributes.

Netlist Name     # Nets     # Flip Flops    # Total Cells
AES-128          90,905     10,688          113,168
Arm-Cortex-M0    13,267     1,334           12,942
NOVA             138,171    29,122          136,537
ECG              85,058     14,018          84,127
JPEG             231,934    37,642          219,064
LDPC             42,018     2,048           39,377
TATE             185,379    31,409          184,601
Fig. 3: Visualization of trial routing image in convolutional filters.

Fig. 4: Our image feature extraction flow. The extracted features are colored in red.

Fig. 5: t-SNE visualizations of the original placement image vectors and the extracted feature vectors.

Fig. 6: Our multi-output regression model.
A. Placement Feature Extraction

We extract design features from placement images to solve the CTS outcomes prediction and optimization problems. Our approach is built upon a convolutional neural network (CNN) and transfer learning: we devise our own fully connected (FC) layers on top of the pre-trained ResNet-50 [6] model, a CNN pre-trained on the well-known ImageNet [4] dataset with millions of images.

Figure 3 shows the visualization of a trial routing image passing through four convolutional filters inside the pre-trained ResNet-50 model. In the figure, we observe that important information such as the usage of metal layers is well captured. Although ResNet-50 is powerful on many image datasets, it is not devised specifically for physical design problems. Therefore, to extract key design features from the high-dimensional vectors, we devise a feature extraction flow with transfer learning as shown in Figure 4. Four self-designed FC layers are appended on top of the ResNet-50 model to predict the commercial tool's estimation of total power right after the placement stage. As shown in the figure, since placement images cannot precisely reflect the actual size of a design, we introduce a one-dimensional auxiliary input to our FC layers which represents the total number of flip flops. In the training process, we fix the parameters of the pre-trained ResNet-50 model and only update the parameters of the FC layers. After training, for each pre-CTS placement, we take the output vector of the first FC layer, with 512 dimensions and denoted in red in the figure, as the final extracted feature vector.
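As a concrete illustration, the following Keras sketch mirrors the flow of Figure 4 under stated assumptions: the backbone is the frozen, ImageNet-pre-trained ResNet-50, FC 1 has 512 neurons and FC 4 has 1 neuron as in the figure, while the widths of FC 2 and FC 3 and all variable names are assumptions, not values from the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import ResNet50

# Frozen ImageNet-pre-trained ResNet-50 backbone; its weights are not updated.
backbone = ResNet50(weights="imagenet", include_top=False,
                    pooling="avg", input_shape=(700, 717, 3))
backbone.trainable = False

# Auxiliary one-dimensional input: the total number of flip flops.
ff_count = layers.Input(shape=(1,), name="num_flip_flops")

# Four self-designed FC layers on top of the backbone (Figure 4).
x = layers.Concatenate()([backbone.output, ff_count])
fc1 = layers.Dense(512, activation="relu", name="fc1")(x)  # FC 1: features
h = layers.Dense(128, activation="relu")(fc1)              # FC 2 (width assumed)
h = layers.Dense(32, activation="relu")(h)                 # FC 3 (width assumed)
power = layers.Dense(1, name="fc4")(h)                     # FC 4: power estimate

trainer = Model([backbone.input, ff_count], power)  # trained on estimated power
extractor = Model([backbone.input, ff_count], fc1)  # E: 512-dim feature vector
trainer.compile(optimizer="adam", loss="mse")
```

After `trainer` is fit on the power-estimation task, `extractor` yields the 512-dimensional feature vector used throughout the rest of the framework.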
In the implementation, all images have a dimension of 700 x 717 x 3. The achieved mean absolute percentage error on the unseen validation netlists when predicting the estimated power is only 1%. However, since a low prediction error does not guarantee a good feature representation, we leverage a dimensionality reduction technique named t-distributed stochastic neighbor embedding (t-SNE) [13] to visualize the extracted features (∈ R^512) in R^2, as shown in Figure 5. In the figure, we observe that different placements belonging to the same netlist are clustered together and those belonging to different netlists are well separated. Therefore, we conclude that the extracted features capture the characteristics of a design well.
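The t-SNE check itself takes only a few lines with scikit-learn; a minimal sketch, assuming `features` is the (n × 512) matrix from the extractor and `labels` holds the netlist name of each placement:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Project the 512-dim extracted features down to two embedding dimensions.
emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(features)

for name in np.unique(labels):          # one cluster per netlist (cf. Figure 5)
    pts = emb[labels == name]
    plt.scatter(pts[:, 0], pts[:, 1], s=8, label=name)
plt.xlabel("embedding dimension 1")
plt.ylabel("embedding dimension 2")
plt.legend()
plt.show()
```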
B. CTS Outcomes Prediction

Constructing a precise regression model is the key step to solving Problem 1. Following the feature extraction process, we train the regression model with the extracted features to predict the target CTS outcomes. As mentioned in Section II, in this paper we target predicting and optimizing three CTS outcomes: clock power, clock wirelength, and the maximum skew.
The regression model is a multi-output deep neural network, as shown in Figure 6. The model takes the extracted features along with the CTS parameters as inputs and outputs three predictions simultaneously. Note that multi-task learning is introduced here: the reason we leverage a single multi-output model rather than multiple single-output models as in previous work [12] is that different CTS outcomes are not independent of each other. Therefore, shared layers allow the different objectives to share mutual information, which reduces the training time and enhances the performance of each prediction. In the training process, we utilize mean squared error as the loss function and leverage dropout layers inside the model to prevent overfitting. The validation results of this experiment are shown in Section V.
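A minimal Keras sketch of such a shared-trunk, three-head regressor follows. The three heads, the shared layers, the MSE loss, and the dropout layers follow the text and Figure 6, while the trunk and head widths, the dropout rate, and `num_cts_params` are assumptions for illustration.

```python
from tensorflow.keras import layers, Model

num_cts_params = 12                     # number of CTS input parameters (assumed)
feat_in = layers.Input(shape=(512,), name="extracted_features")
cts_in = layers.Input(shape=(num_cts_params,), name="cts_parameters")

# Shared FC trunk: mutual information flows between the three objectives.
x = layers.Concatenate()([feat_in, cts_in])
for units in (256, 128, 64):            # three shared layers, widths assumed
    x = layers.Dense(units, activation="relu")(x)
    x = layers.Dropout(0.2)(x)          # dropout against overfitting

def head(name):                         # two FC layers per target, as in Figure 6
    h = layers.Dense(32, activation="relu")(x)
    return layers.Dense(1, name=name)(h)

R = Model([feat_in, cts_in],
          [head("power"), head("wirelength"), head("skew")])
R.compile(optimizer="adam", loss="mse") # mean squared error on every head
```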
In this work, we do not treat the regression model as a black box as previous works do. Instead, we leverage a gradient-based attribution algorithm named DeepLIFT [15] to interpret the predictions. The algorithm aims to determine the attribution (relevance) value of each input neuron subject to different outputs. Assume our regression model R takes an input vector x = [x_1, ..., x_N] \in \mathbb{R}^N and produces an output vector S = [S_1, ..., S_k]. DeepLIFT computes a_i^l, the attribution of neuron i at layer l, in a backward fashion by calculating the activation difference of the neuron between the target input x and a reference input x̂. The procedure is derived as

a_i^L = \begin{cases} S_i(x) - S_i(\hat{x}) & \text{if neuron } i \text{ is the output of interest,} \\ 0 & \text{otherwise,} \end{cases} \quad (1)

a_i^l = \sum_j \frac{z_{ji} - \hat{z}_{ji}}{\sum_i z_{ji} - \sum_i \hat{z}_{ji}} \, a_j^{l+1}, \quad (2)

where L denotes the output layer, and z_{ji} is the weighted activation of neuron i onto neuron j in the next layer. In the implementation, we take the reference input x̂ as the input parameters of the auto-generated clock tree.
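To make Equations (1)-(2) concrete, the toy NumPy sketch below applies the backward redistribution rule to a small, purely linear network. It is only an illustration of the stated rule (full DeepLIFT also handles nonlinear units and zero denominators), and every name in it is hypothetical.

```python
import numpy as np

def attribute(weights, x, x_ref, target):
    """Toy backward pass of Eqs. (1)-(2) for a stack of linear layers.

    weights: list of (out, in) matrices; x / x_ref: target and reference
    inputs; target: index of the output neuron of interest.
    Assumes the per-neuron denominators of Eq. (2) are nonzero.
    """
    acts, ref_acts = [x], [x_ref]                 # forward pass for both inputs
    for W in weights:
        acts.append(W @ acts[-1])
        ref_acts.append(W @ ref_acts[-1])

    a = np.zeros_like(acts[-1])                   # Eq. (1): output attribution
    a[target] = acts[-1][target] - ref_acts[-1][target]

    for W, h, h_ref in zip(weights[::-1], acts[-2::-1], ref_acts[-2::-1]):
        z, z_ref = W * h, W * h_ref               # z[j, i]: neuron i onto j
        denom = z.sum(1) - z_ref.sum(1)           # sum_i z_ji - sum_i ẑ_ji
        a = ((z - z_ref) / denom[:, None]).T @ a  # Eq. (2), one layer back
    return a                                      # attribution per input neuron

W1 = np.array([[1.0, -2.0], [0.5, 1.0]])
W2 = np.array([[2.0, 1.0]])
print(attribute([W1, W2], np.array([1.0, 2.0]), np.zeros(2), target=0))
```

For this toy network the printed attributions are [2.5, -6.0], which sum to the output difference S_0(x) - S_0(x̂) = -3.5, illustrating the completeness property of the rule.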
C. CTS Optimization

To solve Problem 2, we introduce reinforcement learning to the conditional GAN by taking the pre-trained regression model as a supervisor of the generator. Before illustrating the objectives of the optimization process, we first describe the structure of the generator. The generator G is a neural network parameterized by θ_g which samples a regular input z with 100 dimensions from an N(0, 1) Gaussian distribution p_z and samples the extracted placement features f from the database p_d as the conditional input.

Fig. 7: Detailed structure of our GAN generator.

Figure 7 shows the detailed structure of the generator. Leaky ReLU [20] layers are employed as the activation functions of the input and hidden layers to project latent variables onto a wider domain, which mitigates vanishing gradients. Batch normalization [7] layers are utilized to normalize the inputs of each hidden layer to zero mean and unit variance, which accelerates the training process by reducing the oscillation of gradient descent. Finally, for the output layer, the number of neurons D equals the number of CTS modeling parameters, and a hyperbolic tangent layer is chosen as the activation function to match the domain of the normalized samples drawn from the database: when training the discriminator, we normalize the real samples x from the database to x̂ ∈ [−1, 1] as

\hat{x} = \frac{x}{\max_{x \in \mathrm{supp}(x)}(x)} \times 2 - 1. \quad (3)
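A minimal Keras sketch of this generator follows. The 100-dim noise input, the leaky ReLU and batch normalization layers, and the tanh output over D = `num_cts_params` neurons follow the text, while the hidden widths, the leaky-ReLU slope, and all names are assumptions.

```python
from tensorflow.keras import layers, Model

num_cts_params = 12                       # D in the text (assumed value)
z_in = layers.Input(shape=(100,), name="noise")               # z ~ N(0, 1)
f_in = layers.Input(shape=(512,), name="placement_features")  # conditional input

x = layers.Concatenate()([z_in, f_in])
for units in (256, 128):                  # hidden widths assumed
    x = layers.Dense(units)(x)
    x = layers.LeakyReLU(0.2)(x)          # leaky ReLU [20], slope assumed
    x = layers.BatchNormalization()(x)    # batch normalization [7]

# tanh keeps outputs in [-1, 1], the domain of samples normalized by Eq. (3).
out = layers.Dense(num_cts_params, activation="tanh")(x)
G = Model([z_in, f_in], out)
```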
In GAN-CTS, the generator has two important objectives. The first is to generate realistic samples that deceive the discriminator, where the corresponding objective function is

L_{GD} = \mathbb{E}_{z,f}[\log(D(G(z, f)))]. \quad (4)

The second objective is to generate samples that lead to optimized clock trees by maximizing a reward r from the pre-trained regression model R. The reward r is defined as

r := R(G(z, f)) = -\prod_{i=1}^{N} \frac{R_i(G(z, f))}{\text{auto-setting result of target } i}, \quad (5)

where N denotes the number of target CTS outcomes and R_i denotes the corresponding prediction of the regression model.

To optimize the reward defined in Equation 5, we leverage a policy gradient algorithm named REINFORCE [18]. The algorithm considers the generator as an agent that strives to maximize the accumulated reward by producing wise actions G(z, f) in the environment created by the discriminator. The corresponding objective is derived as

L_{GP} = \mathbb{E}_{z,f}[r \cdot \ln(\mu(G(z, f)))], \quad (6)

where µ is a softmax function that maps G(z, f) to [0, 1]. With this objective, the generator is expected to generate the CTS input parameter sets G(z, f) that maximize the reward r, which results in optimized clock trees.

Finally, by combining the two objective functions illustrated above, we formulate the training process of the generator as

\max_G \; \mathbb{E}_{z \sim p_z,\, f \sim p_d}[\log(D(G(z, f))) + r \cdot \ln(\mu(G(z, f)))]. \quad (7)
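One way to realize this combined objective is sketched below, assuming `G`, `R`, and `D` are the models sketched in this section (with the real-vs-fake probability as D's first output; see the discriminator sketch after Algorithm 1) and `baselines` holds the commercial tool's auto-setting result per target. The learning rate, the numerical epsilons, and all names are assumptions.

```python
import tensorflow as tf

opt = tf.keras.optimizers.Adam(1e-4)

@tf.function
def generator_step(z, f, baselines):
    """One REINFORCE update of G maximizing Eq. (7); R and D stay frozen."""
    with tf.GradientTape() as tape:
        params = G([z, f])                           # actions G(z, f)
        # Eq. (5): reward from the pre-trained regression model R. It is
        # treated as a constant w.r.t. the policy, as in REINFORCE.
        preds = tf.concat(R([f, params]), axis=1)    # power, WL, skew
        r = tf.stop_gradient(-tf.reduce_prod(preds / baselines,
                                             axis=1, keepdims=True))
        real_prob = D([params, f])[0]                # first head: real vs. fake
        adv = tf.math.log(real_prob + 1e-8)          # Eq. (4)
        pg = r * tf.math.log(tf.nn.softmax(params) + 1e-8)  # Eq. (6)
        loss = -tf.reduce_mean(adv) - tf.reduce_mean(pg)    # minimize -Eq. (7)
    grads = tape.gradient(loss, G.trainable_variables)
    opt.apply_gradients(zip(grads, G.trainable_variables))
    return loss
```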
[Figure: detailed structure of our GAN discriminator, taking the generated CTS inputs (from the generator), the real CTS inputs (from the database), and the extracted features as inputs.]

Algorithm 1 GAN-CTS training methodology. We use default values of α_r = 0.0001, α_GAN = 0.0001, β_1 = 0.9, β_2 = 0.999, m = 128.
Input: {f}: extracted placement features, {x}: training data, {Y}: target CTS metrics for prediction and optimization.
Input: α_r: learning rate of the regression model, α_GAN: learning rate of the GAN, m: batch size, {θ_r}_0: initial parameters of the regression model, θ_g0: initial parameters of the generator, {θ_d}_0: initial parameters of the discriminator, {β_1, β_2}: Adam parameters.
Output: R: regression model, G: generator, D: discriminator.
 1: N ← length(Y)
 2: while {θ_r} do not converge do
 3:     Sample a batch of training data {x^(i)}_{i=1}^{m} ∼ p_d
 4:     Take the features {f^(i)}_{i=1}^{m} corresponding to {x^(i)}_{i=1}^{m}
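A sketch of the two-headed, multi-task discriminator described in Section II follows; the 128- and 64-neuron widths echo the structure depicted above Algorithm 1, while the branch layout, the activations, and all names are assumptions.

```python
from tensorflow.keras import layers, Model

num_cts_params = 12                       # assumed, as before
p_in = layers.Input(shape=(num_cts_params,), name="cts_inputs")  # real or generated
f_in = layers.Input(shape=(512,), name="placement_features")     # conditional input

x = layers.Concatenate()([p_in, f_in])
x = layers.Dense(128)(x)                  # input layer (128 neurons)
x = layers.LeakyReLU(0.2)(x)
x = layers.Dense(64)(x)                   # shared hidden layer (64 neurons)
x = layers.LeakyReLU(0.2)(x)

def branch(name):                         # one 64-neuron branch per task
    h = layers.Dense(64)(x)
    h = layers.LeakyReLU(0.2)(h)
    return layers.Dense(1, activation="sigmoid", name=name)(h)

# Multi-task heads: real-vs-generated and CTS success-vs-fail decisions.
D = Model([p_in, f_in], [branch("real_fake"), branch("success_fail")])
D.compile(optimizer="adam", loss="binary_crossentropy")
```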
Fig. 10: Distributions of randomly generated vs. GAN-CTS generated clock trees for the AES benchmark.

Success and failure are defined by comparing the CTS metrics of the clock trees generated by GAN-CTS to the one auto-generated by the commercial tool. Table VI summarizes the classification results in a confusion matrix for the NOVA benchmark. The accuracy and the F1-score are 0.947 and 0.952, respectively.
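For reference, both scores follow directly from the confusion-matrix counts. The counts below are hypothetical (Table VI's actual entries are not reproduced here) and are merely chosen so the resulting scores land near the reported values.

```python
# Hypothetical confusion-matrix counts for successful-vs-failed CTS runs.
tp, fp, fn, tn = 120, 8, 4, 95

accuracy = (tp + tn) / (tp + fp + fn + tn)            # ~ 0.947
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)    # ~ 0.952
print(f"accuracy={accuracy:.3f}, F1={f1:.3f}")
```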
VI. CONCLUSION AND FUTURE WORK

In this paper, we have proposed a complete framework named GAN-CTS to solve the complicated CTS outcome optimization and input generation problems which all previous works leave under-addressed. The fact that the proposed framework demonstrates promising results on unseen benchmarks proves that it is general and practical.

In spite of the superior performance achieved, the proposed framework is currently dependent on the technology node. In the future, we aim to overcome this dependence.

VII. ACKNOWLEDGMENT

This work is supported by the National Science Foundation under Grant No. CNS 16-24731 and the industry members of the Center for Advanced Electronics in Machine Learning.

REFERENCES

[1] M. Ancona et al. Towards better understanding of gradient-based attribution methods for deep neural networks. In International Conference on Learning Representations, 2018.
[2] F. Chollet et al. Keras, 2015.
[3] A. Datli, U. Eksi, and G. Isik. A clock tree synthesis flow tailored for low power. https://www.design-reuse.com/articles/33873/clock-tree-synthesis-flow-tailored-for-low-power.html, 2013.
[4] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In IEEE Conference on Computer Vision and Pattern Recognition, 2009.
[5] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, 2014.
[6] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition, 2016.
[7] S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv:1502.03167, 2015.
[8] A. B. Kahng, B. Lin, and S. Nath. Enhanced metamodeling techniques for high-dimensional IC design estimation problems. In Design, Automation and Test in Europe (DATE), 2013.
[9] A. B. Kahng, B. Lin, and S. Nath. High-dimensional metamodeling for prediction of clock tree synthesis outcomes. In ACM/IEEE International Workshop on System Level Interconnect Prediction (SLIP), 2013.
[10] A. B. Kahng and S. Mantik. A system for automatic recording and prediction of design quality metrics. In International Symposium on Quality Electronic Design (ISQED), 2001.
[11] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv:1412.6980, 2014.
[12] Y. Kwon, J. Jung, I. Han, and Y. Shin. Transient clock power estimation of pre-CTS netlist. In IEEE International Symposium on Circuits and Systems (ISCAS), 2018.
[13] L. van der Maaten and G. Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9(Nov), 2008.
[14] M. Mirza and S. Osindero. Conditional generative adversarial nets. arXiv:1411.1784, 2014.
[15] A. Shrikumar, P. Greenside, A. Shcherbina, and A. Kundaje. Not just a black box: Learning important features through propagating activation differences. arXiv:1605.01713, 2016.
[16] R. S. Sutton, D. A. McAllester, S. P. Singh, and Y. Mansour. Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems, 2000.
[17] K. Wang, C. Gou, Y. Duan, Y. Lin, X. Zheng, and F.-Y. Wang. Generative adversarial networks: Introduction and outlook. IEEE/CAA Journal of Automatica Sinica, 4(4):588-598, 2017.
[18] R. J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3-4), 1992.
[19] Z. Xie, Y.-H. Huang, G.-Q. Fang, H. Ren, S.-Y. Fang, Y. Chen, and J. Hu. RouteNet: Routability prediction for mixed-size designs using convolutional neural network. In IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 2018.
[20] B. Xu, N. Wang, T. Chen, and M. Li. Empirical evaluation of rectified activations in convolutional network. arXiv:1505.00853, 2015.