


Citation: Bert Moons, Roel Uytterhoeven, Wim Dehaene, Marian Verhelst (2017), "Envision: A 0.26-to-10TOPS/W Subword-Parallel Dynamic-Voltage-Accuracy-Frequency-Scalable Convolutional Neural Network Processor in 28nm FDSOI," IEEE International Solid-State Circuits Conference, 2017, pp. 246-247.

Archived version: Author manuscript. The content is identical to that of the published paper, but without the publisher's final typesetting.

Published version: DOI 10.1109/ISSCC.2017.7870353

Conference homepage: isscc.org

Author contact: bert.moons@esat.kuleuven.be, marian.verhelst@esat.kuleuven.be

(article begins on next page)


ISSCC 2017 / SESSION 14 / DEEP-LEARNING PROCESSORS / 14.5

14.5 Envision: A 0.26-to-10TOPS/W Subword-Parallel Dynamic-Voltage-Accuracy-Frequency-Scalable Convolutional Neural Network Processor in 28nm FDSOI

Bert Moons, Roel Uytterhoeven, Wim Dehaene, Marian Verhelst

KU Leuven, Leuven, Belgium

ConvNets, or Convolutional Neural Networks (CNNs), are state-of-the-art classification algorithms, achieving near-human performance in visual recognition [1]. New trends such as augmented reality demand always-on visual processing in wearable devices. Yet, advanced ConvNets achieving high recognition rates are too expensive in terms of energy, as they require substantial data movement and billions of convolution computations. Today, state-of-the-art mobile GPUs and ConvNet accelerator ASICs [2][3] demonstrate energy efficiencies of only tens to several hundreds of GOPS/W, one order of magnitude below the requirements of always-on applications. This paper introduces the concept of hierarchical recognition processing, combined with the Envision platform: an energy-scalable ConvNet processor achieving efficiencies up to 10TOPS/W while maintaining recognition rate and throughput. Envision hereby enables always-on visual recognition in wearable devices.

Figure 14.5.1 demonstrates the concept of hierarchical recognition. Here, a hierarchy of increasingly complex, individually trained ConvNets, with different topologies, different network sizes, and increasing computational-precision requirements, is used in the context of person identification. This enables constant scanning for faces at very low average energy cost, yet rapidly scales up to more complex networks detecting a specific face, such as a device's owner, all the way up to full VGG-16-based 5760-face recognition. The opportunities afforded by such a hierarchical approach span far beyond face recognition alone, but can only be exploited by digital systems demonstrating wide-range energy scalability across computational precision. The state-of-the-art ASICs of [3] and [4] show only 1.5× and 8.2× energy-efficiency scalability, respectively. Envision improves upon this by introducing subword-parallel Dynamic-Voltage-Accuracy-Frequency Scaling (DVAFS), a circuit-level technique enabling 40× average energy-precision scalability at constant throughput. Figure 14.5.2 illustrates the basic principle of DVAFS and compares it to Dynamic-Accuracy Scaling (DAS) and Dynamic-Voltage-Accuracy Scaling (DVAS) [4]. In DAS, switching activity, and hence energy consumption, is reduced for low-precision computations by rounding and masking a configurable number of LSBs at the inputs of multiply-accumulate (MAC) units. DVAS exploits the shorter critical paths of DAS's reduced-precision modes by combining them with voltage scaling for increased energy scalability. This paper proposes subword-parallel DVAFS, which further improves upon DVAS by reusing arithmetic cells that are inactive at reduced precision. These can be reconfigured to compute 2×1-8b or 4×1-4b words per cycle (N×1-16b/N, with N the level of subword parallelism), rather than 1×1-16b, when operating at less than 8b precision. At constant data throughput, this permits lowering the processor's frequency and voltage significantly below DVAS values. As a result, DVAFS is a dynamic precision technique that simultaneously lowers all run-time adaptable parameters influencing power consumption: activity α, frequency f, and voltage V. Moreover, in contrast to DAS and DVAS, which only save energy in precision-scaled arithmetic blocks, DVAFS allows lowering f and V of the full system, including control units and memory, hereby drastically shrinking non-compute energy overheads at low precision.

Energy efficiency is further improved by modulating the body bias (BB) in an FDSOI technology, which permits tuning the balance between dynamic and leakage power as a function of computational precision. At high precision, reducing Vt allows scaling down the supply voltage to reduce dynamic consumption while maintaining speed, at a limited leakage energy cost and an overall efficiency increase. At low precision, with reduced switching activity, Vt and the supply voltage are increased to lower the leakage overhead at constant speed. This increases dynamic energy, but reduces the overall energy consumption.

Figure 14.5.3 shows the top-level architecture of Envision. The chip is a multi-power and multi-body-bias-domain, sparsity-guarded ConvNet processor exploiting DVAFS. It is fully C-programmable, allowing deployment of a wide range of ConvNet topologies, and has a 16b SIMD RISC instruction set extended with custom instructions, similar to [4]. The processor is equipped with a 2D-SIMD array (for convolutions), 1D-SIMD arrays (for ReLU and max-pooling), and a scalar unit. An on-chip memory (DM) consists of 64×2kB single-port SRAM macros, subdivided into 4 blocks of 16 parallel banks, storing a maximum of 65536×N words. Three blocks can be read or written in parallel: two by the processor and one by the Huffman DMA, which compresses IO bandwidth by up to 5.8×. The system is divided into three power- and body-bias domains to enable granular dynamic voltage scaling.

Figure 14.5.4 shows how the 6-stage pipelined processor executes convolutions in its 16×16 2D-SIMD MAC array. Each MAC is a single-cycle N-subword-parallel multiplier, followed by an N×48b/N reconfigurable accumulation adder and register. As such, the 16×16 array can generate N×256 intermediate outputs per cycle, while consuming only N×16 filter weights and N×16 features in a first convolution cycle. In subsequent cycles, a 256b FIFO further reduces memory bandwidth by reusing and shifting features along the x-axis, requiring only a single new feature fetch per cycle. As all intermediate output values are stored in accumulation registers, there is no data transfer between MACs and no frequent write-back to SRAM. Sparsity is exploited by guarding both memory fetches and MAC operations [4], using flags stored in a GRD memory. This yields an additional 1.6× system-wide gain in energy consumption over DVAFS alone for typical ConvNets (30-60% zeroes).

Envision was implemented in a 28nm FDSOI technology on 1.87mm² and runs at 200MHz at 1V and room temperature. Figure 14.5.5 shows measurement results highlighting its wide-range precision-energy scalability, with nominal and optimal body biasing. All modes run the same 5×5 ConvNet layer, with a typical MAC efficiency of 73%, or 0.73×f×N×256×2 effective operations per second. When scaling down from 16b to 4×4b sparse computations at 76GOPS, power drops from 290mW to 7.6mW, as supply voltage and body bias are modulated between 0.65-1.1V and ±0.2-1.2V. Measurements for the convolutional layers in hierarchical face recognition are listed in Figure 14.5.1, demonstrating 6.2μJ/frame at 6.5mW instead of 23100μJ/frame at 77mW. This illustrates the feasibility of always-on recognition through hierarchical processing on the energy-scalable Envision.

Figure 14.5.6 shows a comparison with recent ConvNet ASICs. Envision scales efficiency on the AlexNet convolutional layers between 0.8-3.8TOPS/W, compared to 0.16TOPS/W [3] and 0.56-1.4TOPS/W [4]. Efficiency is 2TOPS/W on average for VGG-16 and up to 10TOPS/W peak. This further illustrates Envision's ability to minimize energy consumption for any ConvNet, demonstrating an energy scalability of up to 40× at nominal throughput as a function of precision and sparsity, hereby enabling always-on hierarchical recognition.

Figure 14.5.7 shows a die photo of Envision, illustrating the physical placement of its 3 power domains in a 1.29×1.45mm² active area.

Acknowledgements:
This work is partially funded by FWO and Intel Corporation. We thank Synopsys for providing their ASIP Designer tool suite and STMicroelectronics for silicon donation. Special thanks to CEA-LETI and CMP for back-end support.

References:
[1] Y. LeCun, et al., "Deep Learning," Nature, vol. 521, no. 7553, pp. 436-444, 2015.
[2] L. Cavigelli, et al., "Origami: A Convolutional Network Accelerator," IEEE Great Lakes Symp. on VLSI, 2015.
[3] Y.-H. Chen, et al., "Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks," ISSCC, pp. 262-263, 2016.
[4] B. Moons, et al., "A 0.3-2.6 TOPS/W Precision-Scalable Processor for Real-Time Large-Scale ConvNets," IEEE Symp. VLSI Circuits, 2016.
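The compound saving that DVAFS targets can be illustrated with the first-order switching-power relation P ≈ α·C·V²·f. The Python sketch below is illustrative only: the capacitance, activity, and voltage values are hypothetical placeholders, not measurements from this paper; only the scaling structure (running at f/N under N-way subword parallelism, then lowering V along the shortened critical paths) follows the technique described above.

```python
# First-order dynamic-power model: P = alpha * C_eff * V^2 * f.
# Under DVAFS, N subwords are processed per cycle, so at constant
# throughput the clock can drop to f/N, and the shorter critical
# paths of low-precision modes allow a lower supply voltage.

def dynamic_power(alpha, c_eff, v_dd, f_clk):
    """Switching power in watts (c_eff in farads, f_clk in Hz)."""
    return alpha * c_eff * v_dd ** 2 * f_clk

C_EFF = 1e-9        # hypothetical effective switched capacitance [F]
THROUGHPUT = 200e6  # subword results per second to sustain

# (N, alpha, V_dd) per mode; activities and voltages are illustrative.
modes = {
    "1x16b (baseline)": (1, 1.00, 1.10),
    "2x8b  (DVAFS)":    (2, 0.80, 0.85),
    "4x4b  (DVAFS)":    (4, 0.60, 0.65),
}

baseline = None
for name, (n, alpha, vdd) in modes.items():
    f_clk = THROUGHPUT / n  # constant throughput at f/N
    p = dynamic_power(alpha, C_EFF, vdd, f_clk)
    baseline = baseline or p
    print(f"{name}: {p * 1e3:6.2f} mW  ({baseline / p:4.1f}x savings)")
```

Even with these placeholder numbers, the model shows the qualitative effect: lowering α, f, and V together yields an order-of-magnitude power reduction at constant throughput, which is the mechanism behind the measured 0.26-to-10TOPS/W range.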

15 • 2017 IEEE International Solid-State Circuits Conference 978-1-5090-3758-2/17/$31.00 ©2017 IEEE



ISSCC 2017 / February 7, 2017 / 3:45 PM

Figure 14.5.1: Hierarchical face recognition.
Figure 14.5.2: DVAFS and body bias tuning.


Figure 14.5.3: Top-level architecture of Envision.
Figure 14.5.4: Parallel, rounded and guarded data flow.
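The guarded data flow of Figure 14.5.4 can be sketched in software terms. This is a minimal, hypothetical Python illustration: `guarded_dot` and its arguments are invented names, and in the actual chip guarding is done by gating fetches and clocks via flags in the GRD memory, not by branching.

```python
# Software sketch of sparsity guarding: a precomputed flag per weight
# marks zeros, so the corresponding fetch and MAC are skipped entirely.

def guarded_dot(weights, features, guard_flags):
    """Accumulate weight*feature products, skipping guarded (zero) terms."""
    acc = 0
    skipped = 0
    for w, x, nonzero in zip(weights, features, guard_flags):
        if not nonzero:   # guard flag marks a zero operand
            skipped += 1  # no fetch, no MAC energy spent
            continue
        acc += w * x
    return acc, skipped

weights = [3, 0, -2, 0, 0, 1]
features = [5, 7, 4, 1, 9, -6]
flags = [w != 0 for w in weights]  # GRD-style flags, precomputed

acc, skipped = guarded_dot(weights, features, flags)
print(acc, skipped)  # 3*5 + (-2)*4 + 1*(-6) = 1, with 3 of 6 MACs skipped
```

With the 30-60% zeroes of typical ConvNets quoted in the text, roughly a third to half of the fetch and MAC operations can be guarded this way, consistent with the reported 1.6× system-wide energy gain over DVAFS alone.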

Figure 14.5.5: Measured efficiency up to 10 TOPS/W.
Figure 14.5.6: Embedded ConvNet comparison.
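The effective-throughput expression quoted with the Figure 14.5.5 measurements (0.73×f×N×256×2 operations per second, for a 16×16 MAC array where each MAC is a multiply plus an add) can be checked directly. The function name and the sample operating point below are illustrative, not additional measured data.

```python
# Effective throughput of the 16x16 MAC array, per the digest:
# ops/s = MAC_efficiency * f * N * 256 * 2 (each MAC = multiply + add).

def effective_ops(mac_efficiency, f_clk, n_subwords, array_size=256):
    """Effective operations per second for the 2D-SIMD MAC array."""
    return mac_efficiency * f_clk * n_subwords * array_size * 2

# Example: 200 MHz clock, N = 1 (16b mode), 73% typical MAC efficiency.
print(round(effective_ops(0.73, 200e6, 1) / 1e9, 1))  # prints 74.8 (GOPS)
```

Note how at N = 4 the same expression sustains comparable GOPS at a quarter of the clock frequency, which is what allows the voltage scaling measured in Figure 14.5.5.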




ISSCC 2017 PAPER CONTINUATIONS

Figure 14.5.7: Die micrograph of Envision.

