
CNN EXPLAINER: Learning Convolutional Neural Networks with Interactive Visualization
Zijie J. Wang, Robert Turko, Omar Shaikh, Haekyu Park, Nilaksh Das,
Fred Hohman, Minsuk Kahng, and Duen Horng (Polo) Chau

Zijie J. Wang, Robert Turko, Omar Shaikh, Haekyu Park, Nilaksh Das, Fred Hohman, and Duen Horng (Polo) Chau are with Georgia Institute of Technology. E-mail: {jayw|rturko3|oshaikh|haekyu|nilakshdas|fredhohman|polo}@gatech.edu. Minsuk Kahng is with Oregon State University. E-mail: minsuk.kahng@oregonstate.edu.

arXiv:2004.15004v2 [cs.HC] 1 May 2020

Fig. 1. With CNN EXPLAINER, learners can visually examine how Convolutional Neural Networks (CNNs) transform input images into classification predictions (e.g., predicting espresso for an image of a coffee cup), and interactively learn about their underlying mathematical operations. In this example, a learner uses CNN EXPLAINER to understand how convolutional layers work through three tightly integrated views, each explaining the convolutional process in increasing levels of detail. (A) The Overview visualizes a CNN architecture where each neuron is encoded as a square with a heatmap representing the neuron's output, and each edge connects the neuron with its corresponding inputs and outputs. (B) Clicking a neuron reveals how its activations are computed by the previous layer's neurons, displaying the often-overlooked intermediate computation through animations of sliding kernels. (C) The Convolutional Interactive Formula View allows users to interactively inspect the underlying mathematics of the dot-product operation core to convolution, through hovering the 3×3 kernel over the input, and interactively studying the corresponding output. For clarity, visibility of the Overview and annotation text is improved, and the overlay is re-positioned.

Abstract—Deep learning's great success motivates many practitioners and students to learn about this exciting technology. However, it is often challenging for beginners to take their first step due to the complexity of understanding and applying deep learning. We present CNN EXPLAINER, an interactive visualization tool designed for non-experts to learn and examine convolutional neural networks (CNNs), a foundational deep learning model architecture. Our tool addresses key challenges that novices face while learning about CNNs, which we identify from interviews with instructors and a survey with past students. Users can interactively visualize and inspect the data transformation and flow of intermediate results in a CNN. CNN EXPLAINER tightly integrates a model overview that summarizes a CNN's structure, and on-demand, dynamic visual explanation views that help users understand the underlying components of CNNs. Through smooth transitions across levels of abstraction, our tool enables users to inspect the interplay between low-level operations (e.g., mathematical computations) and high-level outcomes (e.g., class predictions). To better understand our tool's benefits, we conducted a qualitative user study, which shows that CNN EXPLAINER can help users more easily understand the inner workings of CNNs, and is engaging and enjoyable to use. We also derive design lessons from our study. Developed using modern web technologies, CNN EXPLAINER runs locally in users' web browsers without the need for installation or specialized hardware, broadening the public's education access to modern deep learning techniques.

Index Terms—Deep learning, machine learning, convolutional neural networks, visual analytics

1 INTRODUCTION

Deep learning now enables many of our everyday technologies. Its continued success and potential application in diverse domains has attracted immense interest from students and practitioners who wish to learn and apply this technology. However, many beginners find it challenging to take the first step in studying and understanding deep learning concepts.

For example, convolutional neural networks (CNNs), a foundational deep learning model architecture, are often one of the first and most widely used models that students learn. CNNs are often used in image classification, achieving state-of-the-art performance [28]. However, through interviews with deep learning instructors and a survey of past students, we found that even for this "introductory" model, it can be challenging for beginners to understand how inputs (e.g., image data) are transformed into class predictions. This steep learning curve stems from CNNs' complexity, which typically leverages many computational layers to reach a final decision. Within a CNN, there are many types of network layers, each with a different structure and underlying mathematical operations. A student who wants to learn about CNNs needs to develop a mental model of not only how each layer operates, but also how different layers together affect data transformation. Therefore, one of the main challenges in learning about CNNs is the intricate interplay between low-level mathematical operations and high-level integration of such operations within the network.

Fig. 2. In CNN EXPLAINER, tightly integrated views with different levels of abstraction work together to help users more easily learn about the intricate interplay between a CNN's high-level structure and low-level mathematical operations. (A) The Overview summarizes connections of all neurons; (B) the Elastic View animates the intermediate convolutional computation of the user-selected neuron in the Overview; and (C) the Interactive Formula View interactively demonstrates the detailed calculation on the selected input in the Elastic View.

Key challenges in designing learning tools for CNNs. There is a growing body of research that uses interactive visualization to explain the complex mechanisms of modern machine learning algorithms, such as TensorFlow Playground [44] and GAN Lab [24], which help students learn about dense neural networks and generative adversarial networks (GANs), respectively. Regarding CNNs, some existing visualization tools focus on demonstrating the high-level model structure and connections between layers (e.g., Harley's Node-Link Visualization [16]), while others focus on explaining the low-level mathematical operations (e.g., Karpathy's interactive CNN demo [25]). There is no visual learning tool that explains and connects CNN concepts from both levels of abstraction. This interplay between global model structure and local layer operations has been identified as one of the main obstacles to learning deep learning models, as discussed in [44] and corroborated by our interviews with instructors and student survey. CNN EXPLAINER aims to bridge this critical gap.

Contributions. In this work, we contribute:

• CNN EXPLAINER, an interactive visualization tool designed for non-experts to learn about both a CNN's high-level model structure and low-level mathematical operations. To help beginners who are interested in deep learning and wish to learn CNNs, our tool advances over prior work [16, 25], overcoming unique design challenges identified from a literature review, instructor interviews, and a survey with past students (Sect. 4).

• Novel interactive visualization design of CNN EXPLAINER (Fig. 1), which integrates coherent overview + detail and carefully designed animation to simultaneously summarize intricate model structure, while providing context for users to inspect detailed mathematical operations. CNN EXPLAINER's visualization techniques work together through fluid transitions between different abstraction levels, helping users gain a more comprehensive understanding of complex concepts within CNNs (Sect. 6). For example, CNN EXPLAINER explains the convolutional operation with increasing levels of detail (Fig. 2): the Overview (Fig. 2A) helps learners build a mental model of the CNN model structure; the Elastic Explanation View (Fig. 2B) illustrates the convolutional computation through animating its kernel sliding operation; and the Interactive Formula View (Fig. 2C) allows users to interactively inspect the detailed mathematical calculations.

• Design lessons distilled from user studies on an interactive visualization tool for machine learning education. While visual and interactive approaches have been gaining popularity in explaining machine learning concepts to non-experts, little work has been done to evaluate such tools [23, 37]. We interviewed four instructors who have taught CNNs and conducted a survey with 19 students who have previously learned about CNNs to identify the needs and challenges for a deep learning educational tool (Sect. 4). In addition, we conducted an observational study with 16 students to evaluate the usability of CNN EXPLAINER, and investigated how our tool could help students better understand CNN concepts (Sect. 8). Based on these studies, we discuss the advantages and limitations of interactive visual educational tools for machine learning.

• An open-source, web-based implementation that broadens the public's education access to modern deep learning techniques without the need for advanced computational resources. Deploying deep learning models conventionally requires significant computing resources, e.g., servers with powerful hardware. In addition, even with a dedicated backend server, it is challenging to support a large number of concurrent users. Instead, CNN EXPLAINER is developed using modern web technologies, where all results are directly and efficiently computed in users' web browsers (Sect. 6.7). Therefore, anyone can access CNN EXPLAINER using their web browser without the need for installation or a specialized backend. Our code is open-sourced¹ and CNN EXPLAINER is available at the following public demo link: https://poloclub.github.io/cnn-explainer.

¹ Code: https://github.com/poloclub/cnn-explainer

Broadening impact of visualization for AI. In recent years, many visualization systems have been developed for deep learning, but very few are designed for non-experts [16, 24, 38, 44], as surveyed in [18]. CNN EXPLAINER joins visualization research that introduces beginners to modern machine learning concepts. Applying visualization techniques to explain the inner workings of complex models has great potential. We hope our work will inspire further research and development of visual learning tools that help democratize and lower the barrier to understanding and applying artificial intelligence technologies.

2 BACKGROUND FOR CONVOLUTIONAL NEURAL NETWORKS

This section provides a high-level overview of convolutional neural networks (CNNs) in the context of image classification, which will help ground our work throughout this paper.

Image classification has a long history in the machine learning research community. The objective of supervised image classification is to map an input image, X, to an output class, Y. For example, given a cat image, a sophisticated image classifier would output a class label of "cat". CNNs have demonstrated state-of-the-art performance on this task, in part because of their multiple layers of computation that aim to learn a better representation of image data.

CNNs are composed of several different layers (e.g., convolutional layers, downsampling layers, and activation layers)—each layer performs some predetermined function on its input data. Convolutional layers "extract features" to be used for image classification, with early convolutional layers in the network extracting low-level features (e.g., edges) and later layers extracting more-complex semantic features (e.g., car headlights). Through a process called backpropagation, a CNN learns kernel weights and biases from a collection of input images. These values, also known as parameters, summarize important features within the images, regardless of their location. These kernel
weights slide across an input image performing an element-wise dot-product, yielding intermediate results that are later summed together with the learned bias value. Then, each neuron gets an output based on the input image. These outputs are also called activation maps. To decrease the number of parameters and help avoid overfitting, CNNs downsample inputs using another type of layer called pooling. Activation functions are used in a CNN to introduce non-linearity, which allows the model to learn more complex patterns in data. For example, a Rectified Linear Unit (ReLU) is defined as max(0, x), which outputs the positive part of its argument. These functions are also often used prior to the output layer to normalize classification scores; for example, the activation function called Softmax performs a normalization on unscaled scalar values, known as logits, to yield output class scores that sum to one. To summarize, compared to classic image classification models that can be over-parameterized and fail to take advantage of inherent properties in image data, CNNs create spatially-aware representations through multiple stacked layers of computation.
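To make these operations concrete, below is a minimal NumPy sketch of the three computations just described—the sliding-window dot product of a convolutional layer, the ReLU activation, and the softmax normalization of logits. It is an illustration of the math only, not CNN EXPLAINER's implementation (which runs in the browser via TensorFlow.js, Sect. 6.7).

    import numpy as np

    def conv2d(image, kernel, bias=0.0):
        # Slide the kernel over a 2-D input (stride 1, no padding); at each
        # position, take the element-wise dot product and add the bias.
        ih, iw = image.shape
        kh, kw = kernel.shape
        out = np.zeros((ih - kh + 1, iw - kw + 1))
        for r in range(out.shape[0]):
            for c in range(out.shape[1]):
                window = image[r:r + kh, c:c + kw]
                out[r, c] = np.sum(window * kernel) + bias
        return out  # the neuron's activation map

    def relu(x):
        return np.maximum(0, x)  # outputs the positive part of its argument

    def softmax(logits):
        e = np.exp(logits - np.max(logits))  # shift by max for numerical stability
        return e / e.sum()                   # class scores that sum to one

    rng = np.random.default_rng(0)
    activation_map = relu(conv2d(rng.random((5, 5)), rng.random((3, 3)) - 0.5))
    print(activation_map.shape)                # (3, 3)
    print(softmax(np.array([2.0, 1.0, 0.1])))  # [0.659 0.242 0.099]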
3 RELATED WORK

3.1 Visualization for Deep Learning Education

Researchers and practitioners have been developing visualization systems that aim to help beginners learn about deep learning concepts. Teachable Machine [51] teaches basic concepts of machine learning classification, such as overfitting and underfitting, by allowing users to train a deep neural network classifier with data collected from their own webcam or microphone. The Deep Visualization Toolbox [52] also uses live webcam images to interactively help users understand what each neuron has learned. These deep learning educational tools feature direct model manipulation as core to their experience. For example, users learn about CNNs, dense neural networks, and GANs through experimenting with model training in the ConvNetJS MNIST demo [25], TensorFlow Playground [44], and GAN Lab [24], respectively. Beyond 2D visualizations, Node-Link Visualization [16] and TensorSpace [2] demonstrate deep learning models in 3D space.

Inspired by Chris Olah's interactive blog posts [38], interactive articles explaining deep learning models with interactive visualization are gaining popularity as an alternative medium for education [8, 33]. Most existing resources, including visualizations and articles, focus on explaining either the high-level model structures and model training process or the low-level mathematics, but not both. However, we found that one key challenge for beginners learning about deep learning models is that it is difficult to connect unfamiliar layer mechanisms with complex model structures (discussed in Sect. 4). Our work aims to address this lack of research in developing visual learning tools that help learners bridge these two views on deep learning.

3.2 Algorithm Visualization

Before deep learning started to attract interest from students and practitioners, visualization researchers had been studying how to design algorithm visualizations (AV) to help people learn about the dynamic behavior of various algorithms [6, 21, 42]. These tools often graphically represent data structures and algorithms using interactive visualization and animations [6, 11, 14]. While researchers have found mixed results on AV's effectiveness in computer science education [7, 10, 12], growing evidence has shown that student engagement is the key factor for successfully applying AV in education settings [20, 36]. Naps et al. defined a taxonomy of six levels of engagement² at which learners can interact with AVs [36], and studies have shown that a higher engagement level leads to better learning outcomes [10, 15, 27, 41].

Deep learning models can be viewed as specialized algorithms comprised of complex and stochastic interactions between multiple different computational layers. However, there has been little work in designing and evaluating visual educational tools for deep learning in the context of AV. CNN EXPLAINER's design draws inspiration from the guidelines proposed in the AV literature (discussed in Sect. 5); our user study results also corroborate some of the key findings in prior AV research (discussed in Sect. 8.3). Our work advances AV's landscape in covering modern and pervasive machine learning algorithms.

² Six engagement categories: No Viewing, Viewing, Responding, Changing, Constructing, Presenting.

3.3 Visual Analytics for Neural Networks & Predictions

Many visual analytics tools have been developed to help deep learning experts analyze their models and predictions [4, 18, 22, 31, 32]. These tools support many different tasks. For example, recent work such as Summit [19] uses interactive visualization to summarize what features a CNN model has learned and how those features interact and attribute to model predictions. LSTMVis [47] makes long short-term memory (LSTM) networks more interpretable by visualizing the model's hidden states. Similarly, GANVis [49] helps experts interpret what a trained generative adversarial network (GAN) model has learned. People also use visual analytics tools to diagnose and monitor the training process of deep learning models. Two examples, DGMTracker [30] and DeepEyes [40], help developers better understand the training process of CNNs and GANs, respectively. Visual analytics tools have also recently been developed to help experts detect and interpret the vulnerabilities in their deep learning models [9, 29]. These existing analytics tools are designed to assist experts in analyzing their models and predictions; in contrast, we focus on non-experts and learners, helping them more easily learn about deep learning concepts.

4 FORMATIVE RESEARCH & DESIGN CHALLENGES

Our goal is to build an interactive visual learning tool to help students more easily understand the internal mechanisms of CNN models. To identify the learning challenges faced by students, we conducted interviews with deep learning instructors and surveyed past students.

Instructor interviews. To inform our tool's design, we recruited 4 instructors (2 female, 2 male) who have taught CNNs at a large university. We refer to them as T1-T4 throughout our discussion. One instructor teaches computer vision, and the other three teach deep learning. We interviewed the participants one-on-one in a conference room (3/4) or via video-conferencing software (1/4); each interview lasted around 30 minutes. The interviews were semi-structured; an interview guide listing the core questions was prepared. During the interviews, we ensured that all questions were addressed while leaving time for asking follow-up questions as interviewees touched on interesting topics. We aimed to learn (1) how instructors teach CNNs in a traditional classroom setting, and (2) what the key challenges are for instructors to teach and for students to learn about CNNs.

Fig. 3. Survey results from 19 participants who have previously learned about CNNs. Top: biggest challenges encountered during learning—connection of math & structure (9), math behind layers (8), CNN training workflow (8), backpropagation (8), layer and weight dimensions (5), layer connections (4), CNN structure (3). Bottom: most desired features for an interactive visual learning tool—show structure of CNNs (13), use a live CNN model (11), show math formulas (10), run on user's own image (10), algorithm animation (10), explain math in geometric context (9), explain intermediate computations (9), change hyperparameters (9), explain backpropagation (9), upload user's own model (9).
Student survey. After the interviews, we recruited students from a large university who have previously studied CNNs to fill out an online survey. We received 43 responses, and 19 of them (4 female, 15 male) met the criteria. Among these 19 participants, 10 were Ph.D. students, 3 were M.S. students, 5 were undergraduates, and 1 was a faculty member. We asked participants what were "the biggest challenges in studying CNNs" and "the most helpful features if there was a visualization tool for explaining CNNs to beginners". We provided pre-selected options based on the prior instructor interviews, but participants could write down their own response if it was not included in the options. The aggregated results of this survey are shown in Fig. 3.

Together with a literature review, we synthesized our findings from these two studies into the following five design challenges (C1-C5).

C1. Intricate model structure. CNN models consist of many layers, each having a different structure and underlying mathematical functions [28]. In addition, the connection between two layers is not always the same—some neurons are connected to every neuron in the previous layer, while some only connect to a single previous neuron. T2 said "It can be very hard for them [students with less knowledge of neural networks] to understand the structure of CNNs, you know, the connections between layers."

C2. Complex layer operations. Different layers serve different purposes in CNNs [13]. For example, convolutional layers exploit the spatially local correlations in inputs—each convolutional neuron connects to only a small region of its input; whereas max pooling layers introduce regularization to prevent overfitting. T1 said, "The most challenging part is learning the math behind it [CNN model]." Many students also reported that CNN layer computations are the most challenging learning objective (Fig. 3). To make CNNs perform better than other models in tasks like image classification, these models have complex and unique mathematical operations that many beginners may not have seen elsewhere.

C3. Connection between model structure and layer operation. Based on instructor interviews and the survey results from past students (Fig. 3), one of the cruxes to understanding CNNs is understanding the interplay between low-level mathematical operations (C2) and the high-level model structure (C1). Smilkov et al., creators of the popular dense neural network learning tool TensorFlow Playground [44], also found this challenge key to learning about deep learning models: "It's not trivial to translate the equations defining a deep network into a mental model of the underlying geometric transformations." In other words, in addition to comprehending the mathematical formulas behind different layers, students are also required to understand how each operation works within the complex, layered model structure.

C4. Effective algorithm visualization (AV). The success of applying visualization to explain machine learning algorithms to beginners [24, 44, 51] suggests that an AV tool is a promising approach to help people more easily learn about CNNs. However, AV tools need to be carefully designed to be effective in helping learners gain an understanding of algorithms [10]. In particular, AV systems need to clearly explain the mapping between the algorithm and its visual encoding [34], and actively engage learners [27].

C5. High barrier of entry for learning deep learning models. Most neural networks written in deep learning frameworks, such as TensorFlow [3] and PyTorch [39], are typically trained and deployed on dedicated servers that use powerful hardware with GPUs. Can we make understanding CNNs more accessible without such resources, so that everyone has the opportunity to learn and interact with deep learning models?

5 DESIGN GOALS

Based on the identified design challenges (Sect. 4), we distill the following key design goals (G1-G5) for CNN EXPLAINER, an interactive visualization tool to help students more easily learn about CNNs.

G1. Visual summary of CNN models and data flow. Based on the survey results, showing the structure of CNNs is the most desired feature for a visual learning tool (Fig. 3). Therefore, to give users an overview of the structure of CNNs, we aim to create a visual summary of a CNN model by visualizing all layer outputs and connections in one view. This could help users visually track how input image data are transformed to final class predictions through a series of layer operations (C1). (Sect. 6.1)

G2. Interactive interface for mathematical formulas. Since CNNs employ various complex mathematical functions to achieve high classification performance, it is important for users to understand each mathematical operation in detail (C2). In response, we would like to design an interactive interface for each mathematical formula, enabling users to examine and better understand the inner-workings of layers. (Sect. 6.3)

G3. Fluid transition between different levels of abstraction. To help users connect low-level layer mathematical mechanisms to high-level model structure (C3), we would like to design a focus + context display of different views, and provide smooth transitions between them. By easily navigating through different levels of CNN model abstraction, users can get a holistic picture of how a CNN works. (Sect. 6.4)

G4. Clear communication and engagement. Our goal is to design and develop an interactive system that is easy to understand and engaging to use so that it can help people more easily learn about CNNs (C4). We aim to accompany our visualizations with explanations to help users interpret the graphical representation of the CNN model (Sect. 6.5), and we wish to actively engage people during the learning process through visualization customizations. (Sect. 6.6)

G5. Web-based implementation. To develop an interactive visual learning tool that is accessible for users without specialized computational resources (C5), we would like to use modern web browsers as the platform to explain the inner-workings of a CNN model. We also open-source our code to support future research and development of deep learning educational tools. (Sect. 6.7)

Fig. 4. CNN EXPLAINER visualizes a Tiny VGG model trained with 10 classes of images. The model uses the same, but fewer, convolutional layers and max-pooling layers as the original VGGNet model [43].

6 VISUALIZATION INTERFACE OF CNN EXPLAINER

This section outlines CNN EXPLAINER's visualization techniques and interface design. The interface is built on our prior prototype [50]. We visualize the forward propagation, i.e., transforming an input image into a class prediction, of a pre-trained model (Fig. 4). Users can explore a CNN at different levels of abstraction through the tightly integrated Overview (Sect. 6.1), Elastic Explanation View (Sect. 6.2), and the Interactive Formula View (Sect. 6.3). Our tool allows users to smoothly transition between these views (Sect. 6.4), provides explanations to help users interpret the visualizations (Sect. 6.5), and engages them to test hypotheses through visualization customizations (Sect. 6.6). The system is targeted towards beginners and describes all mathematical operations necessary for a CNN to classify an image.

Color scales are used throughout the visualization to show the impact of weight, bias, and activation map values. Consistently in the
interface, a red to blue color scale is used to visualize neuron activation maps as heatmaps, and a yellow to green color scale represents weights and biases. A persistent color scale legend is present across all views, so the user always has context for the displayed colors. We chose these distinct, diverging color scales with white representing zero, so that a user can easily differentiate positive and negative values. We group layers in the Tiny VGG model, our CNN architecture, into four units and two modules (Fig. 4). Each unit starts with one convolutional layer. Both modules are identical and contain the same sequence of operations and hyperparameters. To analyze neuron activations throughout the network with varying contexts, users can alter the range of the heatmap color scale (Sect. 6.6).

Fig. 5. CNN EXPLAINER helps users learn about the connection between the output layer and its previous layer via three tightly integrated views. Users can smoothly transition between these views to gain a more holistic understanding of the output layer's lifeboat prediction computation. (A) The Overview summarizes neurons and their connections. (B) The Flatten Elastic Explanation View visualizes the often-overlooked flatten layer, helping users more easily understand how a high-dimensional max_pool_2 layer is connected to the 1-dimensional output layer. (C) The Softmax Interactive Formula View further explains how the softmax function that precedes the output layer normalizes the penultimate computation results (i.e., logits) into class probabilities through linking the (C1) numbers from the formula to (C2) their visual representations within the model structure.

6.1 Overview

The Overview (Fig. 1A, Fig. 5A) is the opening view of CNN EXPLAINER. This view represents the high-level structure of a CNN: neurons grouped into layers with distinct, sequential operations. It shows neuron activation maps for all layers represented as heatmaps with a diverging red to blue color scale. Neurons in consecutive layers are connected with edges, which connect each neuron to its inputs; to see these edges, users can simply hover over any activation map. In the model, neurons in convolutional layers and the output layer are fully connected to the previous layer, while all other neurons are only connected to one neuron in the previous layer.

6.2 Elastic Explanation View

The Elastic Explanation Views visualize the computations that lead to an intermediate result without overwhelming users with low-level mathematical operations. In CNN EXPLAINER, there are two elastic views, namely the Convolutional Elastic Explanation View (Fig. 1B) and the Flatten Elastic Explanation View (Fig. 5B).

Explaining the Convolutional Layer (Fig. 1B). The Convolutional Elastic Explanation View is entered when a user clicks a convolutional neuron from the Overview. This view applies a convolution on each input node of the selected neuron, visualized by a kernel sliding across the input neurons, which yields an intermediate result for each input neuron. This sliding kernel forms the output heatmap during the animation, which imitates the internal process during a convolution operation. While the sliding kernel animation is in progress, the edges in this view are represented as flowing-dashed lines; upon the animation's completion, the edges transition to solid lines.

Explaining the Flatten Layer (Fig. 5B). The Flatten Elastic Explanation View visualizes the operation of transforming an n-dimensional tensor into a 1-dimensional tensor, which is shown when a user clicks an output neuron. This flattening operation is often necessary in a CNN prior to classification so that the fully-connected output layer can make classification decisions. The view uses the pixel's color from the previous layer to encode the flatten layer's neuron as a short colored line. Edges connect each flatten layer neuron with its source component and intermediate result. These edges are colored based on the model's weight value. Users can hover over any component of this connection to highlight the associated edges as well as the flatten layer's neuron and the pixel value from the previous layer.

6.3 Interactive Formula View

The Interactive Formula View consists of four variations designed for convolutional layers, ReLU activation layers, pooling layers, and the softmax activation function. After users have built up a mental model of the CNN model structure from the previous Overview and Elastic Explanation Views, these four views demonstrate the detailed mathematics occurring in each layer.

Explaining Convolution, ReLU Activation, and Pooling (Fig. 6A, B, C). Each view animates the window-sliding operation on the input matrix and output matrix over an interval, so that the user can understand how each element in the input is connected to the output, and vice versa. In addition, the user can interact with these matrices by hovering over the heatmaps to control the position of the sliding window. For example, in the Convolutional Interactive Formula View (Fig. 6A), as the user controls the window (kernel) position in either the input or the output matrix, this view visualizes the dot-product formula with input numbers and kernel weights directly extracted from the current kernel. This synchronization between the input, the output, and the mathematical function enables the user to better understand how the kernel convolves a matrix in convolutional layers.
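As a concrete example, the dot product for the kernel position shown in Fig. 2 is:

    1.93×(−0.21) + 3.19×(−0.25) + 3.12×(−0.10)
      + 1.57×(−0.12) + 2.54×(−0.08) + 3.05×0.02
      + 0.96×0.13 + 1.96×(−0.08) + 2.67×(−0.15) ≈ −2.28

(Fig. 2 displays −2.26; the on-screen operands are rounded to two decimals, which presumably explains the small difference.) The learned bias is then added to such sums to form the output value (Sect. 2).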
Explaining the Softmax Activation (Fig. 6D). This view outlines the operations necessary to calculate the classification score. It is accessible from the Flatten Elastic Explanation View to explain how the results (logits) from the previous view lead to the final classification. The view consists of logit values encoded as circles and colored with a light orange to dark orange color scale, which provides users with a visual cue of the importance of every class. This view also includes a corresponding equation, which explains how the classification score is computed. When users enter this view, pairs of each logit circle and its corresponding value in the equation appear sequentially with animations. As a user hovers over a logit circle, its value will be highlighted in the equation along with the logit circle itself, so the user can understand how each logit contributes to the softmax function. Hovering over numbers in the equation will also highlight the appropriate logit circles. Interacting with logit circles and the mathematical equation in combination allows a user to discern the impact that every logit has on the classification score in the output layer.

Fig. 6. The Interactive Formula Views explain the underlying mathematical operations of a CNN. (A) shows the element-wise dot-product occurring in a convolutional neuron, (B) visualizes the activation function ReLU, and (C) illustrates how max pooling works. Users can hover over heatmaps to display an operation's input-to-output mapping. (D) interactively explains the softmax function, helping users connect numbers from the formula to their visual representations. Users can click the info button to scroll to the corresponding section in the tutorial article, and the play button to start the window sliding animation in (A)-(C).

6.4 Transitions Between Views

The Overview is the starting state of CNN EXPLAINER and shows the model architecture. From this high-level view, the user can begin inspecting layers, connectivity, classifications, and tracing activations of neurons through the model. When a user is interested in more detail, they can click on neuron activation maps in the visualization. Neurons in a layer that have simple one-to-one connections to a neuron in the previous layer do not require an auxiliary Elastic Explanation View, so upon clicking one of these neurons, a user will be able to enter the Interactive Formula View to understand the low-level operation that a tensor undergoes at that layer. If a neuron has more complex connectivity, then the user will enter an Elastic Explanation View first. In this view, CNN EXPLAINER uses visualizations and annotations before displaying mathematics. Through further interaction, a user can hover and click on parts of the Elastic Explanation View to uncover the mathematical operations as well as examine the values of weights and biases. The low-level Interactive Formula Views are only shown after transitioning from the previous two views, so that users can learn about the underlying mathematical operations after having a mental model of the complex and layered CNN model structure.

6.5 Visualizations with Explanations

CNN EXPLAINER is accompanied by an interactive tutorial article beneath the interface that explains CNN layer functions and hyperparameters, and outlines CNN EXPLAINER's interactive features. Learners can read freely, or jump to specific sections by clicking layer names or the info buttons (Fig. 6) from the main visualization. The article provides beginner users detailed information regarding CNNs that can supplement their exploration of the visualization.

Additionally, text annotations are placed throughout the visualization (e.g., explaining the flatten layer operation), which further guide users and explain concepts that are not easily discernible from the visualization alone. These annotations help users map the underlying algorithm to its visual encoding.

6.6 Customizable Visualizations

The Control Panel located across the top of the visualization (Fig. 1) allows the user to alter the CNN input image and edit the overall representation of the network. The Hyperparameter Widget (Fig. 7) enables the user to experiment with different convolution hyperparameters.

Change input image. Users can choose between (1) preloaded input images for each output class, or (2) uploading their own custom image. Preloaded images allow a user to easily access data from the classes the model was originally trained on. Users can also freely upload any image for classification into the ten classes the network was trained on. The user does not have any limitations on the size of the image they upload: CNN EXPLAINER resizes and crops a central square of any uploaded image, so that the image matches network dimensions. The fourth of six AV tool engagement levels is allowing users to change the AV tool's input [36]. Supporting custom image upload engages users by allowing them to analyze the network's classification decisions and interactively test their own hypotheses on diverse image inputs.

Show network details. A user can toggle the "Show detail" button, which displays additional network specifications in the Overview. When toggled on, the Overview will reveal layer dimensions and show color scale legends. Additionally, a user can vary the activation map color scale range. The CNN architecture presented by CNN EXPLAINER is grouped into four units and two modules (Fig. 4). By modifying the drop-down menu in the Control Panel, a user can adjust the color scale range used by the network to investigate activations with different groupings.

Explore hyperparameter impact. The tutorial article (Sect. 6.5) includes an interactive Hyperparameter Widget that allows users to
experiment with convolutional hyperparameters (Fig. 7). Users can adjust the input and hyperparameters of the stand-alone visualization to test how different hyperparameters change the sliding convolutional kernel and the output's dimensions. This interactive element emphasizes learning through experimentation by supplementing knowledge gained from reading the article and using the main visualization.

Fig. 7. The Hyperparameter Widget, a component of the accompanying interactive article, allows users to adjust hyperparameters and observe in real time how the kernel's sliding pattern changes in convolutional layers.
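The dimensions the widget reports follow standard convolution arithmetic: output size = (input + 2 × padding − kernel) / stride + 1. Below is a minimal Python sketch of this computation (our own illustration with hypothetical names, not CNN EXPLAINER's source code); it also reproduces the situation K3 noticed in our study (Sect. 8.3.3), where some stride/kernel combinations do not fit the input.

    def conv_output_size(input_size, kernel_size, stride=1, padding=0):
        # Spatial output size of a convolution (or pooling) window sweep.
        span = input_size + 2 * padding - kernel_size
        if span < 0:
            raise ValueError("kernel is larger than the padded input")
        if span % stride != 0:
            # The kernel cannot tile the input evenly -- "it won't be able
            # to slide" past the last full window position.
            print(f"warning: stride {stride} leaves {span % stride} px uncovered")
        return span // stride + 1

    # Tiny VGG examples: a 64x64 input with a 3x3 kernel, stride 1 -> 62x62;
    # 2x2 max pooling with stride 2 halves each dimension: 26 -> 13.
    print(conv_output_size(64, 3))            # 62
    print(conv_output_size(26, 2, stride=2))  # 13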
6.7 Web-based, Open-sourced Implementation

CNN EXPLAINER is a web-based, open-sourced visualization tool to teach students the foundations of CNNs. A new user only needs a modern web browser to access our tool; no installation is required. Additionally, other datasets and linear models can be quickly applied to our visualization system due to our robust implementation.

Model Training. The CNN architecture, Tiny VGG (Fig. 4), presented by CNN EXPLAINER for image classification is inspired by both the popular deep learning architecture VGGNet [43] and Stanford's CS231n course notes [26]. It is trained on the Tiny ImageNet dataset [1]. The training dataset consists of 200 image classes and contains 100,000 64×64 RGB images, while the validation dataset contains 10,000 images across the 200 image classes. The model is trained using TensorFlow [3] on 10 handpicked, everyday classes: lifeboat, ladybug, bell pepper, pizza, school bus, koala, espresso, red panda, orange, and sport car. During the training process, the batch size and learning rate are fine-tuned using a 5-fold cross-validation scheme. This simple model achieves a 70.8% top-1 accuracy on the validation dataset.

Front-end Visualization. CNN EXPLAINER loads the pre-trained Tiny VGG model and computes forward propagation results in real time in a user's web browser using TensorFlow.js [45]. These results are visualized using D3.js [5] throughout the multiple interactive views.
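For readers who want a concrete picture of the architecture, below is a Keras sketch of a Tiny-VGG-style model consistent with the description above—two identical modules, each with two 3×3 convolutions (10 kernels each) followed by 2×2 max pooling, matching the 13×13×10 max_pool_2 tensor discussed in Sect. 7.1. It is an approximation for illustration (e.g., it folds the separate ReLU layers into the convolution calls), not the authors' training code.

    import tensorflow as tf
    from tensorflow.keras import layers

    def tiny_vgg(num_classes=10):
        # A Tiny-VGG-style CNN: two conv-conv-pool modules, then softmax.
        return tf.keras.Sequential([
            # Module 1
            layers.Conv2D(10, 3, activation="relu",
                          input_shape=(64, 64, 3)),  # 64x64 RGB -> 62x62x10
            layers.Conv2D(10, 3, activation="relu"), # -> 60x60x10
            layers.MaxPooling2D(2),                  # -> 30x30x10
            # Module 2 (identical sequence of operations)
            layers.Conv2D(10, 3, activation="relu"), # -> 28x28x10
            layers.Conv2D(10, 3, activation="relu"), # -> 26x26x10
            layers.MaxPooling2D(2),                  # -> 13x13x10 (max_pool_2)
            # Flatten, then a fully-connected output layer with softmax
            layers.Flatten(),                        # -> 1690 values
            layers.Dense(num_classes, activation="softmax"),
        ])

    model = tiny_vgg()
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.summary()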
7 USAGE SCENARIOS

7.1 Beginner Learning Layer Connectivity

Janis is a virology researcher using CNNs in a current project. Through an online deep learning course she has gained a general understanding of the goals of applying CNNs, and some basic knowledge of different types of CNN layers, but she needs help filling in some gaps in her knowledge. Interested in learning how a 3-dimensional input (RGB image) leads to a 1-dimensional output (vector of class probabilities) in a CNN, Janis begins exploring the architecture from the Overview.

After clicking the "Show detail" button, Janis notices that the output layer is a 1-dimensional tensor of size 10, while max_pool_2, the previous layer, is a 3-dimensional (13×13×10) tensor. Confused, she hovers over a neuron in the output layer to inspect connections between the final two layers of the architecture: the max_pool_2 layer has 10 neurons; the output layer has 10 neurons each representing a class label, and the output layer is fully-connected to the max_pool_2 layer. She clicks that output neuron, which causes a transition from the Overview (Fig. 5A) to the Flatten Elastic Explanation View (Fig. 5B). She notices that edges between these two layers intersect a 1-dimensional flatten layer and pass through a softmax function. By hovering over pixels from the activation map, Janis understands how the 2-dimensional matrix is "unwrapped" to yield a portion of the 1-dimensional flatten layer. To confirm her assumptions, she clicks the flatten layer name, which directs her to an explanation in the tutorial article underneath the visualization explaining the specifics of the flatten layer. As she continues to follow the edge after the flatten layer, she clicks the softmax button, which leads her to the Softmax Interactive Formula View (Fig. 5C). She learns how the outputs of the flatten layer are normalized by observing the equation linked with logits through animations. Janis recognizes that her previous coursework has not taught these "hidden" operations prior to the output layer, which flatten and normalize the output of the max_pool_2 layer. CNN EXPLAINER helps Janis learn these often-overlooked operations through a hierarchy of interactive views that expose increasingly more detail as they are explored. She now feels more equipped to apply CNNs to her virology research.

7.2 Teaching Through Interactive Experimentation

A university professor, Damian, is currently teaching a computer vision class which covers CNNs. Damian begins his lecture with standard slides. After describing the theory of convolutions, he opens CNN EXPLAINER to demonstrate the convolution operation working inside a full CNN for image classification. With CNN EXPLAINER projected to the class, Damian transitions from the Overview (Fig. 1A) to the Convolutional Elastic Explanation View (Fig. 1B). Damian encourages the class to interpret the sliding window animation (Fig. 2B) as it generates several intermediate results. He then asks the class to predict kernel weights in a specific neuron. To test students' hypotheses, Damian enters the Convolutional Interactive Formula View (Fig. 1C) to display the convolution operation with the true kernel weights. In this view, he can hover over the input and output matrices to answer questions from the class, and display computations behind the operation.

Recalling the theory, a student asks a question regarding the impact of altering the stride hyperparameter on the animated sliding window in convolutional layers. To illustrate the impact of alternative hyperparameters, Damian scrolls down to the "Convolutional Layer" section of the complementary article, where he experiments by adjusting stride and other hyperparameters with the Hyperparameter Widget (Fig. 7) in front of the class. With the help of CNN EXPLAINER, students gain a better understanding of the convolution operation and the effect of its different hyperparameters by the end of the lecture, but to reinforce the concepts and encourage individual experimentation, Damian provides the class with a URL to the web-based CNN EXPLAINER for students to return to in the future.

8 OBSERVATIONAL STUDY

We conducted an observational study to investigate how CNN EXPLAINER's target users (e.g., aspiring deep learning students) would use this tool to learn about CNNs, and also to test the tool's usability.

8.1 Participants

CNN EXPLAINER is designed for deep learning beginners who are interested in learning CNNs. In this study, we aimed to recruit participants who aspire to learn about CNNs and have some knowledge of basic machine learning concepts (e.g., knowing what an image classifier is). We recruited 16 student participants from a large university (4 female, 12 male) through internal mailing lists (e.g., machine learning and computer science Ph.D., M.S., and undergraduate students). Seven participants were Ph.D. students, seven were M.S. students, and the other two were undergraduates. All participants were interested in learning CNNs, and none of them had known CNN EXPLAINER before. Participants self-reported their level of knowledge of non-neural-network machine learning techniques, with an average score of 3.26 on a scale of 0 to 5 (0 being "no knowledge" and 5 being "expert"), and an average score of 2.06 on CNNs (on the same scale). No participant self-reported a score of 5 for their knowledge of CNNs, and one participant had a score of 0. To help better organize our discussion, we refer to participants with a CNN knowledge score of 0, 1, or 2 as B1-B11, where "B" stands for "Beginner", and those with a score of 3 or 4 as K1-K5, where "K" stands for "Knowledgeable".
Usability Evaluation commented, “Good to see the big picture at once and the transition to
different views [...] I like that I can hide details of a unit in a compact
Easy to use 6.25
Easy to understand 5.94
way and expand it when [needed].”
Enjoyable to use 6.88 CNN E XPLAINER employs the fisheye view technique for present-
I will use it in the future 6.37 ing the Elastic Explanation Views (Fig. 1B, Fig. 5B): after transitioning
Helped me to learn 6.37 from the Overview to a specific layer, neighboring layers are still shown
strongly strongly while further layers (lower degree-of-interest) have lower opacity. Par-
disagree 1 2 3 4 5 6 7 agree ticipants found this transition design helpful for them to learn layer-
specific details while having CNN structural context in the background.
Usefulness of Features For instance, K5 said “I can focus on the current layer but still know
Overview 6.62 the same operation goes on for other layers.” Our observations from
Elastic explanation view 6.44 this study suggest that our fluid transition design between different
Interactive formula view 6.50 level of abstraction can help users to better connect unfamiliar layer
Transition between views 6.62 mechanisms to the complex model structure.
Animations 6.81
Input customization 6.50 8.3.2 Animations for enjoyable learning experience
Tutorial article 6.50
Performance 6.62 Another favorite feature of CNN E XPLAINER that participants men-
not at all very tioned was the use of animations, which received the highest rating
useful 1 2 3 4 5 6 7 useful in the exit questionnaire (Fig. 8). In our tool, animations serve two
purposes: to assimilate the relationship between different visual com-
Fig. 8. Average ratings from 16 participants regarding the usability ponents and to help illustrate the model’s underlying operations.
and usefulness of CNN E XPLAINER. Top: Participants thought CNN Transition animation. Layer movement is animated during view
E XPLAINER was easy to use, enjoyable, and helped them learn about transitions. We noticed it helped participants to be aware of different
CNNs. Bottom: All features, especially animations, were rated favorably. views, and all participants navigated through the views naturally. In
addition to assisting with understanding the relationship between dis-
tinct views, animation also helped them discover the linking between
8.2 Procedure different visualization elements. For example, B8 quickly found that
We conducted this study with participants one-on-one via video- the logit circle is linked to its corresponding value in the formula, when
conferencing software. With the permission of all participants, we she saw the circle-number pair appear one-by-one with animation in
recorded the participants’ audio and computer screen for subsequent the Softmax Interactive Formula View (Fig. 5C).
analysis. After participants signed consent forms, we provided them Algorithm animation. Animations that simulate the model’s inner-
a 5-minute overview of CNNs, followed by a 3-minute tutorial of workings helped participants learn underlying operations by validat-
CNN E XPLAINER. Participants then freely explored our tool in their ing their hypotheses. In the Convolutional Elastic Explanation View
computer’s web browser. We also provided a feature checklist, which (Fig. 2B), we animate a small rectangle sliding through one matrix
outlined the main features of our tool and encouraged participants to to mimic the CNN’s internal sliding window. We noticed many par-
try as many features as they could. During the study, participants were ticipants had their attention drawn to this animation when they first
asked to think aloud and share their computer screen with us; they transitioned into the Convolutional Elastic Explanation View. However,
were encouraged to ask questions when necessary. Each session ended they did not report that they understood the convolution operation until
with a usability questionnaire coupled with an exit interview that asked interacting with other features, such as reading the annotation text or
participants about their process of using CNN E XPLAINER, and if this transitioning to the Convolutional Interactive Formula View (Fig. 2C).
tool could be helpful for them. Each study lasted around 50 minutes, Some participants went back to watch the animation multiple times and
and we compensated each participant with a $10 Amazon Gift card. commented that it made sense, for example, K5 said “Very helpful to
see how the image builds as the window slides through,” but others,
8.3 Results such as B9 remarked, “It is not easy to understand [convolution] us-
ing only animation.” Therefore, we hypothesize that this animation
The exit questionnaire included a series of 7-point Likert-scale ques- can indirectly help users to learn about the convolution algorithm by
tions about the utility and usefulness of different views in CNN E X - validating their newly formed mental models of how specific opera-
PLAINER (Fig. 8). All average Likert rating were above 6 except the
tion behave. To test this hypothesis, a rigorous controlled experiment
rating of “easy to understand”. From the high ratings and our observa- would be needed. Related research work on the effect of animation in
tions, participants found our tool easy to use and understand, retained computer science education also found that algorithm animation does
a high engagement level during their session, and eventually gained a not automatically improve learning, but it may lead learners to make
better understanding of CNN concepts. Our observations also reflect predictions of the algorithm behavior which in turn helps learning [7].
key findings in previous AV research [10, 27]. This section describes
Engagement and enjoyable experience. Moreover, we found ani-
design lessons and limitations of our tool distilled from this study.
mations helped to increase participants’ engagement level (e.g., spend-
8.3.1 Transition between different views ing more time and effort) and made CNN E XPLAINER more enjoyable
to use. In the study, many participants repeatedly played and viewed
Several participants (9/16) commented that they liked how our tool different animations. For example, K2 replayed the window sliding
visualizes both high-level CNN structure and explains low-level math- animation multiple times: “The is very well-animated [...] I always
ematical operations on-demand. This feature enables them to better love smooth animations.” B7 also attributed animations to his enjoy-
understand the interplay between low-level layer computations and the able experience with our tool: “[The tool is] enjoyable to use [...] I
overall CNN data transformation—one of the key challenges for under- especially like the lovely animation.”
standing CNN concepts, as we identified from our instructor interviews
and our student survey. For example, initially K4 was confused to 8.3.3 Engaging learning through visualization customization
see the Convolutional Elastic Explanation View, but after reading the
annotation text, he remarked, “Oh, I understand what an intermediate CNN E XPLAINER allows users to modify the visualization. For exam-
layer is now—you run the convolution on the image, then you add all ple, users can change the input image or upload their own image for
those results to get this.” After exploring the Convolutional Interactive classification; CNN E XPLAINER visualizes the new prediction with the
Formula View, he immediately noted, “Every single aspect of the con- new activation maps in every layer. Similarly, users can interactively
volution layer is shown here. [This] is super helpful.” Similarly, B5 explore how hyperparameters affect the convolution operation (Fig. 7).
Hypothesis testing. In this study, many participants used visualization customization to test their predictions of model behaviors. For example, through inspecting the input layer in the Overview, B4 learned that the input layer comprised multiple image channels (e.g., red, green, and blue). He changed the input image to a red bell pepper from Tiny ImageNet and expected to see high values in the input red channel: “If I click the red image, I would see...” After the updated visualization showed what he predicted, he said, “Right, it makes sense.” We found the Hyperparameter Widget also allowed participants to test their hypotheses. While reading the description of convolution hyperparameters in the tutorial article, K3 noted, “Wait, then sometimes they won’t work.” He then modified the hyperparameters in the Hyperparameter Widget and noticed that some combinations indeed did not yield a valid operation output: “It won’t be able to slide, because the stride and kernel size don’t fit the matrix.”
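K3’s observation matches the standard relationship between input size n, kernel size k, padding p, and stride s: a convolution produces floor((n + 2p - k) / s) + 1 outputs per dimension, and the kernel only slides evenly to the edge when s divides (n + 2p - k). The check below is our own illustration (parameter names are ours, not the Hyperparameter Widget’s); note that deep learning frameworks typically floor the division and silently discard the leftover rather than rejecting it:

// Returns the convolution output size, or null for combinations that,
// as K3 put it, "won't be able to slide" evenly over the input.
function convOutputSize(n: number, k: number, p: number, s: number): number | null {
  const span = n + 2 * p - k;
  if (span < 0) return null;        // kernel larger than the padded input
  if (span % s !== 0) return null;  // stride and kernel size don't fit the matrix
  return span / s + 1;
}

console.log(convOutputSize(64, 3, 1, 1)); // 64: "same"-style padding
console.log(convOutputSize(5, 3, 0, 2));  // 2
console.log(convOutputSize(5, 4, 0, 2));  // null: invalid combination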
Engagement. Participants were intrigued to modify the visualization, and their engagement sparked further interest in learning about CNNs. In the study, B6 spent a large amount of time testing the CNN’s behavior on edge cases by finding “difficult” images online. He searched with the keywords “koala”, “koala in a car”, and “bell pepper pizza”, and eventually found a bell pepper pizza photo (photo by Jennifer Laughlin, used with permission). Our CNN model predicted the image as bell pepper with a probability of 0.71 and as ladybug with a probability of 0.2. He commented, “The model is not robust [...] oh, the ladybug[’s high softmax score] might come from the red dot.” Another participant, B5, uploaded his own photo as a new input image for the CNN model. After seeing his picture classified as espresso, B5 started to use our tool to explore the reason for this classification. He traced the activation maps of neurons and the intermediate convolutional results back from the later layers to the early layers. He also asked us how experts interpret CNNs and said he would be interested in learning more about deep learning interpretability.
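The class probabilities participants probed here (e.g., 0.71 for bell pepper) are softmax outputs: the final layer’s raw scores are exponentiated and normalized to sum to 1. A minimal sketch of that computation, with made-up logit values:

// Softmax turns a vector of raw class scores (logits) into
// probabilities that sum to 1.
function softmax(logits: number[]): number[] {
  const maxLogit = Math.max(...logits); // subtract the max for numerical stability
  const exps = logits.map((z) => Math.exp(z - maxLogit));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / total);
}

console.log(softmax([2.0, 0.5, -1.0])); // ≈ [0.79, 0.18, 0.04]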
8.3.4 Limitations

While we found that CNN EXPLAINER provided participants with an engaging and enjoyable learning experience and helped them more easily learn about CNNs, we also noticed some potential improvements to our current system design from this study.

Beginners need more guidance. We found that participants with less knowledge of CNNs needed more instructions to begin using CNN EXPLAINER. Some participants reported that the visual representation of the CNN and the animations were initially not easy to understand, but the tutorial article and the text annotations in the visualization greatly helped them to interpret the visualization. B8 skimmed through the tutorial article before interacting with the main visualization. She said, “After going through the article, I think I will be able to use the tool better [...] I think the article is good, for beginner users especially.” B2 appreciated the ability to jump to a certain section in the article by clicking the layer name in the visualization, and he suggested that we “include a step-by-step tutorial for first time users [...] There was too much information, and I didn’t know where to click at the beginning.” Therefore, we believe that adding more text annotations and a step-by-step tutorial mode could help users who are less familiar with CNNs to better understand the relationship between CNN operations and their visual representations.
Limited explanation of why CNNs work. Some participants, especially those less experienced with CNNs, were interested in learning why the CNN architecture works in addition to learning how a CNN model makes predictions. For example, B7 asked “Why do we need ReLU?” when he was learning the formula of the ReLU function. B5 understood what a Max Pooling layer’s operation does but was unclear why it contributes to a CNN’s performance: “It is counter-intuitive that Max Pooling reduces the [representation] size but makes the model better.” Similarly, B6 commented on the Max Pooling layer: “Why not take the minimum value? [...] I know how to compute them [layers], but I don’t know why we compute them.” Even though it is still an open question why CNNs work so well for various applications [13, 53], there are some commonly accepted “intuitions” about how different layers help this model class succeed. We briefly explain them in the tutorial article: for example, the ReLU function is used to introduce non-linearity into the model. However, we believe it is worth designing visualizations that help users learn about these concepts. For example, allowing users to change the ReLU activation function to a linear function, and then visualizing the new model predictions, may help users understand why non-linear activation functions are needed in CNNs.
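A compact way to convey the non-linearity intuition suggested above is to show that, with a linear activation in place of ReLU, stacked layers collapse into a single linear map. The sketch below is our own toy illustration with one-dimensional weights, not part of CNN EXPLAINER:

// Two tiny "layers": y = w2 * f(w1 * x + b1) + b2, with arbitrary weights.
const relu = (x: number): number => Math.max(0, x);
const identity = (x: number): number => x;

function twoLayer(x: number, f: (v: number) => number): number {
  const w1 = 2.0, b1 = -1.0, w2 = -3.0, b2 = 0.5;
  return w2 * f(w1 * x + b1) + b2;
}

// With f = identity, the network is exactly the single linear map
// -6x + 3.5, no matter how many such layers are stacked; with f = relu,
// the response bends at x = 0.5 and can fit non-linear patterns.
console.log(twoLayer(1, identity), twoLayer(1, relu)); // -2.5 -2.5
console.log(twoLayer(0, identity), twoLayer(0, relu)); // 3.5 0.5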
9 DISCUSSION AND FUTURE WORK

Explaining the training process and CNN architecture intuitions. CNN EXPLAINER helps users learn how a pre-trained CNN model transforms input image data into a class prediction. As we identified from our two preliminary studies and the observational study, students are also interested in learning about the training process for CNNs, including technical approaches such as cross-validation and backpropagation. We plan to work with instructors and students to design and develop new visualizations that address these extensions.

Generalizing to other neural network models. Our observational study demonstrated that supporting users in transitioning between different levels of abstraction helps them more easily understand the interplay between low-level layer operations and high-level model structure. Other neural network models, such as long short-term memory networks [17] and Transformer models [48], also require learners to understand intricate layer operations in the context of a complex network structure. Therefore, our design can be adapted to explain other neural network models to beginners.

Integrating algorithm visualization best practices. Existing work has studied how to design effective visualizations to help students learn algorithms. CNN EXPLAINER applies two key design principles from AV—visualizations with explanations and customizable visualizations (G4). However, there are many other AV design practices that future researchers can integrate into educational deep learning tools, such as giving interactive “pop quizzes” during the visualization process [35] and encouraging users to build their own visualizations [46].

Quantitative evaluation of educational effectiveness. We conducted a qualitative observational study to evaluate the usefulness and usability of CNN EXPLAINER. We would like to further conduct quantitative user studies with before-and-after knowledge quizzes to compare the educational benefits of our tool with those of traditional educational media, such as textbooks and lecture videos. It would be particularly valuable to investigate the educational effectiveness of visualization systems that explain deep learning concepts to beginners.

10 CONCLUSION

As deep learning is increasingly used throughout our everyday lives, it is important to help learners take the first step toward understanding this promising yet complex technology. In this work, we present CNN EXPLAINER, an interactive visualization system designed for non-experts to more easily learn about CNNs. Our tool runs in modern web browsers and is open-sourced, broadening the public’s access to education about modern AI techniques. We discussed design lessons learned from our iterative design process and an observational user study. We hope our work will inspire further research on, and development of, visualization tools that help democratize and lower the barrier to understanding and appropriately applying AI technologies.

ACKNOWLEDGMENTS

We thank Anmol Chhabria, Kaan Sancak, Kantwon Rogers, and the Georgia Tech Visualization Lab for their support and constructive feedback. This work was supported in part by NSF grants IIS-1563816 and CNS-1704701, the NASA NSTRF, DARPA GARD, and gifts from Intel, NVIDIA, Google, and Amazon. Use, duplication, or disclosure is subject to the restrictions as stated in Agreement number HR00112030001 between the Government and the Performer.
REFERENCES

[1] Tiny ImageNet Visual Recognition Challenge. https://tiny-imagenet.herokuapp.com, 2015.
[2] TensorSpace.js: Neural Network 3D Visualization Framework. https://tensorspace.org, 2018.
[3] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, M. Kudlur, J. Levenberg, R. Monga, S. Moore, D. G. Murray, B. Steiner, P. Tucker, V. Vasudevan, P. Warden, M. Wicke, Y. Yu, and X. Zheng. TensorFlow: A System for Large-Scale Machine Learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI), pp. 265–283. Savannah, GA, USA, Nov. 2016.
[4] A. Bilal, A. Jourabloo, M. Ye, X. Liu, and L. Ren. Do Convolutional Neural Networks Learn Class Hierarchy? IEEE Transactions on Visualization and Computer Graphics, 24(1):152–162, Jan. 2018.
[5] M. Bostock, V. Ogievetsky, and J. Heer. D3 Data-Driven Documents. IEEE Transactions on Visualization and Computer Graphics, 17(12):2301–2309, Dec. 2011.
[6] M. H. Brown. Algorithm Animation. MIT Press, Cambridge, MA, USA, 1988.
[7] M. D. Byrne, R. Catrambone, and J. T. Stasko. Evaluating animations as student aids in learning computer algorithms. Computers & Education, 33(4):253–278, Dec. 1999.
[8] S. Carter and M. Nielsen. Using Artificial Intelligence to Augment Human Intelligence. Distill, 2(12), Dec. 2017.
[9] N. Das, H. Park, Z. J. Wang, F. Hohman, R. Firstman, E. Rogers, and D. H. Chau. Massif: Interactive interpretation of adversarial attacks on deep learning. In Extended Abstracts of the 2020 CHI Conference on Human Factors in Computing Systems, CHI ’20. ACM, Honolulu, HI, USA, 2020.
[10] E. Fouh, M. Akbar, and C. A. Shaffer. The Role of Visualization in Computer Science Education. Computers in the Schools, 29(1-2):95–117, Jan. 2012.
[11] D. Galles. Data structure visualizations, 2006.
[12] S. Grissom, M. F. McNally, and T. Naps. Algorithm visualization in CS education: Comparing levels of student engagement. In Proceedings of the 2003 ACM Symposium on Software Visualization - SoftVis ’03, pp. 87–94. San Diego, CA, USA, 2003.
[13] J. Gu, Z. Wang, J. Kuen, L. Ma, A. Shahroudy, B. Shuai, T. Liu, X. Wang, G. Wang, J. Cai, and T. Chen. Recent advances in convolutional neural networks. Pattern Recognition, 77:354–377, May 2018.
[14] P. J. Guo. Online Python Tutor: Embeddable web-based program visualization for CS education. In Proceedings of the 44th ACM Technical Symposium on Computer Science Education - SIGCSE ’13, pp. 579–584. ACM Press, Denver, CO, USA, 2013.
[15] S. Hansen, N. Narayanan, and M. Hegarty. Designing Educationally Effective Algorithm Visualizations. Journal of Visual Languages & Computing, 13(3):291–317, June 2002.
[16] A. W. Harley. An Interactive Node-Link Visualization of Convolutional Neural Networks. In Advances in Visual Computing, vol. 9474, pp. 867–877. Springer International Publishing, 2015.
[17] S. Hochreiter and J. Schmidhuber. Long Short-Term Memory. Neural Computation, 9(8):1735–1780, Nov. 1997.
[18] F. Hohman, M. Kahng, R. Pienta, and D. H. Chau. Visual Analytics in Deep Learning: An Interrogative Survey for the Next Frontiers. IEEE Transactions on Visualization and Computer Graphics, 25(8):2674–2693, Aug. 2019.
[19] F. Hohman, H. Park, C. Robinson, and D. H. Chau. Summit: Scaling Deep Learning Interpretability by Visualizing Activation and Attribution Summarizations. IEEE Transactions on Visualization and Computer Graphics, 26(1):1096–1106, Jan. 2020.
[20] C. Hundhausen and S. Douglas. Using visualizations to learn algorithms: Should students construct their own, or view an expert’s? In Proceedings 2000 IEEE International Symposium on Visual Languages, pp. 21–28. IEEE Comput. Soc, Seattle, WA, USA, 2000.
[21] C. D. Hundhausen, S. A. Douglas, and J. T. Stasko. A Meta-Study of Algorithm Visualization Effectiveness. Journal of Visual Languages & Computing, 13(3):259–290, June 2002.
[22] M. Kahng, P. Y. Andrews, A. Kalro, and D. H. Chau. ActiVis: Visual Exploration of Industry-Scale Deep Neural Network Models. IEEE Transactions on Visualization and Computer Graphics, 24(1):88–97, Jan. 2018.
[23] M. Kahng and D. H. Chau. How Does Visualization Help People Learn Deep Learning? Evaluation of GAN Lab. In IEEE VIS 2019 Workshop on EValuation of Interactive VisuAl Machine Learning Systems, Oct. 2019.
[24] M. Kahng, N. Thorat, D. H. Chau, F. B. Viégas, and M. Wattenberg. GAN Lab: Understanding Complex Deep Generative Models using Interactive Visual Experimentation. IEEE Transactions on Visualization and Computer Graphics, 25(1):310–320, Jan. 2019.
[25] A. Karpathy. ConvNetJS MNIST demo, 2016.
[26] A. Karpathy. CS231n Convolutional Neural Networks for Visual Recognition, 2016.
[27] C. Kehoe, J. Stasko, and A. Taylor. Rethinking the evaluation of algorithm animations as learning aids: An observational study. International Journal of Human-Computer Studies, 54(2):265–284, Feb. 2001.
[28] Y. LeCun, Y. Bengio, and G. Hinton. Deep learning. Nature, 521(7553):436–444, May 2015.
[29] M. Liu, S. Liu, H. Su, K. Cao, and J. Zhu. Analyzing the Noise Robustness of Deep Neural Networks. In 2018 IEEE Conference on Visual Analytics Science and Technology (VAST), pp. 60–71. IEEE, Berlin, Germany, Oct. 2018.
[30] M. Liu, J. Shi, K. Cao, J. Zhu, and S. Liu. Analyzing the Training Processes of Deep Generative Models. IEEE Transactions on Visualization and Computer Graphics, 24(1):77–87, Jan. 2018.
[31] M. Liu, J. Shi, Z. Li, C. Li, J. Zhu, and S. Liu. Towards Better Analysis of Deep Convolutional Neural Networks. IEEE Transactions on Visualization and Computer Graphics, 23(1):91–100, Jan. 2017.
[32] S. Liu, D. Maljovec, B. Wang, P.-T. Bremer, and V. Pascucci. Visualizing High-Dimensional Data: Advances in the Past Decade. IEEE Transactions on Visualization and Computer Graphics, 23(3):1249–1268, Mar. 2017.
[33] A. Madsen. Visualizing memorization in RNNs. Distill, 4(3):10.23915/distill.00016, Mar. 2019.
[34] R. E. Mayer and R. B. Anderson. Animations need narrations: An experimental test of a dual-coding hypothesis. Journal of Educational Psychology, 83(4):484–490, 1991.
[35] T. L. Naps, J. R. Eagan, and L. L. Norton. JHAVÉ—an environment to actively engage students in Web-based algorithm visualizations. ACM SIGCSE Bulletin, 32(1):109–113, Mar. 2000.
[36] T. L. Naps, G. Rößling, V. Almstrum, W. Dann, R. Fleischer, C. Hundhausen, A. Korhonen, L. Malmi, M. McNally, S. Rodger, and J. Á. Velázquez-Iturbide. Exploring the Role of Visualization and Engagement in Computer Science Education. SIGCSE Bull., 35(2):131–152, June 2002.
[37] A. P. Norton and Y. Qi. Adversarial-Playground: A visualization suite showing how adversarial examples fool deep learning. In 2017 IEEE Symposium on Visualization for Cyber Security (VizSec), pp. 1–4. IEEE, Phoenix, AZ, USA, Oct. 2017.
[38] C. Olah. Neural Networks, Manifolds, and Topology, June 2014.
[39] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala. PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems, pp. 8024–8035, 2019.
[40] N. Pezzotti, T. Hollt, J. Van Gemert, B. P. Lelieveldt, E. Eisemann, and A. Vilanova. DeepEyes: Progressive Visual Analytics for Designing Deep Neural Networks. IEEE Transactions on Visualization and Computer Graphics, 24(1):98–108, Jan. 2018.
[41] D. Schweitzer and W. Brown. Interactive visualization for the active learning classroom. ACM SIGCSE Bulletin, 39(1):208, Mar. 2007.
[42] C. A. Shaffer, M. L. Cooper, A. J. D. Alon, M. Akbar, M. Stewart, S. Ponce, and S. H. Edwards. Algorithm Visualization: The State of the Field. ACM Transactions on Computing Education, 10(3):1–22, Aug. 2010.
[43] K. Simonyan and A. Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv:1409.1556 [cs], Apr. 2015.
[44] D. Smilkov, S. Carter, D. Sculley, F. B. Viégas, and M. Wattenberg. Direct-Manipulation Visualization of Deep Networks. arXiv:1708.03788, Aug. 2017.
[45] D. Smilkov, N. Thorat, Y. Assogba, A. Yuan, N. Kreeger, P. Yu, K. Zhang, S. Cai, E. Nielsen, D. Soergel, S. Bileschi, M. Terry, C. Nicholson, S. N. Gupta, S. Sirajuddin, D. Sculley, R. Monga, G. Corrado, F. B. Viégas, and M. Wattenberg. TensorFlow.js: Machine Learning for the Web and Beyond. arXiv:1901.05350 [cs], Feb. 2019.
[46] J. T. Stasko. Using student-built algorithm animations as learning aids. ACM SIGCSE Bulletin, 29(1):25–29, Mar. 1997.
[47] H. Strobelt, S. Gehrmann, H. Pfister, and A. M. Rush. LSTMVis: A Tool for Visual Analysis of Hidden State Dynamics in Recurrent Neural Networks. IEEE Transactions on Visualization and Computer Graphics, 24(1):667–676, Jan. 2018.
[48] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems, pp. 6000–6010, 2017.
[49] J. Wang, L. Gou, H. Yang, and H.-W. Shen. GANViz: A Visual Analytics Approach to Understand the Adversarial Game. IEEE Transactions on Visualization and Computer Graphics, 24(6):1905–1917, June 2018.
[50] Z. J. Wang, R. Turko, O. Shaikh, H. Park, N. Das, F. Hohman, M. Kahng, and D. H. Chau. CNN 101: Interactive visual learning for convolutional neural networks. In Extended Abstracts of the 2020 CHI Conference on Human Factors in Computing Systems, CHI ’20. ACM, Honolulu, HI, USA, 2020.
[51] B. Webster. Now anyone can explore machine learning, no coding required, Oct. 2017.
[52] J. Yosinski, J. Clune, A. Nguyen, T. Fuchs, and H. Lipson. Understanding Neural Networks Through Deep Visualization. In ICML Deep Learning Workshop, 2015.
[53] C. Zhang, S. Bengio, M. Hardt, B. Recht, and O. Vinyals. Understanding deep learning requires rethinking generalization. In 5th International Conference on Learning Representations (ICLR), Toulon, France, Conference Track Proceedings, 2017.