WSDM21 Tutorial DLAD Slides
March 8, 2021
Tutorial outline
Part 1: Overview of challenges and methods (Charu Aggarwal)
• 30 min: Introduction to anomaly detection; problems and challenges; deep vs. shallow methods; overview of deep anomaly detection approaches
• 5 min: Q&A
Part 2: Methods (Guansong Pang)
• 80 min: The modeling perspective
• 10 min: Break
• 15 min: The supervision information perspective
• 10 min: Implementation and evaluation
• 5 min: Q&A
Part 3: Conclusions and future opportunities (Longbing Cao)
• 15 min: Summary of the methods; six possible directions for future research
• 10 min: Q&A
2
Part 1: Overview of
Challenges and Methods
• Problem definition and applications
• Challenges
• Deep vs. shallow methods
• Overview of deep anomaly detection approaches
• Taxonomy of methods
3
What are Anomalies?
Source: Wikipedia
4
Anomaly detection: Problem Variations
Point anomalies, conditional anomalies, and group anomalies (illustrated as normal vs. outlier instances)
Image source: Gupta et al. CIKM Tutorial 2013
5
Real-World Application Domains
Cybersecurity: attacks, malware, malicious apps/URLs, biometric spoofing
Social network and web security: false/malicious accounts, false/hate/toxic information
Video surveillance: criminal activities (e.g., shooting, shoplifting), road accidents, violence, etc.
Finance: credit card/insurance frauds, market manipulation, money laundering, etc.
Healthcare: lesions, tumours, events in IoT/ICU monitoring, etc.
Industrial inspection: defects, micro-cracks
Astronomy: anomalous events (e.g., explosion)
Source: Wikipedia, UCF-Crime
8
Key Challenges
9
Key Challenges
10
Traditional (Shallow) Methods and Disadvantages
Statistical/probabilistic-based approaches
• Statistical test-based, depth-based, deviation-based
Proximity-based approaches
• Distance-based, density-based, clustering-based
Shallow ML Models
• Construct an unsupervised (one-class) analog of a supervised ML model such as the SVM
• Use unsupervised dimensionality reduction methods, e.g., PCA and kernel PCA
Others
• Information-theoretic, subspace methods
Weaknesses
• Weak capability of capturing intricate relationships
• Lots of hand-crafting of algorithms and features [ad hoc]
• Ad hoc nature makes it difficult to incorporate supervision seamlessly
11
Advantages of Deep Learning
12
Deep vs Shallow [Traditional]: Example
13
Deep vs. Shallow [Representation]
14
Deep vs. Shallow: [Algorithm Type]
15
Deep vs. Shallow [Feature Relations]
16
Deep vs. Shallow [Feature Learning Methods for Diverse Data Types]
Deep methods vs. shallow methods:
• Feature space: an expressive new space vs. the primitive (original) space
• Anomaly detection algorithm: defined by the NN structure vs. heuristic or ad hoc
• Feature relations captured: intricate vs. simple
• Extracting features in diverse types of data: varying architectures and loss functions (e.g., RNN, CNN) vs. hand-crafted feature extractors/off-the-shelf methods
MLP, CNN, RNN, GNN, etc. vs. random projection, PCA, subgraph patterns, optical flow, etc.
17
Deep vs. Shallow Methods [Localization]
18
Deep vs. Shallow Methods [Localization]
Deep methods: anomaly scores → backpropagation to obtain activation maps. Shallow methods: model-independent outlying aspect mining.
Pang, Guansong, et al. "Self-trained deep ordinal regression for End-to-End video anomaly detection." In: CVPR. 2020.
Angiulli, Fabrizio, et al. "Outlying property detection with numerical attributes." Data Mining and Knowledge Discovery 31.1 (2017): 134-163.
19
Three Principal Categories
1. Deep learning for feature extraction: the simplest approaches
2. Anomaly detection-specific feature learning: most methods belong to this category, e.g., autoencoder-, GAN-, and one-class models
3. End-to-end optimization of the pipeline with score learning: often more effective than the other two approaches
Pang, Guansong, et al. Deep learning for anomaly detection: A review. ACM Computing Survey 54, 2,
Article 38 (March 2021), 38 pages. https://doi.org/10.1145/3439950. arXiv preprint. 20
More Detailed Taxonomy
Three high-level categories of methods and 11 fine-grained subcategories of methods
Pang, Guansong, et al. Deep learning for anomaly detection: A review. ACM Computing Survey 54, 2,
Article 38 (March 2021), 38 pages. https://doi.org/10.1145/3439950. arXiv preprint. 21
Categorization Based on Supervision
Unsupervised approach
• Working on anomaly-contaminated unlabeled data; no manually labeled training data
• Limited work done
Semi-supervised approach
• Assuming the availability of a set of manually labeled normal training data
• Most of current deep methods belong to this approach
Weakly-supervised approach
• Assuming we have some labels for anomaly classes, yet the class labels are partial (i.e., they do not span the entire set of anomaly classes), inexact (i.e., coarse-grained labels), or inaccurate (i.e., some given labels can be incorrect)
• Limited work done
22
Supervision: Application Settings
23
Part 2: Methods
• The modeling perspective
• Deep learning for feature extraction
• Learning feature representations of normality
• End-to-end anomaly score learning
• Break
• The supervision information perspective
• Unsupervised approach
• Weakly-supervised approach
• Semi-supervised approach
• Implementation and Evaluation
24
Part 2: Methods
• The modeling perspective
• Deep learning for feature extraction ←
• Learning feature representations of normality
• End-to-end anomaly score learning
• Break
• The supervision information perspective
• Unsupervised approach
• Weakly-supervised approach
• Semi-supervised approach
• Implementation and Evaluation
25
Main approach I: Deep learning for
feature extraction
Leveraging existing deep models to extract low-dimensional features for downstream anomaly measures (working purely as feature extraction)
• The feature extraction and the anomaly scoring are fully disjoint
• Assumption: the extracted features preserve the discriminative
information that helps separate anomalies from normal instances
General framework
1. Given a dataset 𝒳 = {𝒙_1, 𝒙_2, ..., 𝒙_N} with 𝒙_i ∈ ℝ^D, the approach is formulated as 𝒛 = 𝜙(𝒙; Θ), where 𝜙: 𝒳 → 𝒵 is a deep-neural-network-based feature mapping with 𝒵 ⊆ ℝ^K (K ≪ D)
2. An anomaly measure 𝑓, which has no connection to 𝜙, is then applied in the new space to calculate anomaly scores
Two directions: pre-trained models vs directly training deep feature extractors on the target data
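To make the two-step recipe concrete, here is a minimal Python sketch (not from the tutorial) in which a small MLP stands in for the feature mapping 𝜙 and a k-nearest-neighbour distance stands in for the anomaly measure 𝑓; all sizes and data are placeholders.

```python
# Minimal sketch of approach I: a deep feature mapping phi followed by a
# completely separate anomaly measure f (here: mean k-nearest-neighbour distance).
import torch
import torch.nn as nn
from sklearn.neighbors import NearestNeighbors

D, K, N = 100, 16, 1000
phi = nn.Sequential(nn.Linear(D, 64), nn.ReLU(), nn.Linear(64, K))  # z = phi(x; Theta)

X = torch.randn(N, D)                      # placeholder for the dataset
with torch.no_grad():
    Z = phi(X).numpy()                     # low-dimensional features, K << D

# f: an anomaly measure with no connection to phi, applied in the new space.
nbrs = NearestNeighbors(n_neighbors=6).fit(Z)
dist, _ = nbrs.kneighbors(Z)               # column 0 is the point itself (distance 0)
scores = dist[:, 1:].mean(axis=1)          # larger score = more anomalous
```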
26
Direction I: Using pre-trained models
Applications
This approach is commonly used in image or
video anomaly detection
27
Example: VGG + Unmasking
Intuition
• Abnormal video frames are more distinguishable from adjacent frames than normal video frames are
The model
1. Form two sets of video frames: {t-w, …, t} vs. {t+1, …, t+w}
2. A pre-trained VGG model is used to extract features from these video frames
3. Iteratively train a binary classifier to distinguish the two sets, removing at each step the most discriminant features (analogous to unmasking)
4. The mean training classification accuracy is the anomaly score
Tudor Ionescu, Radu, et al. "Unmasking the abnormal events in video.“ In: ICCV. 2017.
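A rough sketch of the unmasking loop, assuming frame features are already extracted (e.g., by a VGG model); the classifier, the number of elimination steps, and the number of removed features per step are illustrative choices, not the paper's exact settings.

```python
# Rough sketch of the unmasking loop: the two adjacent frame sets remain easy to
# separate after repeatedly removing the most discriminant features only when an
# abnormal event occurs, so the mean training accuracy acts as the anomaly score.
import numpy as np
from sklearn.linear_model import LogisticRegression

def unmasking_score(F_past, F_future, n_steps=5, k_remove=10):
    """F_past, F_future: (n_frames, n_features) arrays of frame features."""
    X = np.vstack([F_past, F_future])
    y = np.r_[np.zeros(len(F_past)), np.ones(len(F_future))]
    active = np.arange(X.shape[1])                         # features still in play
    accs = []
    for _ in range(n_steps):
        clf = LogisticRegression(max_iter=1000).fit(X[:, active], y)
        accs.append(clf.score(X[:, active], y))            # training accuracy
        top = np.argsort(-np.abs(clf.coef_[0]))[:k_remove] # most discriminant features
        active = np.delete(active, top)
    return float(np.mean(accs))

rng = np.random.default_rng(0)                             # toy usage
score = unmasking_score(rng.normal(size=(10, 512)), rng.normal(size=(10, 512)))
```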
28
Direction II: Training deep feature
extraction models
Learning deep feature extractors using the training data
General framework
1. First train a network 𝜙, such as an autoencoder, to extract low-dimensional features 𝒛
2. Then apply an anomaly measure 𝑓 to calculate the anomaly scores
Applications
Autoencoder is commonly used to instantiate the neural
network mapping 𝝓
• Autoencoder (𝝓) + one-class SVM (𝒇) [Xu et al., BMVC 2015]
• Autoencoder (𝝓) + clustering (𝒇) [Yu et al., KDD 2018]
• Autoencoder (𝝓) + unsupervised classification (𝒇) [Tudor Ionescu et al.,
CVPR 2019]
Tudor Ionescu, Radu, et al. "Object-centric auto-encoders and dummy anomalies for abnormal event detection in video.“ In: CVPR. 2019.
Xu, Dan, et al. "Learning deep representations of appearance and motion for anomalous event detection.“ In: BMVC (2015). 29
Yu, Wenchao, et al. "Netwalk: A flexible deep embedding approach for anomaly detection in dynamic networks.“ In: KDD. 2018.
Section summary
Pros
• Many state-of-the-art (pre-trained) deep models and off-the-shelf anomaly detectors are readily available
• More powerful dimensionality reduction than popular linear methods
• Easy to implement
Cons
• Fully disjointing feature extraction and anomaly scoring can lead to significant loss of anomaly-discriminative information
• Pre-trained deep models are typically limited to specific types of data
• Inherent limitations of existing anomaly measures
30
Part 2: Methods
• The modeling perspective
• Deep learning for feature extraction ←
• Learning feature representations of normality
• End-to-end anomaly score learning
• Break
• The supervision information perspective
• Unsupervised approach
• Weakly-supervised approach
• Semi-supervised approach
• Implementation and Evaluation
31
Part 2: Methods
• The modeling perspective
• Deep learning for feature extraction √
• Learning feature representations of normality ←
• End-to-end anomaly score learning
• Break
• The supervision information perspective
• Unsupervised approach
• Weakly-supervised approach
• Semi-supervised approach
• Implementation and Evaluation
32
Main approach II – Learning feature
representations of normality
Integrating feature learning with anomaly scoring in some
ways, rather than fully decoupling them as in approach I
• Adapting popular deep approaches for normality feature learning
33
Main approach II – Learning feature
representations of normality
Integrating feature learning with anomaly scoring in some
ways, rather than fully decoupling them as in approach I
• Adapting popular deep approaches for normality feature learning
𝑓 is a traditional anomaly
measure, e.g., one-class measure,
the nearest neighbor distance
34
Taxonomy: the modeling perspective
Three high-level categories of methods and 11 fine-grained subcategories of methods
Pang, Guansong, et al. Deep learning for anomaly detection: A review. ACM Computing Survey 54, 2,
Article 38 (March 2021), 38 pages. https://doi.org/10.1145/3439950. arXiv preprint. 35
Autoencoders
General Framework
1. Bottleneck architecture + reconstruction loss
2. The larger the reconstruction error, the more abnormal the instance
Hawkins, Simon, et al. "Outlier detection using replicator neural networks.“ In: DaWaK. 2002.
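A minimal PyTorch sketch of this framework: a bottleneck autoencoder trained with a reconstruction loss, with the per-instance reconstruction error used as the anomaly score; the architecture and training loop are placeholders.

```python
# Minimal bottleneck autoencoder; the per-instance reconstruction error is the anomaly score.
import torch
import torch.nn as nn

class AE(nn.Module):
    def __init__(self, d_in=100, d_bottleneck=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(d_in, 64), nn.ReLU(), nn.Linear(64, d_bottleneck))
        self.dec = nn.Sequential(nn.Linear(d_bottleneck, 64), nn.ReLU(), nn.Linear(64, d_in))
    def forward(self, x):
        return self.dec(self.enc(x))

X = torch.randn(1024, 100)                       # placeholder training data
ae = AE()
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
for _ in range(200):                             # plain full-batch training loop
    loss = ((ae(X) - X) ** 2).mean()             # reconstruction loss
    opt.zero_grad(); loss.backward(); opt.step()

with torch.no_grad():
    scores = ((ae(X) - X) ** 2).mean(dim=1)      # larger error = more abnormal
```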
37
Autoencoders – ensemble method
Chen, Jinghui, et al. "Outlier detection with autoencoder ensembles.“ In: SDM. 2017.
38
Robust deep autoencoders
• Builds on the robust PCA idea: split the data X into a part L that the autoencoder can reconstruct well and a sparse outlier part S (X = L + S)
Zhou, Chong, et al. "Anomaly detection with robust deep autoencoders." In: KDD. 2017.
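A highly simplified sketch of the alternating scheme behind robust deep autoencoders, assuming an l1 (soft-thresholding) update for the sparse part S; the penalty lam and the iteration counts are illustrative.

```python
# Simplified sketch of the robust deep autoencoder idea (X = L + S): alternate
# between fitting an autoencoder to the "clean" part L = X - S and updating the
# sparse outlier part S with an l1 soft-thresholding step.
import torch
import torch.nn as nn

def shrink(A, t):                                  # soft-thresholding (prox of the l1 norm)
    return torch.sign(A) * torch.clamp(A.abs() - t, min=0.0)

X = torch.randn(512, 100)                          # placeholder data
ae = nn.Sequential(nn.Linear(100, 16), nn.ReLU(), nn.Linear(16, 100))
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
S = torch.zeros_like(X)
lam = 0.1                                          # controls the sparsity of S

for _ in range(20):                                # outer alternating loop
    L = X - S
    for _ in range(50):                            # inner autoencoder updates on L
        loss = ((ae(L) - L) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():
        S = shrink(X - ae(L), lam)                 # poorly reconstructed entries move into S

anomaly_scores = S.abs().sum(dim=1)                # instances with large |S| are outliers
```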
39
Taxonomy: the modeling perspective
Three high-level categories of methods and 11 fine-grained subcategories of methods
Pang, Guansong, et al. Deep learning for anomaly detection: A review. ACM Computing Survey 54, 2,
Article 38 (March 2021), 38 pages. https://doi.org/10.1145/3439950. arXiv preprint. 40
Generative Adversarial Networks (GANs)
General framework
1. Train a GAN-based model
2. Calculate anomaly scores by looking into the difference between an input instance and its
counterpart generated from the latent space of the generator
41
Example – AnoGAN
Intuition
• Given an instance x, there is generally an instance z in the latent feature space of the generative
network so that the corresponding generated instance G(z) and x are as similar as possible
The model
1. Train a GAN model using the standard GAN objective function
2. For a test instance 𝒙, search the latent space for the 𝒛 whose generation G(𝒛) is closest to 𝒙
3. Use the residual between 𝒙 and G(𝒛), optionally combined with a discrimination loss, as the anomaly score
Schlegl, Thomas, et al. "Unsupervised anomaly detection with generative adversarial networks to guide marker discovery." In: IPMI. Springer, Cham, 2017.
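A sketch of the AnoGAN-style scoring step, assuming the generator G has already been trained; only the residual term is shown (the full method also adds a discriminator-feature loss), and the network, data, and search settings are placeholders.

```python
# Sketch of AnoGAN-style scoring with an already trained generator G: search the
# latent space for the z whose generation best matches x; the remaining residual
# is the anomaly score (the full method also adds a discriminator-feature loss).
import torch
import torch.nn as nn

latent_dim = 32
G = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, 100))  # placeholder "trained" generator
for p in G.parameters():
    p.requires_grad_(False)                        # only z is optimized

def anogan_score(x, n_steps=200, lr=0.01):
    z = torch.randn(1, latent_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(n_steps):
        residual = ((G(z) - x) ** 2).sum()         # ||x - G(z)||^2
        opt.zero_grad(); residual.backward(); opt.step()
    with torch.no_grad():
        return ((G(z) - x) ** 2).sum().item()      # larger = more anomalous

score = anogan_score(torch.randn(1, 100))
```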
42
Example – EBGAN
Intuition
• Add an extra network that learns the mapping from data instances to the latent space (i.e., an inverse of the generator), avoiding the costly search for the latent instance z
The model
1. Train a bi-directional GAN
2. Anomaly scoring
Zenati, Houssam, et al. "Efficient gan-based anomaly detection." arXiv preprint arXiv:1802.06222 (2018).
Image source: Donahue, Jeff, et al. "Adversarial feature learning." In: ICLR, 2017. 43
Taxonomy: the modeling perspective
Three high-level categories of methods and 11 fine-grained subcategories of methods
Pang, Guansong, et al. Deep learning for anomaly detection: A review. ACM Computing Survey 54, 2,
Article 38 (March 2021), 38 pages. https://doi.org/10.1145/3439950. arXiv preprint. 44
Predictability modeling
General framework
1. Train a current/future instance prediction network
2. Calculate the difference between the predicted instance
and the actual instance as anomaly score.
45
Example – Future frame prediction
Intuition
• Leverage the difference between a predicted future frame and its ground truth to detect an abnormal event in video data
The model
• A frame prediction network trained with appearance (spatial) constraints on intensity and gradient
• Anomaly scoring: based on the quality (e.g., PSNR) of the predicted frame vs. the actual frame
Liu, Wen, et al. "Future frame prediction for anomaly detection–a new baseline." In: CVPR. 2018.
Ye, Muchao, et al. "Anopcn: Video anomaly detection via deep predictive coding network." In: ACM Multimedia. 2019. 46
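A toy sketch of the scoring side of predictability modeling, assuming a frame predictor has already been trained; prediction quality is measured with PSNR as in the paper, but the predictor, frame sizes, and data are placeholders.

```python
# Toy sketch of predictability-based scoring: predict frame t from the previous
# frames and use the prediction quality (PSNR) as a normality score.
import torch
import torch.nn as nn

predictor = nn.Conv2d(4 * 3, 3, kernel_size=3, padding=1)  # placeholder "trained" frame predictor

def psnr(a, b, eps=1e-8):
    mse = ((a - b) ** 2).mean()
    return 10 * torch.log10(1.0 / (mse + eps))     # frames assumed scaled to [0, 1]

past = torch.rand(1, 4 * 3, 64, 64)                # 4 previous RGB frames stacked on channels
actual = torch.rand(1, 3, 64, 64)                  # ground-truth frame t
with torch.no_grad():
    predicted = torch.sigmoid(predictor(past))
normality = psnr(predicted, actual)                # low PSNR => abnormal frame
```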
Taxonomy: the modeling perspective
Three high-level categories of methods and 11 fine-grained subcategories of methods
Pang, Guansong, et al. Deep learning for anomaly detection: A review. ACM Computing Survey 54, 2,
Article 38 (March 2021), 38 pages. https://doi.org/10.1145/3439950. arXiv preprint. 47
Self-supervised classification
General framework
1. Apply different augmentation operations to the data
2. Learn a multi-class classification model using instances
augmented with the same operation as one class
3. Calculate the inconsistency of the instance to the model
as anomaly score
Image source: Wang, Siqi, et al. "Effective End-to-end Unsupervised Outlier Detection via Inlier Priority of Discriminative Network.“ In: NeurIPS. 2019.
48
Example – Image geometric transformations
Intuition
• Train a multi-class model to discriminate between dozens of geometric transformations
applied on all the given images
The model
• Self labeling with compositions of horizontal flipping, translations, and rotations, resulting in 72
distinct transformations
• Training a 72-class deep classification model with a standard cross-entropy loss function
• Using softmax statistics to calculate normality score
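A reduced sketch of this idea using only 4 rotation classes instead of the paper's 72 composed transformations; the network, image sizes, and training loop are placeholders.

```python
# Reduced sketch of the geometric-transformation idea with 4 rotation classes:
# train a classifier to predict which rotation was applied; confident predictions
# of the correct rotation at test time indicate normality.
import torch
import torch.nn as nn
import torch.nn.functional as F

net = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 128), nn.ReLU(), nn.Linear(128, 4))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

def rotations(x):                                   # x: (n, 1, 32, 32) grayscale images
    xs = torch.cat([torch.rot90(x, k, dims=(2, 3)) for k in range(4)])
    ys = torch.arange(4).repeat_interleave(len(x))  # rotation label per augmented image
    return xs, ys

X_train = torch.rand(256, 1, 32, 32)                # placeholder normal training images
for _ in range(50):
    xs, ys = rotations(X_train)
    loss = F.cross_entropy(net(xs), ys)
    opt.zero_grad(); loss.backward(); opt.step()

def normality_score(x):                             # mean softmax prob. of the true rotation
    xs, ys = rotations(x)
    with torch.no_grad():
        p = F.softmax(net(xs), dim=1)
    return p[torch.arange(len(ys)), ys].view(4, -1).mean(dim=0)
```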
Section summary
Pros
• Can leverage existing deep autoencoder/GAN/predictability modeling/self-supervised classification models for anomaly detection
• The learned representations are generally more effective than those of the methods in approach I
Cons
• Some of the methods are limited to specific types of data
• Methods like GAN-based and predictability modeling are computationally costly at the training stage
• Most methods are sensitive to anomaly contamination and cannot work in unsupervised settings
50
Taxonomy: the modeling perspective
Three high-level categories of methods and 11 fine-grained subcategories of methods
Pang, Guansong, et al. Deep learning for anomaly detection: A review. ACM Computing Survey 54, 2,
Article 38 (March 2021), 38 pages. https://doi.org/10.1145/3439950. arXiv preprint. 51
Distance-based measure
52
Deep random distance-based method - REPEN
Intuition
• Learning representations tailored for the random distance-based measure
What is a random distance-based measure?
s(𝐱) = min_{𝐱′ ∈ 𝒮} ‖𝐱 − 𝐱′‖_2, where 𝒮 is a small random data subset
Why is it used?
• Provably and empirically effective
• But less effective on high-dimensional
data
How to learn tailored representations?
• Anomaly query network
*Sugiyama, M., & Borgwardt, K. Rapid distance-based outlier detection via sampling. In: NeurIPS, 26, 467-475. 2013.
*Pang, Guansong, et al. "LeSiNN: Detecting anomalies by identifying least similar nearest neighbours." In: ICDMW, 2015.
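A small numpy sketch of the random distance-based measure itself (the shallow component REPEN builds on); the subset size and the ensemble size are illustrative, and for simplicity a point may appear in its own subsample.

```python
# Sketch of the random distance-based measure: the anomaly score of x is its
# nearest-neighbour distance within a small random subsample, averaged over an
# ensemble of subsamples.
import numpy as np

def random_nn_score(X, n_subsets=30, subset_size=16, seed=0):
    rng = np.random.default_rng(seed)
    scores = np.zeros(len(X))
    for _ in range(n_subsets):
        S = X[rng.choice(len(X), size=subset_size, replace=False)]
        d = np.linalg.norm(X[:, None, :] - S[None, :, :], axis=2)   # (n, subset_size)
        scores += d.min(axis=1)                                     # NN distance within S
    return scores / n_subsets

X = np.random.default_rng(1).normal(size=(500, 20))
scores = random_nn_score(X)                         # larger = more anomalous
```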
53
REPEN – The model
1. Use off-the-shelf detectors to obtain pseudo labels, where 𝒜 and 𝒩 are the anomaly and normal candidate sets, respectively
2. Optimize an anomaly query network by minimizing a loss in which 𝑓 returns the nearest neighbor distance of 𝒙 in a random data subset 𝒮 in the learned representation space
Goal: the representations of pseudo normal instances, 𝑓Θ(𝒙′), have smaller random nearest neighbour distances than those of pseudo anomalies, 𝑓Θ(𝒙)
3. During inference, the same 𝑓 function is used to calculate the random nearest neighbor distance as the anomaly score
Pang, Guansong, et al. "Learning representations of ultrahigh-dimensional data for random distance-based outlier detection.“ In: KDD. 2018.
54
REPEN – Effectiveness in real-world data
IMP: Relative improvement of REPEN over ORG; SU: Speed-up of REPEN over ORG
Pang, Guansong, et al. "Learning representations of ultrahigh-dimensional data for random distance-based outlier detection.“ In: KDD. 2018.
55
Taxonomy: the modeling perspective
Three high-level categories of methods and 11 fine-grained subcategories of methods
Pang, Guansong, et al. Deep learning for anomaly detection: A review. ACM Computing Survey 54, 2,
Article 38 (March 2021), 38 pages. https://doi.org/10.1145/3439950. arXiv preprint. 56
One-class classification measure
57
Example – Deep support vector data
description (Deep SVDD)
Intuition
• To learn feature representations tailored for SVDD-based anomaly detection
The model
• A soft-boundary variant: learn a hypersphere of minimal radius r that encloses the representations of the training data
• A hard-boundary (one-class) variant: pull the representations of the training data towards a fixed center
Results reported on MNIST and CIFAR-10
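A sketch of the hard-boundary (one-class) Deep SVDD objective as it is commonly formulated: pull the representations of the training data towards a fixed center c and score by the distance to c; the bias-free network, data, and hyperparameters are placeholders.

```python
# Sketch of the hard-boundary (one-class) Deep SVDD objective: fix a center c
# (the mean of the initial embeddings) and train phi to pull the training data
# towards c; distance to c is the anomaly score. Bias terms are removed to avoid
# a trivial constant solution.
import torch
import torch.nn as nn

phi = nn.Sequential(nn.Linear(100, 64, bias=False), nn.ReLU(), nn.Linear(64, 16, bias=False))
X = torch.randn(1024, 100)                          # placeholder (mostly normal) training data

with torch.no_grad():
    c = phi(X).mean(dim=0)                          # fixed center

opt = torch.optim.Adam(phi.parameters(), lr=1e-3, weight_decay=1e-4)
for _ in range(200):
    loss = ((phi(X) - c) ** 2).sum(dim=1).mean()    # mean squared distance to the center
    opt.zero_grad(); loss.backward(); opt.step()

with torch.no_grad():
    scores = ((phi(X) - c) ** 2).sum(dim=1)         # larger = more anomalous
```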
Taxonomy: the modeling perspective
Three high-level categories of methods and 11 fine-grained subcategories of methods
Pang, Guansong, et al. Deep learning for anomaly detection: A review. ACM Computing Survey 54, 2,
Article 38 (March 2021), 38 pages. https://doi.org/10.1145/3439950. arXiv preprint. 60
Cluster-based measure
The general framework
1. Devise a feature mapping function 𝜙 that maps original data onto
a new representation space
2. Optimize the feature representations using clustering-based loss
3. Anomaly scoring using a cluster-based anomaly measure in the new space
61
Example – Deep autoencoding gaussian
mixture model (DAGMM)
Intuition
• Learn low-dimensional representations tailored for a Gaussian mixture model (GMM)
The model
• An autoencoder compression network, producing compressed features and reconstruction error features that are concatenated as input to the next network
• A cluster (GMM) membership estimation network
Objective function: reconstruction error (RE) + sample energy under the GMM + a penalty on the diagonal covariance entries
Section summary
Pros
• Strong foundation from traditional anomaly measures (distance-, one-class classification-, and cluster-based measures) in the literature
• Working on low-dimensional feature representations that are specifically optimized for the anomaly measures, resulting in more effective detection
Cons
• The performance of anomaly detection is heavily dependent on the specific anomaly measures, inheriting the limitations of those measures
• The clustering process may be biased by contaminated anomalies in the training data, which in turn leads to less effective representations
63
Part 2: Methods
• The modeling perspective
• Deep learning for feature extraction √
• Learning feature representations of normality ←
• End-to-end anomaly score learning
• Break
• The supervision information perspective
• Unsupervised approach
• Weakly-supervised approach
• Semi-supervised approach
• Implementation and Evaluation
64
Part 2: Methods
• The modeling perspective
• Deep learning for feature extraction √
• Learning feature representations of normality √
• End-to-end anomaly score learning ←
• Break
• The supervision information perspective
• Unsupervised approach
• Weakly-supervised approach
• Semi-supervised approach
• Implementation and Evaluation
65
Main approach III – End-to-end anomaly
score learning
Directly learn anomaly scores in an end-to-end fashion
• Has a neural network that directly learns scalar anomaly scores
• Uses (surrogate) loss functions for anomaly ranking/classification
• Generally requires supervision from (synthetic or real) anomaly data
• Not dependent on existing anomaly measures
66
Taxonomy: the modeling perspective
Three high-level categories of methods and 11 fine-grained subcategories of methods
Pang, Guansong, et al. Deep learning for anomaly detection: A review. ACM Computing Survey 54, 2,
Article 38 (March 2021), 38 pages. https://doi.org/10.1145/3439950. arXiv preprint. 67
Ranking models
68
Ranking models – Deep ordinal regression (SDOR)
Intuition
• Use self-training to iteratively learn the anomaly scores via deep ordinal regression
The model
1. Use initial anomaly scores to produce pseudo anomaly and normal sets, 𝒜 and 𝒩
2. Create the ordinal class labels 𝑐1 and 𝑐2 (𝑐1 > 𝑐2) for 𝒜 and 𝒩, respectively
3. Learn the model by regressing the network's output towards these ordinal labels, and iterate (self-training)
Pang, Guansong, et al. "Self-trained deep ordinal regression for End-to-End video anomaly detection.“ In: CVPR. 2020.
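A simplified sketch of the self-training loop, assuming instance features and an initial scorer are available; the ordinal targets c1/c2, the candidate set sizes, and the squared loss standing in for the ordinal regression loss are illustrative choices.

```python
# Simplified sketch of the self-training loop: pseudo-label the most/least anomalous
# instances as sets A and N, regress the network output towards ordinal targets
# c1 > c2, re-score all instances, and repeat.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(100, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
X = torch.randn(2000, 100)                           # placeholder instance features
c1, c2 = 1.0, 0.0                                    # ordinal targets for A (anomaly) and N (normal)

scores = torch.rand(len(X))                          # stand-in for an initial anomaly detector
for _ in range(5):                                   # self-training rounds
    idx = torch.argsort(scores, descending=True)
    A, N = idx[:20], idx[-200:]                      # pseudo anomaly / normal candidate sets
    xs = torch.cat([X[A], X[N]])
    ys = torch.cat([torch.full((len(A),), c1), torch.full((len(N),), c2)])
    for _ in range(100):
        loss = ((net(xs).squeeze(1) - ys) ** 2).mean()   # squared loss as a stand-in
        opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():
        scores = net(X).squeeze(1)                   # updated scores for re-labelling
```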
70
Ranking models – Pairwise relation prediction (PReNet)
Intuition
• Learn the anomaly scores by predicting the
relation of any instance pairs from a few
labeled anomalies and unlabeled instances
The model
• Let 𝒜 be the small labeled anomaly set and 𝒰
be the large unlabeled data set
• Create an ordinal class label based on three
pairwise relations: a-a, a-u, u-u
Taxonomy: the modeling perspective
Three high-level categories of methods and 11 fine-grained subcategories of methods
Pang, Guansong, et al. Deep learning for anomaly detection: A review. ACM Computing Survey 54, 2,
Article 38 (March 2021), 38 pages. https://doi.org/10.1145/3439950. arXiv preprint. 72
Prior-driven models
73
Deviation networks (DevNet)
Intuition
• Leverage a few labeled anomalies, large unlabeled data, and a prior over anomaly scores, so that the scores of anomalies deviate significantly from those of unlabeled (mostly normal) instances
Pang, Guansong, et al. "Deep anomaly detection with deviation networks." In: KDD. 2019.
75
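A sketch of the deviation loss with a standard normal score prior; the confidence margin a, the scoring network, and the toy batch are illustrative.

```python
# Sketch of the deviation loss with a N(0, 1) score prior: scores of unlabeled
# (mostly normal) data are pushed towards the prior mean, while the few labeled
# anomalies are pushed at least `a` standard deviations above it.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(100, 64), nn.ReLU(), nn.Linear(64, 1))  # scalar anomaly score

def deviation_loss(x, y, a=5.0, n_prior=5000):
    """y = 1 for labeled anomalies, 0 for unlabeled instances."""
    prior = torch.randn(n_prior)                     # reference scores drawn from the prior
    dev = (net(x).squeeze(1) - prior.mean()) / prior.std()
    return ((1 - y) * dev.abs() + y * torch.clamp(a - dev, min=0.0)).mean()

x = torch.randn(64, 100)                             # toy batch
y = torch.zeros(64); y[:4] = 1.0                     # a few labeled anomalies
loss = deviation_loss(x, y)
```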
Deviation loss on other types of data
Zhang, Jianpeng, et al. "Viral Pneumonia Screening on Chest X-rays Using Confidence-Aware Anomaly Detection." IEEE Transactions on Medical Imaging (2020).
Ding, Kaize, et al. "Few-shot Network Anomaly Detection via Cross-network Meta-learning." In: The Web Conference (2021).
76
Taxonomy: the modeling perspective
Three high-level categories of methods and 11 fine-grained subcategories of methods
Pang, Guansong, et al. Deep learning for anomaly detection: A review. ACM Computing Survey 54, 2,
Article 38 (March 2021), 38 pages. https://doi.org/10.1145/3439950. arXiv preprint. 77
Softmax likelihood models
General framework
1. Model the probability of an event (instance) with a softmax function over learned feature representations
2. Learn the representations by maximizing the likelihood of the observed events in the training data
3. Given a test instance, the model directly gives its anomaly score by the event probability
78
Softmax likelihood models – APE
Intuition
• Leverage pairwise feature interactions to estimate the event
likelihood using Noise-Contrastive Estimation (NCE)
Events with categorical features
The model
• Trained with noise-contrastive estimation (NCE): a synthetic binary classification between observed events and 'noise' samples, where a noise sample is a univariate extrapolation of x
Taxonomy: the modeling perspective
Three high-level categories of methods and 11 fine-grained subcategories of methods
Pang, Guansong, et al. Deep learning for anomaly detection: A review. ACM Computing Survey 54, 2,
Article 38 (March 2021), 38 pages. https://doi.org/10.1145/3439950. arXiv preprint. 80
End-to-end one-class classification
Specifically designed GAN vs. generic GAN
Image source: Ngo, Phuc Cuong, et al. "Fence GAN: Towards better anomaly detection." In: ICTAI. 2019.
81
End-to-end one-class classification - OCAN
Intuition
• Use the generator of a 'bad' GAN to generate complementary samples (rather than matching the original data distribution), which are then used to train a one-class discriminator that separates normal instances from the generated complementary instances
The model
• The generator of the complementary GAN learns a complementary distribution (rather than the distribution of the normal data)
• The discriminator, trained as in a regular GAN, then serves as the one-class detector
Zheng, Panpan, et al. "One-class adversarial nets for fraud detection.“ In: AAAI. 2019.
82
Section summary
Pros
• The anomaly scoring/ranking/classification is optimized in an end-to-end fashion, normally more effective than the other two approaches
• Does not depend on any existing anomaly measures
Cons
• At least some form of labeled or synthetic anomalies is required, which may not be applicable in applications where such labeled anomalies are not available
• Since the models are exclusively fitted to the few labeled anomalies, they may not generalize to unseen anomalies that exhibit abnormal features different from those of the labeled anomalies
83
Part 2: Methods
• The modeling perspective
• Deep learning for feature extraction √
• Learning feature representations of normality √
• End-to-end anomaly score learning √
• Break ← 10 min
• The supervision information perspective
• Unsupervised approach
• Weakly-supervised approach
• Semi-supervised approach
• Implementation and Evaluation
84
Part 2: Methods
• The modeling perspective
• Deep learning for feature extraction √
• Learning feature representations of normality √
• End-to-end anomaly score learning √
• Break √
• The supervision information perspective
• Unsupervised approach ←
• Weakly-supervised approach
• Semi-supervised approach
• Implementation and Evaluation
85
Unsupervised approach
Training on anomaly-contaminated unlabeled data
• Outlier-aware autoencoders
• One-class models with a soft boundary, e.g., Deep SVDD (ICML18)
86
Part 2: Methods
• The modeling perspective
• Deep learning for feature extraction √
• Learning feature representations of normality √
• End-to-end anomaly score learning √
• Break √
• The supervision information perspective
• Unsupervised approach ←
• Weakly-supervised approach
• Semi-supervised approach
• Implementation and Evaluation
87
Part 2: Methods
• The modeling perspective
• Deep learning for feature extraction √
• Learning feature representations of normality √
• End-to-end anomaly score learning √
• Break √
• The supervision information perspective
• Unsupervised approach √
• Weakly-supervised approach ←
• Semi-supervised approach
• Implementation and Evaluation
88
Weakly-supervised approach 1/2
A limited number of partially labeled anomalies and large unlabeled data
• Contrastive feature learning, e.g., the deep distance-based method REPEN (KDD18)
• Prior-driven methods, e.g., the deviation network DevNet (KDD19)
• Reinforcement learning*
*Pang, Guansong, et al. "Deep Reinforcement Learning for Unknown Anomaly Detection." arXiv preprint:2009.06847 (2020).
+Ruff, Lukas, et al. "Deep semi-supervised anomaly detection." arXiv preprint arXiv:1906.02694 (2019).
89
Weakly-supervised approach 2/2
Inexact anomaly labels (coarse-grained labels)
Multiple instance learning
• Problem setting: Given a large set of videos
with video-level labels of anomaly and normal
classes, we aim to learn detection models to
identify abnormal video frames
Tian, Yu, et al. "Weakly-supervised Video Anomaly Detection with Contrastive Learning of Long and Short-range Temporal Features." arXiv preprint:2101.10030 (2021).
Sultani, Waqas, et al. "Real-world anomaly detection in surveillance videos.“ In: CVPR. 2018.
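A sketch of the multiple-instance ranking objective of Sultani et al. under simplifying assumptions: segment features are given, one anomalous and one normal video are compared per step, and the smoothness/sparsity weights are illustrative.

```python
# Sketch of the MIL ranking objective: with only video-level labels, push the
# highest-scored segment of an anomalous video above the highest-scored segment
# of a normal video, plus temporal smoothness and sparsity terms.
import torch
import torch.nn as nn

scorer = nn.Sequential(nn.Linear(512, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())

def mil_ranking_loss(anom_segs, norm_segs, lam1=8e-5, lam2=8e-5):
    """anom_segs, norm_segs: (n_segments, d) features of one anomalous / one normal video."""
    sa = scorer(anom_segs).squeeze(1)                # per-segment scores in [0, 1]
    sn = scorer(norm_segs).squeeze(1)
    hinge = torch.clamp(1 - sa.max() + sn.max(), min=0.0)
    smooth = ((sa[1:] - sa[:-1]) ** 2).sum()         # adjacent segments should score similarly
    sparse = sa.sum()                                # only few segments should look anomalous
    return hinge + lam1 * smooth + lam2 * sparse

loss = mil_ranking_loss(torch.randn(32, 512), torch.randn(32, 512))
```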
90
Part 2: Methods
• The modeling perspective
• Deep learning for feature extraction √
• Learning feature representations of normality √
• End-to-end anomaly score learning √
• Break √
• The supervision information perspective
• Unsupervised approach √
• Weakly-supervised approach ←
• Semi-supervised approach
• Implementation and Evaluation
91
Part 2: Methods
• The modeling perspective
• Deep learning for feature extraction √
• Learning feature representations of normality √
• End-to-end anomaly score learning √
• Break √
• The supervision information perspective
• Unsupervised approach √
• Weakly-supervised approach √
• Semi-supervised approach ←
• Implementation and Evaluation
92
Semi-supervised approach
Training on a large labeled normal
dataset
Pang, Guansong, et al. Deep learning for anomaly detection: A review. ACM Computing Survey 54, 2,
Article 38 (March 2021), 38 pages. https://doi.org/10.1145/3439950. arXiv preprint. 93
Part 2: Methods
• The modeling perspective
• Deep learning for feature extraction √
• Learning feature representations of normality √
• End-to-end anomaly score learning √
• Break √
• The supervision information perspective
• Unsupervised approach √
• Weakly-supervised approach √
• Semi-supervised approach ←
• Implementation and Evaluation
94
Part 2: Methods
• The modeling perspective
• Deep learning for feature extraction √
• Learning feature representations of normality √
• End-to-end anomaly score learning √
• Break √
• The supervision information perspective
• Unsupervised approach √
• Weakly-supervised approach √
• Semi-supervised approach √
• Implementation and Evaluation ←
95
Implementation of representative algorithms
Pang, Guansong, et al. Deep learning for anomaly detection: A review. ACM Computing Survey 54, 2,
Article 38 (March 2021), 38 pages. https://doi.org/10.1145/3439950. arXiv preprint. 96
Source codes of representative algorithms
Pang, Guansong, et al. Deep learning for anomaly detection: A review. ACM Computing Survey 54, 2,
Article 38 (March 2021), 38 pages. https://doi.org/10.1145/3439950. arXiv preprint. 97
Publicly available datasets with real anomalies
99
Part 3: Conclusions and
future opportunities
• Summary of the methods ←
• Six possible directions for future research
100
Summary of the methods
101
Part 3: Conclusions and
future opportunities
• Summary of the methods √
• Six possible directions for future research
102
Part 3: Conclusions and
future opportunities
• Summary of the methods √
• Six possible directions for future research ←
103
Direction #1 – Exploring anomaly-
supervisory signals
Unsupervised
• Data reconstruction, generator-discriminator, pseudo class labels, etc.
Self-supervised
• Self-supervised classification, future prediction, etc.
Anomaly measure-driven
• Presuming some distribution of normal/anomalous data, e.g., one-class, cluster, distance, etc.
104
Direction #2 – Deep weakly-supervised
anomaly detection
Few-shot anomaly detection or data-efficient anomaly detection
• Leveraging a few anomaly examples to perform anomaly-informed detection
• Data efficiency?
• Overfitting?
105
Direction #3 – Large-scale normality
learning
Large-scale unsupervised/self-supervised representation
learning specifically designed for anomaly detection
• Any anomaly contamination in the large-scale data?
106
Direction #4 – Deep detection of
complex anomalies
Deep models for conditional/group anomalies
• Capturing complex temporal/spatial dependence
• Learning representations of a set of unordered data points
107
Direction #5 – Interpretable and
actionable deep anomaly detection
Interpretable deep anomaly detection
• Deep models with inherent capability (via activation/attention maps) to provide straightforward
anomaly explanation
108
Direction #6 – Novel applications and
settings
Out-of-distribution (OOD) detection, e.g., for safety in autonomous systems
• Accurate classification while being able to detect any data instances that are drawn far away from the given training distribution
Curiosity learning
• Curiosity-driven exploration: Encouraging reinforcement
learning agents to explore novel states
Q&A
110