Learning with
Minimal Supervision
ELEC/COMP 447, ELEC/COMP 546
Spring 2025
Representation learning
“Coral”
“Fish”
Image
Compact mental
representation
[Serre, 2014]
CNNs learned the classical visual recognition pipeline!
Edges
Segments
Texture “clown fish”
Parts
Colors
im2vec
Image → layer 1 representation of image → layer 3 representation of image
Represent image as a neural embedding — a vector/tensor of neural activations
(perhaps representing a vector of detected texture patterns or object parts)
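A minimal sketch (not from the slides) of extracting such an embedding with a forward hook in PyTorch; the ResNet-18 backbone and the choice of layer3 are illustrative:

```python
import torch
import torchvision.models as models

# Minimal sketch: read out an intermediate-layer activation as an image embedding.
# ResNet-18 and 'layer3' are illustrative choices, not the course's model.
model = models.resnet18(weights=None).eval()

features = {}
def save_activation(name):
    def hook(module, inputs, output):
        features[name] = output.detach()
    return hook

model.layer3.register_forward_hook(save_activation("layer3"))

image = torch.randn(1, 3, 224, 224)      # stand-in for a preprocessed image
with torch.no_grad():
    model(image)

# Flatten the activation tensor into a single embedding vector ("im2vec").
embedding = features["layer3"].flatten(start_dim=1)
print(embedding.shape)                    # e.g. torch.Size([1, 50176])
```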
Investigating a representation via similarity analysis
How similar are these two images?
How about these two?
[Kriegeskorte et al. 2008]
Investigating a representation via similarity analysis
Representational Dissimilarity Matrix
Neural activation vector
[Kriegeskorte, Mur, Ruff, et al. 2008]
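A rough sketch of how an RDM can be computed from activation vectors (dissimilarity = 1 − Pearson correlation); the details are illustrative, not taken from the paper:

```python
import torch

# Minimal Representational Dissimilarity Matrix (RDM) sketch:
# each entry compares the activation vectors evoked by two images.
def rdm(activations: torch.Tensor) -> torch.Tensor:
    # activations: (num_images, num_units) matrix of neural activations
    centered = activations - activations.mean(dim=1, keepdim=True)
    normed = centered / centered.norm(dim=1, keepdim=True)
    corr = normed @ normed.T          # pairwise Pearson correlations
    return 1.0 - corr                  # dissimilarities in [0, 2]

acts = torch.randn(10, 4096)           # 10 images, 4096-unit representation
print(rdm(acts).shape)                  # torch.Size([10, 10])
```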
Investigating a representation via similarity analysis
IT neuronal units vs. deep net (in particular, HMO)
[Yamins, Hong, Cadieu, Solomon, Seibert, DiCarlo, PNAS 2014]
Good representations are…
1. Compact (minimal)
2. Explanatory (sufficient)
3. Disentangled (independent factors)
4. Interpretable
5. Make subsequent problem solving easy
[See “Representation Learning”, Bengio 2013, for more commentary]
Supervised object recognition
Learner “Fish”
image X label Y
Supervised object recognition
Learner “Duck”
…
image X label Y
Transfer learning
“Generally speaking, a good representation is one that makes a subsequent
learning task easier.” — Deep Learning, Goodfellow et al. 2016
Object recognition Place recognition
“Fish” ?
Often, what we will be “tested” on is to learn to do a new thing.
Object recognition Place recognition Place recognition
“Fish” bedroom ?
A lot of data A little data
Finetuning starts with the representation learned on a previous
task, and adapts it to perform well on a new task.
Finetuning in practice
Object recognition classes: dolphin, cat, grizzly bear, angel fish, chameleon, clown fish, iguana, elephant
Place recognition classes: bathroom, kitchen, bedroom, living room, hallway
• The “learned representation” is just the weights and biases, so that’s what we transfer.
• Which weights and biases do we need to finetune? Often just the final layer.
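A minimal PyTorch sketch of this recipe (the ResNet-18 backbone and the five hypothetical place classes are illustrative assumptions): freeze the transferred weights and train only a new final layer.

```python
import torch
import torch.nn as nn
import torchvision.models as models

# Transfer the pretrained representation, freeze it, and retrain only a new head.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

for param in model.parameters():
    param.requires_grad = False           # keep the transferred weights fixed

num_places = 5                             # e.g. bathroom, kitchen, bedroom, living room, hallway
model.fc = nn.Linear(model.fc.in_features, num_places)   # new final layer, trainable

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on a fake batch.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, num_places, (8,))
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```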
If we keep on finetuning for every new datapoint or task that comes our way, we
get online learning. Humans seem to do this — we never stop learning.
…
Supervised vision: hand-curated training data (+ informative; − expensive; − limited to teacher’s knowledge)
Vision in nature: raw unlabeled training data (+ cheap; − noisy; − harder to interpret)
Autoencoder: A first self-supervised model
Image → Encoder → compressed image code (vector z) → Decoder → Reconstructed image
[e.g., Hinton & Salakhutdinov, Science 2006]
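A minimal autoencoder sketch in PyTorch (the layer sizes and 28×28 input are illustrative, not the slides' architecture):

```python
import torch
import torch.nn as nn

# Encode an image into a compact code z, then decode back to a reconstruction.
class Autoencoder(nn.Module):
    def __init__(self, code_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28 * 28, 256), nn.ReLU(),
            nn.Linear(256, code_dim),          # compressed image code z
        )
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, 256), nn.ReLU(),
            nn.Linear(256, 28 * 28), nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z).view_as(x), z

model = Autoencoder()
images = torch.rand(16, 1, 28, 28)             # stand-in batch
recon, z = model(images)
loss = nn.functional.mse_loss(recon, images)    # reconstruction error drives learning
```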
Autoencoder
(Figure: Image → compressed image code (vector z) → Reconstructed image)
Is the code informative about object class? (Logistic regression)
Layer 1 representation vs. Layer 6 representation
[DeCAF, Donahue, Jia, et al. 2013]
[Visualization technique: t-SNE, van der Maaten & Hinton, 2008]
Can we learn better self-supervised features?
(Architecture in figure: input data → encoder: 4-layer conv → decoder: 4-layer upconv → reconstructed data)
Question: What may be bad about using autoencoders to learn self-supervised features?
Two answers:
• Autoencoders prioritize reconstruction error → they may not weigh small but semantically important features highly enough.
• Autoencoders can “cheat” by memorizing weird features associated with each image, instead of learning a well-behaved representation.
Self-supervised
learning
Escher, 1948
Self-supervised learning
Common trick:
• Convert the “unsupervised” problem into “supervised” empirical risk minimization
• Do so by cooking up “labels” (prediction targets) from the raw data itself
Escher, 1948
Self-supervised pretext tasks
rotation prediction “jigsaw puzzle” image completion colorization
1. Solving the pretext tasks allows the model to learn good features.
2. We can automatically generate labels for the pretext tasks.
Task 1: Relative patch location prediction
? ? ?
? ?
? ? ? [Slide credit: Carl Doersch]
Task 1: Relative patch location prediction
(Image source: Doersch et al., 2015)
Task 1: Relative patch location prediction
(Figure: patch pair → CNNs → patch embedding (representation) → classifier; right: an input patch and its nearest neighbors in embedding space)
Interesting: this representation places all cat faces together in feature space!
[Slide credit: Carl Doersch]
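A rough sketch of how the patch pairs and their free labels can be generated; patch size and sampling details are simplified relative to Doersch et al., 2015:

```python
import torch

# Pick a random center patch and one of its 8 neighbors; the neighbor's
# index (0..7) is the automatically generated "label".
def sample_patch_pair(image: torch.Tensor, patch: int = 32):
    # image: (C, H, W)
    _, h, w = image.shape
    cy = torch.randint(patch, h - 2 * patch, (1,)).item()
    cx = torch.randint(patch, w - 2 * patch, (1,)).item()
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
               (0, 1), (1, -1), (1, 0), (1, 1)]
    label = torch.randint(0, 8, (1,)).item()
    dy, dx = offsets[label]
    center   = image[:, cy:cy + patch, cx:cx + patch]
    neighbor = image[:, cy + dy * patch:cy + (dy + 1) * patch,
                        cx + dx * patch:cx + (dx + 1) * patch]
    return center, neighbor, label     # classifier predicts `label` from the two patches

img = torch.rand(3, 256, 256)
center, neighbor, label = sample_patch_pair(img)
```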
Task 2: Solving “jigsaw puzzles”
(Image source: Noroozi & Favaro, 2016)
Task 3: Predict missing pixels (inpainting)
Context Encoders: Feature Learning by Inpainting (Pathak et al., 2016)
Source: Pathak et al., 2016
Task 3: Predicting missing pixels (inpainting)
Learning to reconstruct the missing pixels
Source: Pathak et al., 2016
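A rough sketch of the inpainting objective (the mask size, loss, and stand-in model are illustrative; the actual Context Encoder uses an encoder-decoder and adds an adversarial term):

```python
import torch
import torch.nn.functional as F

# Mask out a central region and train a network to reconstruct the missing pixels.
def masked_inpainting_loss(model, images, hole=32):
    b, c, h, w = images.shape
    y0, x0 = (h - hole) // 2, (w - hole) // 2
    corrupted = images.clone()
    corrupted[:, :, y0:y0 + hole, x0:x0 + hole] = 0.0   # remove central pixels
    prediction = model(corrupted)                        # model predicts a full image
    # Only the missing region contributes to the reconstruction loss.
    return F.mse_loss(prediction[:, :, y0:y0 + hole, x0:x0 + hole],
                      images[:, :, y0:y0 + hole, x0:x0 + hole])

# Illustrative stand-in "model": identity; a real encoder-decoder would go here.
model = torch.nn.Identity()
loss = masked_inpainting_loss(model, torch.rand(4, 3, 128, 128))
```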
Task 4: Split-brain Autoencoder
Idea: cross-channel predictions
(Figure: predict the ab channels from the L channel, and L from ab)
Source: Richard Zhang / Phillip Isola
Task 4: Split-brain Autoencoder
Idea: cross-channel predictions
Source: Richard Zhang / Phillip Isola
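A minimal sketch of the cross-channel idea (network sizes are illustrative, and the Lab conversion is assumed to have happened already):

```python
import torch
import torch.nn as nn

# Split a Lab image into the L channel and the ab channels, and train two
# sub-networks that each predict the half they do not see.
net_L_to_ab = nn.Conv2d(1, 2, kernel_size=3, padding=1)   # L  -> ab
net_ab_to_L = nn.Conv2d(2, 1, kernel_size=3, padding=1)   # ab -> L

lab = torch.rand(8, 3, 64, 64)       # stand-in for images already converted to Lab
L, ab = lab[:, :1], lab[:, 1:]

loss = nn.functional.mse_loss(net_L_to_ab(L), ab) + \
       nn.functional.mse_loss(net_ab_to_L(ab), L)
# The full representation concatenates the features of both sub-networks.
```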
Autoencoder vs. colorization as pretext tasks:
• Autoencoder: raw data → reconstructed data
• Colorization: raw grayscale channel → predicted color channels
(Plot: classification accuracy on the ImageNet task [Russakovsky et al. 2015] vs. layer, for autoencoder and colorization features)
Task 5: Rotation prediction
Hypothesis: a model could recognize the correct rotation of an object
only if it has the “visual commonsense” of what the object should look
like unperturbed.
(Image source: Gidaris et al. 2018)
Task 5: Rotation prediction
Self-supervised learning by rotating the entire input images.
The model learns to predict which rotation is applied (4-way classification).
(Image source: Gidaris et al. 2018)
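A minimal sketch of generating the rotated inputs and their free 4-way labels (details illustrative, in the spirit of Gidaris et al., 2018):

```python
import torch

# Rotate each image by 0/90/180/270 degrees; labels 0..3 come "for free".
def make_rotation_batch(images: torch.Tensor):
    # images: (B, C, H, W)
    labels = torch.randint(0, 4, (images.shape[0],))
    rotated = torch.stack([torch.rot90(img, k=int(k), dims=(1, 2))
                           for img, k in zip(images, labels)])
    return rotated, labels

rotated, labels = make_rotation_batch(torch.rand(8, 3, 96, 96))
# logits = cnn(rotated); loss = F.cross_entropy(logits, labels)
```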
Transfer learned features to supervised learning
(Plot legend: pretrained with full ImageNet supervision; no pretraining)
Self-supervised learning on ImageNet (entire training set) with AlexNet.
Finetune on labeled data from Pascal VOC 2007.
Self-supervised learning with rotation prediction
(Image source: Gidaris et al. 2018)
What about videos pretext tasks?
Analog of jigsaw puzzles?
Sort frames in time!
Unsupervised Representation Learning by Sorting
Sequences. Lee et al., 2017.
Video Pretext Task: Video colorization
Idea: model the temporal coherence of colors in videos
reference frame (t = 0); how should I color these frames (t = 1, 2, 3, ...)?
Source: Vondrick et al., 2018
Video colorization
Idea: model the temporal coherence of colors in videos
reference frame (t = 0); how should I color these frames (t = 1, 2, 3, ...)?
Should be the same color!
Hypothesis: learning to color video frames should allow model to
learn to track regions or objects without labels!
Source: Vondrick et al., 2018
Video colorization
Learning objective: establish mappings between reference and target frames in a learned feature space. Use the mapping as “pointers” to copy the correct color (Lab).
Source: Vondrick et al., 2018
Learning to color videos
(Figure annotations: attention map on the reference frame; predicted color = weighted sum of the reference colors; loss between predicted color and ground-truth color)
Source: Vondrick et al., 2018
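A rough sketch of the attention "pointer" that copies reference colors (feature extraction is omitted and the shapes are illustrative):

```python
import torch
import torch.nn.functional as F

# Each target-frame location attends over reference-frame locations in a learned
# feature space and copies a weighted sum of the reference colors.
def copy_colors(ref_feats, tgt_feats, ref_colors):
    # ref_feats, tgt_feats: (N, D) features for N spatial locations
    # ref_colors:           (N, 2)  ab color channels of the reference frame
    attention = F.softmax(tgt_feats @ ref_feats.T, dim=1)   # (N, N) pointer weights
    return attention @ ref_colors                            # predicted target colors

N, D = 196, 64                                   # e.g. a 14x14 grid of locations
ref_feats, tgt_feats = torch.randn(N, D), torch.randn(N, D)
ref_colors = torch.rand(N, 2)
pred_colors = copy_colors(ref_feats, tgt_feats, ref_colors)
# loss = F.mse_loss(pred_colors, true_target_colors)   # ground-truth colors are free
```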
Colorizing videos (qualitative)
reference frame target frames (gray) predicted color
Source: Google AI Blog
Colorizing videos (qualitative)
reference frame target frames (gray) predicted color
Source: Google AI Blog
Tracking emerges from colorization
Propagate segmentation masks using learned attention
Source: Google AI Blog
Tracking emerges from colorization
Propagate skeletons using learned attention
Source: Google AI Blog
Problems with individual pretext tasks
● Coming up with individual pretext tasks is tedious.
● The learned representations may not be general.
Can we come up with a more general pretext task?
A more general pretext task?
same object
Contrastive Representation Learning
attract
repel
A formulation of contrastive learning
Loss function given 1 positive sample and N - 1 negative samples:
x: reference sample; x+: positive sample; x−: negative sample
That is: we aim to learn an encoder function f that yields a high score for positive pairs (x, x+) and low scores for negative pairs (x, x−).
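The equation itself appears only as an image on the slide; a standard way to write this loss (the InfoNCE form) is roughly:

```latex
\mathcal{L} \;=\; -\,\mathbb{E}\!\left[
  \log \frac{\exp\big(s(f(x),\, f(x^{+}))\big)}
            {\exp\big(s(f(x),\, f(x^{+}))\big) \;+\; \sum_{j=1}^{N-1} \exp\big(s(f(x),\, f(x^{-}_{j}))\big)}
\right]
```

where s(·, ·) is a similarity score, e.g. a dot product of encoder outputs; the slide's exact form may differ (SimCLR, for instance, adds a temperature).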
A formulation of contrastive learning
Loss function given 1 positive sample and N - 1 negative samples:
(numerator: score for the positive pair; denominator also sums the scores for the N - 1 negative pairs)
This seems familiar …
Cross-entropy loss for an N-way softmax classifier!
I.e., learn to find the positive sample from the N samples.
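A minimal PyTorch sketch of this N-way softmax view of the loss (the cosine similarity and temperature are illustrative choices, not necessarily the slide's):

```python
import torch
import torch.nn.functional as F

# Contrastive (InfoNCE-style) loss as an N-way classifier whose "correct class"
# is the positive sample among N candidates.
def contrastive_loss(query, positive, negatives, temperature=0.1):
    # query, positive: (D,)   negatives: (N-1, D)
    candidates = torch.cat([positive.unsqueeze(0), negatives], dim=0)      # (N, D)
    scores = F.cosine_similarity(query.unsqueeze(0), candidates) / temperature
    target = torch.tensor(0)               # index 0 is the positive sample
    return F.cross_entropy(scores.unsqueeze(0), target.unsqueeze(0))

D, N = 128, 16
loss = contrastive_loss(torch.randn(D), torch.randn(D), torch.randn(N - 1, D))
```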
SimCLR: generating positive samples
from data augmentation
Source: Chen et al., 2020
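A rough sketch of generating the two augmented views that form a positive pair (the transform parameters below are illustrative, not the paper's exact recipe):

```python
from torchvision import transforms

# Two random augmentations of the same image form a positive pair;
# other images in the batch act as negatives.
augment = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(0.4, 0.4, 0.4, 0.1),
    transforms.RandomGrayscale(p=0.2),
    transforms.GaussianBlur(kernel_size=23),
    transforms.ToTensor(),
])

def two_views(pil_image):
    return augment(pil_image), augment(pil_image)   # a positive pair
```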
SimCLR
[Chen, Kornblith, Norouzi, Hinton, ICML 2020]
[c.f. Becker & Hinton, Nature 1992]
Contrastive pre-training
Self-supervised contrastive learning (no labels) → new recognition task
(Classes in figure: dolphin, cat, grizzly bear, angel fish, chameleon, tiger, iguana, elephant)
Training linear classifier on SimCLR features
Train the feature encoder on ImageNet (entire training set) using SimCLR.
Freeze the feature encoder, then train a linear classifier on top with labeled data.
Source: Chen et al., 2020
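A minimal sketch of this linear-evaluation protocol (the encoder below is a tiny stand-in for a SimCLR-pretrained network; names and sizes are illustrative):

```python
import torch
import torch.nn as nn

# Freeze the self-supervised encoder and train only a linear classifier on its features.
feat_dim, num_classes = 2048, 1000
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, feat_dim))   # stand-in encoder
for p in encoder.parameters():
    p.requires_grad = False

linear_probe = nn.Linear(feat_dim, num_classes)
optimizer = torch.optim.SGD(linear_probe.parameters(), lr=0.1)

images = torch.rand(8, 3, 32, 32)
labels = torch.randint(0, num_classes, (8,))
with torch.no_grad():
    feats = encoder(images)                 # frozen features
loss = nn.functional.cross_entropy(linear_probe(feats), labels)
loss.backward()
optimizer.step()
```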
Semi-supervised learning on SimCLR features
Train the feature encoder on ImageNet (entire training set) using SimCLR.
Finetune the encoder with 1% / 10% of labeled data on ImageNet.
Source: Chen et al., 2020
Variations: DINO
DINO: Teacher-Student Paradigm
• An image x is transformed into two views x1 and x2.
• The student is encouraged to match the output probabilities of the teacher.
• The teacher slowly updates its parameters with an exponential moving average (EMA) of the student’s parameters.
DINO
The Teacher
• The teacher’s parameters are an exponentially weighted moving average of the student’s parameters over recent iterations.
• Teacher sees “global” views, while student sees local views.
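A minimal sketch of the EMA teacher update (the momentum value is illustrative):

```python
import torch

# The teacher's parameters track an exponential moving average of the student's.
@torch.no_grad()
def ema_update(teacher, student, momentum=0.996):
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(momentum).add_(s_param, alpha=1.0 - momentum)

student = torch.nn.Linear(128, 10)
teacher = torch.nn.Linear(128, 10)
teacher.load_state_dict(student.state_dict())   # start identical
ema_update(teacher, student)                     # call after each student update step
```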
Recall: Self-Attention
Multisensory self-supervision
Virginia de Sa. Learning Classification with Unlabeled Data. NIPS 1994.
[see also “Six lessons from babies”, Smith and Gasser 2005]
(Figure: state → observations; observations → state)
[Slide credit: Andrew Owens]
Predicting ambient sound
[Slide credit: Andrew Owens]
What did the model learn?
Unit #90 of 256
Strongest responses in dataset
Visualization method from (Zhou 2015)
[Slide credit: Andrew Owens]
CLIP (Contrastive Language–Image Pre-training)
Radford et al., 2021
[Slide Credit: Yann LeCun]