Supervised Learning
● ML has largely focused on this setting …
● But lots of other problem settings are coming up:
○ What if we also have unlabeled data?
○ What if we only have unlabeled data?
○ What if we have poor-quality labels (e.g., coarse or potentially mistaken)?
○ What if we have many datasets, but each somehow differs from the others?
○ What if we only have one example, or a few, per (new) class?
○ ……
And wait, there are more!
• Transfer Learning
• Semi-supervised learning
• One/Few-shot learning
• Un/Self-Supervised Learning
• Domain adaptation
• Meta-Learning
• Zero-shot learning
• Continual / Lifelong learning
• Multi-modal learning
• Multi-task learning
• Active learning
• …

Setting               | Source           | Target             | Shift Type
----------------------|------------------|--------------------|-------------
Semi-supervised       | Single labeled   | Single unlabeled   | None
Domain Adaptation     | Single labeled   | Single unlabeled   | Non-semantic
Domain Generalization | Multiple labeled | Unknown            | Non-semantic
Cross-Task Transfer   | Single labeled   | Single unlabeled   | Semantic
Few-Shot Learning     | Single labeled   | Single few-labeled | Semantic
Un/Self-Supervised    | Single unlabeled | Many labeled       | Both/Task
Particularly Meaningful for CV …
[Figure: labeling tasks such as “Crystal”/“Needle”/“Empty”, digits “0”/“1”/“2”, or topics “Sports”/“News”/“Science” require a human expert, special equipment, or experiments. Unlabeled images are cheap and abundant; labels are expensive and scarce!]
Particularly Meaningful for CV …
[Figure: the same image (containing a horse and a person) annotated with each label type]

Label type          | Annotation time
--------------------|-----------------
image-level labels  | 1 s/class
points              | 2.4 s/instance
bounding boxes      | 10 s/instance
scribbles           | 17 s/instance
pixel-level labels  | 78 s/instance
A Whole Big Field! We try to cover a few …
• Semi-Supervised Learning
• Few-Shot Learning
• Active Learning
• Transfer and Multi-Task Learning
• Self-Supervised Learning
What is Semi-Supervised Learning?
○ Training data: both labeled data (image, label) and unlabeled data (image)
○ Goal: use unlabeled data to improve supervised learning
○ Note: if we have lots of labeled data, this goal is much harder
[Figure: supervised learning (labeled data only) vs. semi-supervised learning (labeled + unlabeled data)]
An Incomplete List of Methods ….
• Confidence & Entropy – “no matter what, be confident”
• Pseudo labeling
• Entropy minimization
• Virtual Adversarial Training
• Label Consistency – “label is robust to perturbations”
• Pseudo labeling, yet applying different sample augmentations
• Temporal Ensembling, Mean Teacher …
• Regularization
• Weight decay, Dropout …
• Strong/unsupervised data augmentation: MixUp, CutOut, MixMatch …
• Co-Training / Self-Training / Pseudo Labeling / Noisy Student
Pseudo Labeling
● Simple idea:
• Train on labeled data
• Make predictions on unlabeled data
• Pick confident predictions and add them to the training data
• Can be done end-to-end (no need for separate stages); a minimal sketch follows this slide
● Issues:
• “Under-confidence” or flatness – “sharpen” via entropy minimization
• “Over-confidence” – needs better uncertainty quantification
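A minimal pseudo-labeling training step, assuming a PyTorch classifier; `model`, `optimizer`, and the batches are placeholders, and the 0.95 confidence threshold is only illustrative.

```python
import torch
import torch.nn.functional as F

def pseudo_label_step(model, optimizer, labeled_batch, unlabeled_batch,
                      threshold=0.95, unlabeled_weight=1.0):
    x_l, y_l = labeled_batch
    x_u = unlabeled_batch

    # Supervised loss on the labeled data.
    loss = F.cross_entropy(model(x_l), y_l)

    # Pseudo-labels: keep only confident predictions (no gradient through them).
    with torch.no_grad():
        probs = F.softmax(model(x_u), dim=1)
        conf, pseudo_y = probs.max(dim=1)
        mask = conf >= threshold

    # Add a cross-entropy term on the confident unlabeled examples.
    if mask.any():
        loss = loss + unlabeled_weight * F.cross_entropy(model(x_u[mask]), pseudo_y[mask])

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```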
Label Consistency with Data Augmentations
Make sure that the logits of differently augmented views are similar.
We can either “ensemble” or “compare” them, e.g., as in the sketch below.
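A hedged sketch of one way to “compare” them, in the Π-model style: augment the same unlabeled image twice and penalize disagreement between the two predictions with an MSE term. `model` and `augment` are placeholders.

```python
import torch
import torch.nn.functional as F

def consistency_loss(model, x_unlabeled, augment):
    view1 = augment(x_unlabeled)
    view2 = augment(x_unlabeled)
    p1 = F.softmax(model(view1), dim=1)
    with torch.no_grad():                 # treat the second view as a fixed target
        p2 = F.softmax(model(view2), dim=1)
    return F.mse_loss(p1, p2)             # penalize disagreement between the views
```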
MixMatch: A Holistic Approach for Semi-Supervised Learning
MixUp
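A minimal MixUp sketch of the kind MixMatch uses internally: convex-combine random pairs of inputs and their (one-hot or guessed) targets with a Beta-distributed coefficient. The `alpha` value and the `max(lam, 1-lam)` step follow MixMatch’s convention; treat this as an illustrative sketch, not the full MixMatch pipeline.

```python
import numpy as np
import torch

def mixup(x, y_onehot, alpha=0.75):
    lam = np.random.beta(alpha, alpha)
    lam = max(lam, 1.0 - lam)              # keep the mix closer to the first input
    perm = torch.randperm(x.size(0))       # random pairing within the batch
    x_mix = lam * x + (1 - lam) * x[perm]
    y_mix = lam * y_onehot + (1 - lam) * y_onehot[perm]
    return x_mix, y_mix
```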
“Co-Training”
• (Blum & Mitchell, 1998) and (Mitchell, 1999) assume that
• features can be split into two sets;
• each sub-feature set is sufficient to train a good classifier.
• Initially, two separate classifiers are trained on the labeled data, one per sub-feature set.
• Each classifier then classifies the unlabeled data and “teaches” the other classifier with the few unlabeled examples (and their predicted labels) about which it is most confident.
• Each classifier is retrained with the additional training examples given by the other classifier, and the process repeats – see the sketch below.
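A schematic of that loop (not Blum & Mitchell’s original code), assuming the two feature views are given as NumPy arrays and using scikit-learn logistic regressions as the two classifiers.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def co_training(Xl_v1, Xl_v2, y_l, Xu_v1, Xu_v2, rounds=10, k=5):
    clf1 = LogisticRegression(max_iter=1000)
    clf2 = LogisticRegression(max_iter=1000)
    X1, X2, y = Xl_v1.copy(), Xl_v2.copy(), y_l.copy()
    U1, U2 = Xu_v1.copy(), Xu_v2.copy()

    for _ in range(rounds):
        if len(U1) == 0:
            break
        clf1.fit(X1, y)
        clf2.fit(X2, y)

        # Each classifier labels the pool on its own view and picks its k most
        # confident examples to "teach" the other classifier.
        conf1 = clf1.predict_proba(U1).max(axis=1)
        conf2 = clf2.predict_proba(U2).max(axis=1)
        idx1 = np.argsort(-conf1)[:k]
        idx2 = np.argsort(-conf2)[:k]

        moved = np.concatenate([idx1, idx2])        # overlaps are fine for a sketch
        new_y = np.concatenate([clf1.predict(U1[idx1]), clf2.predict(U2[idx2])])

        X1 = np.vstack([X1, U1[moved]])
        X2 = np.vstack([X2, U2[moved]])
        y = np.concatenate([y, new_y])

        keep = np.setdiff1d(np.arange(len(U1)), moved)   # shrink the unlabeled pool
        U1, U2 = U1[keep], U2[keep]

    clf1.fit(X1, y)
    clf2.fit(X2, y)
    return clf1, clf2
```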
“Noisy Student”
Few-Shot Learning
Normal Approach?
• Do what we always do: fine-tuning
– Train a classifier on the base classes
– Freeze the features
– Learn classifier weights for the new classes using a small amount of labeled data (at “query” time!) – see the sketch below
Cons?
• The training we do on the base classes does not take the test-time task into account
• No notion that we will be performing a bunch of N-way tests
• Idea: simulate what we will see during test time – and we can do that many times!
A Closer Look at Few-shot Classification. Wei-Yu Chen, Yen-Cheng Liu, Zsolt Kira, Yu-Chiang Frank Wang, Jia-Bin Huang.
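A minimal sketch of this fine-tuning baseline: freeze a backbone trained on the base classes and fit only a new linear head on the few labeled support examples of the novel classes. `backbone`, `support_x`, `support_y`, `feat_dim`, and `n_way` are placeholders; the backbone is assumed to output (N, feat_dim) features.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def finetune_new_classes(backbone, support_x, support_y, feat_dim, n_way,
                         steps=100, lr=0.01):
    backbone.eval()                                   # freeze the features
    with torch.no_grad():
        feats = backbone(support_x)                   # (N, feat_dim) support features

    head = nn.Linear(feat_dim, n_way)                 # new classifier head only
    opt = torch.optim.SGD(head.parameters(), lr=lr)
    for _ in range(steps):
        loss = F.cross_entropy(head(feats), support_y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return head
```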
Meta Learning Approach
• Set up a set of smaller tasks (episodes) during training that simulate what we will be doing during testing
– Can optionally pre-train features on held-out base classes (not typical)
• Testing stage is now the same, but with new classes
Model-Agnostic Meta-Learning (MAML)
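The figure for this slide is omitted; below is a hedged first-order MAML (FOMAML) sketch rather than the full second-order algorithm: adapt a copy of the model on each task’s support set, evaluate on that task’s query set, and apply the resulting gradients to the original parameters. `model`, `meta_opt`, and the task tuples are placeholders.

```python
import copy
import torch
import torch.nn.functional as F

def fomaml_step(model, meta_opt, tasks, inner_lr=0.01, inner_steps=1):
    meta_opt.zero_grad()
    for (support_x, support_y, query_x, query_y) in tasks:
        fast = copy.deepcopy(model)                         # task-specific copy
        inner_opt = torch.optim.SGD(fast.parameters(), lr=inner_lr)
        for _ in range(inner_steps):                        # inner-loop adaptation
            loss = F.cross_entropy(fast(support_x), support_y)
            inner_opt.zero_grad()
            loss.backward()
            inner_opt.step()

        # Outer loss on the query set; first-order trick: copy the adapted
        # model's gradients back onto the original parameters.
        query_loss = F.cross_entropy(fast(query_x), query_y)
        grads = torch.autograd.grad(query_loss, fast.parameters())
        for p, g in zip(model.parameters(), grads):
            p.grad = g if p.grad is None else p.grad + g
    meta_opt.step()
```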
Active Learning
From Education …
C. Bonwell and J. Eison [1]: In active learning, students participate in the process and
students participate when they are doing something besides passively listening. It is a model
of instruction or an education action that gives the responsibility of learning to learners
themselves.
… to Machine Learning:
Settles [2, p.5]: Active learning systems attempt to overcome the labeling bottleneck by
asking queries in the form of unlabeled instances to be labeled by an oracle. In this
way, the active learner aims to achieve high accuracy using as few labeled instances as
possible, thereby minimizing the cost of obtaining labeled data.
[1] Charles C. Bonwell and James A. Eison. Active learning: Creating excitement in the classroom. ASHE-ERIC Higher Education Report, 1, 1991.
[2] Burr Settles. Active learning literature survey. Computer Sciences Technical Report 1648, University of Wisconsin–Madison, Madison, Wisconsin, USA, 2009.
Active Learning
Setting
• Some information is costly (some not)
• Active learner controls selection process
Objective
• Select the most valuable information
• Baseline: random selection
Historical Remarks
• Optimal experimental design
• Valerii V. Fedorov. “Theory of Optimal Experiments Design”, Academic Press, 1972.
• Learning with queries/query synthesis
• Dana Angluin. “Queries and concept learning”, Machine Learning, 2:319–342, 1988.
• Selective sampling
• David Cohn, L. Atlas, R. Ladner, M. El-Sharkawi, R. II Marks, M. Aggoune, and D. Park. “Training
connectionist networks with queries and selective sampling”, In Advances in Neural Information
Processing Systems (NIPS). Morgan Kaufmann, 1990.
Uncertainty sampling
Idea
• Select those instances where we are least
certain about the label
Approach
• 3 labels preselected
• Linear classifier
• Use the distance to the decision boundary as the uncertainty measure (a common variant is sketched after this slide)
“Training connectionist networks with queries and selective sampling”.
David Cohn, L. Atlas, R. Ladner, M. El-Sharkawi, R. II Marks, M. Aggoune, and D. Park.
In Advances in Neural Information Processing Systems (NIPS). Morgan Kaufmann, 1990.
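A minimal uncertainty-sampling sketch using the “least confident” variant (top-class probability) rather than the geometric distance to the boundary used in the slide’s example; `clf` is any classifier exposing `predict_proba` and `X_pool` is the unlabeled pool (both placeholders).

```python
import numpy as np

def least_confident_query(clf, X_pool, n_queries=10):
    probs = clf.predict_proba(X_pool)              # (N, n_classes)
    confidence = probs.max(axis=1)                 # probability of the top class
    return np.argsort(confidence)[:n_queries]      # indices of least-confident points
```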
Uncertainty sampling
+ easy to implement
+ fast
− no exploration (often combined with random sampling)
− impact not considered (density-weighted extensions exist)
− problems with complex structures (performance can even be worse than random)

Pure exploitation, does not explore
Can get stuck in regions with high Bayes error
Ensemble-based Sampling
“Query by committee”, H. Sebastian Seung, Manfred Opper, and Haim Sompolinsky.
Fifth workshop on computational learning theory. Morgan Kaufmann, 1992.
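A hedged query-by-committee sketch: here the committee consists of different classifier families trained on the same labeled data, and we query the pool points with the highest vote entropy (strongest disagreement). A classic alternative is to build the committee by resampling/bagging instead; all names below are placeholders.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

def qbc_query(X_labeled, y_labeled, X_pool, n_queries=10):
    committee = [LogisticRegression(max_iter=1000),
                 DecisionTreeClassifier(max_depth=5),
                 KNeighborsClassifier(n_neighbors=3)]
    votes = np.stack([m.fit(X_labeled, y_labeled).predict(X_pool)
                      for m in committee], axis=1)          # (N_pool, committee size)

    def vote_entropy(row):                                   # disagreement measure
        _, counts = np.unique(row, return_counts=True)
        p = counts / counts.sum()
        return -(p * np.log(p)).sum()

    disagreement = np.apply_along_axis(vote_entropy, 1, votes)
    return np.argsort(-disagreement)[:n_queries]             # most-disagreed-upon indices
```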
Transfer Learning
Improve learning of a new task by leveraging previously learned tasks
Multi-Task Learning
Transfer Learning: Main Solutions
• Instance (Data) Transfer
• Reweight instances of the source data according to the target distribution
• Examples: importance sampling; some “style-transfer” for data adaptation
• Feature Transfer
• Map features of source and target data into a common space
• Examples: TCA; common pre-training + fine-tuning methods in DL (see the sketch after this list)
• Parameter Transfer
• Learn target model parameters according to source model
• Example: Multi-task learning; Net2Net
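A common pre-train + fine-tune sketch of feature/parameter transfer, assuming a recent torchvision is available; ResNet-18 is just one convenient pretrained backbone, not a choice prescribed by the slides.

```python
import torch.nn as nn
import torchvision

def build_transfer_model(num_target_classes, freeze_backbone=True):
    # Source parameters: ImageNet-pretrained backbone.
    model = torchvision.models.resnet18(weights="IMAGENET1K_V1")
    if freeze_backbone:
        for p in model.parameters():
            p.requires_grad = False                    # pure feature transfer
    # New head for the target task (parameter transfer if the backbone is also tuned).
    model.fc = nn.Linear(model.fc.in_features, num_target_classes)
    return model
```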
How transferable are deep learning features?
Net2Net Transfer
• Net2Net reuses information from an already-trained deep model to speed up training of a new model (potentially with a different topology)
Net2Net Transfer
Multi-Task Learning: Main Solutions
• Direct Parameter Sharing (straightforward)
• Examples: shared weights or activations in neural networks (see the sketch after this list); shared parameters in a Gaussian process
• Structural Regularization
• Can be designed to incorporate various assumptions and domain knowledge
• Can be trained using large-scale optimization algorithms on big data
• The key is to design the regularization term that couples the tasks
• Classical examples: group sparsity, low-rank structure, parameter grouping …
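A minimal hard-parameter-sharing sketch: one shared trunk with one small head per task, and the per-task losses simply summed during training (per-task loss weights are a common extension not shown here). Dimensions and layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    def __init__(self, in_dim, hidden_dim, task_out_dims):
        super().__init__()
        # Shared trunk (direct parameter sharing across tasks).
        self.trunk = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU(),
                                   nn.Linear(hidden_dim, hidden_dim), nn.ReLU())
        # One lightweight head per task.
        self.heads = nn.ModuleList([nn.Linear(hidden_dim, d) for d in task_out_dims])

    def forward(self, x):
        z = self.trunk(x)                        # shared representation
        return [head(z) for head in self.heads]  # one output per task
```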
General Multi-Task Learning Schematic in DNNs
• Can often help tasks with fewer labels, due to knowledge sharing … (“positive transfer”)
• But it can also hurt some tasks during collaboration, due to cross-task conflict … (“negative transfer”)
Now let’s get ambitious: learning with NO Labels!!
First category of unsupervised learning
● Generative modeling
○ Generate or otherwise model pixels in the input space
○ Pixel-level generation is computationally expensive
○ Generating images of high-fidelity may not be necessary for
representation learning
[Figures: Autoencoder; Generative Adversarial Nets. Image credit: Xifeng Guo, Thalles Silva.]
Second category of unsupervised learning
● Discriminative modeling
○ Train networks to perform pretext tasks where both the inputs and
labels are derived from an unlabeled dataset.
○ Heuristic-based pretext tasks: rotation prediction, relative patch location prediction, colorization, solving jigsaw puzzles.
○ Many heuristics seem ad-hoc and may be limiting.
Images: [Gidaris et al 2018, Doersch et al 2015]
Motivation and Methodology
Main Tasks in Use:
■ Reconstruct from a corrupted (or partial) version
■ Denoising Autoencoder
■ In-painting
■ Colorization
■ Visual common-sense tasks
■ Relative patch prediction
■ Jigsaw puzzles
■ Rotation (a minimal example is sketched after this list)
■ Contrastive Learning
■ word2vec
■ Contrastive Predictive Coding (CPC)
■ MoCo, simCLR …
[Figure: Yann LeCun’s “cake” analogy. Slide: LeCun]
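A hedged sketch of the rotation-prediction pretext task (Gidaris et al. 2018): rotate each unlabeled image by 0/90/180/270 degrees and train a classifier to predict which rotation was applied. `model` is a placeholder network assumed to output 4 logits.

```python
import torch
import torch.nn.functional as F

def rotation_pretext_batch(x):
    """x: (N, C, H, W). Returns rotated copies and their rotation labels (0..3)."""
    views, labels = [], []
    for k in range(4):                                    # k * 90 degrees
        views.append(torch.rot90(x, k, dims=(2, 3)))      # rotate the H,W plane
        labels.append(torch.full((x.size(0),), k, dtype=torch.long))
    return torch.cat(views), torch.cat(labels)

def rotation_loss(model, x_unlabeled):
    views, labels = rotation_pretext_batch(x_unlabeled)
    return F.cross_entropy(model(views), labels)          # predict the rotation
```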
Example: Solving Jigsaw Puzzles
Simple Contrastive Learning (simCLR)
• Simple idea: maximizing the agreement of representations
under data transformation, using a contrastive loss in the
latent/feature space
• Super effective: 10% relative improvement over the previous SOTA (CPC v2); outperforms AlexNet with 100× fewer labels
Simple Contrastive Learning (simCLR)
simCLR uses random crops and color distortion for augmentation.
Examples of augmentations applied to the left-most images:
Simple Contrastive Learning (simCLR)
f(x) is the base network that computes the internal representation.
Default simCLR uses an (unconstrained) ResNet, but it can be any other network.
Simple Contrastive Learning (simCLR)
g(h) is a projection network that projects the representation to a latent space.
simCLR uses a 2-layer non-linear MLP.
Simple Contrastive Learning (simCLR)
In the latent/feature space we do two things:
• “Pull” positive pairs closer together (two contrastive “views” generated from the same sample, only with different data augmentations)
• “Push” negative pairs further apart
Loss function (InfoNCE) – a minimal sketch follows below:
[Figure: the original image yields crop 1 and crop 2 as a positive pair; a different (contrastive) image provides negatives]
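A hedged sketch of the NT-Xent (InfoNCE) loss described above, written from the slide’s description rather than copied from the paper’s code: `z1` and `z2` are the projected embeddings of the two augmented views of the same batch, each view’s positive is its counterpart, and all other 2N−2 views act as negatives.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)     # (2N, d), unit-normalized
    sim = z @ z.t() / temperature                          # scaled cosine similarities
    n = z1.size(0)
    mask = torch.eye(2 * n, dtype=torch.bool)
    sim = sim.masked_fill(mask, float('-inf'))             # drop self-similarity
    # Positives: view i pairs with view i+n, and vice versa.
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)])
    return F.cross_entropy(sim, targets)
```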
simCLR algorithm in pseudo code
Take-home key points:
• Benefits from large batch sizes (at least 1k–2k per minibatch)
• The composition of augmentations is crucial; contrastive learning needs stronger data/color augmentation than supervised learning
• A nonlinear projection head improves the representation quality of the layer before it
• The temperature hyperparameter in the contrastive loss is critical
• simCLR can immediately be applied to few-shot, semi-supervised, and transfer learning
• Unsupervised contrastive learning benefits (more) from bigger models (simCLR v2)
simCLR as a strong semi-supervised learner
“Pre-train, Fine-tune, and Distill”
• Surprise: Bigger models are more label-efficient!
• Using pre-training + fine-tuning, “the fewer the labels, the bigger the model”
Momentum Contrast (MoCo)
Barlow Twins: “Another Dimension” of Contrast
VIC-Reg: A (more) Unified SSL Framework
Promoted a lot by LeCun et al., who argue that three essential terms constitute a good SSL loss (a combined sketch follows this list):
• Variance: keeps the variance of each component of the representations (measured over a batch) above a threshold, to prevent cross-sample collapse. [contrastive learning: “push” negatives]
• Invariance: makes the representations of two views of the same sample as close to each other as possible. [contrastive learning: “pull” positives]
• Covariance: decorrelates the variables of one sample’s embedding and prevents an informational collapse in which the variables would vary together or be highly correlated. [Barlow Twins; absent in contrastive learning]
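A hedged sketch combining the three terms above into one VICReg-style loss; the coefficients, the variance threshold of 1, and `eps` are illustrative defaults rather than necessarily the paper’s exact settings. `z1` and `z2` are the (N, d) embeddings of the two views.

```python
import torch
import torch.nn.functional as F

def vicreg_loss(z1, z2, inv_w=25.0, var_w=25.0, cov_w=1.0, eps=1e-4):
    n, d = z1.shape
    inv = F.mse_loss(z1, z2)                                  # invariance term

    def variance_term(z):                                     # keep per-dim std above 1
        std = torch.sqrt(z.var(dim=0) + eps)
        return torch.relu(1.0 - std).mean()
    var = variance_term(z1) + variance_term(z2)

    def covariance_term(z):                                   # penalize off-diagonal covariance
        z = z - z.mean(dim=0)
        cov = (z.t() @ z) / (n - 1)
        off_diag = cov - torch.diag(torch.diag(cov))
        return (off_diag ** 2).sum() / d
    cov = covariance_term(z1) + covariance_term(z2)

    return inv_w * inv + var_w * var + cov_w * cov
```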
VIC-Reg (promoted a lot by LeCun, etc.)
• Joint embedding with variance, invariance and covariance regularization
Beyond Contrastive Learning:
Masked Autoencoder (MAE)
A more detailed tutorial: https://feichtenhofer.github.io/eccv2022-ssl-tutorial/Tutorial_files/slides/mae_tutorial_xinlei.pdf
How MAE works
MAE works by Reconstruction
MAE: More Take-Home Points
• BERT-like algorithm, but with crucial design changes for vision
• BERT: masking 15% of tokens is enough
• MAE: a high mask ratio of 75%–80% is optimal
• Very efficient when coupled with a high mask ratio (75%); the masking step is sketched after this list
• MAE has a large encoder that runs on the visible tokens only
• … + a small decoder on all tokens
• … + a projection layer to connect the two
• After pre-training, throw away the decoder
• Intriguing properties – better scalability
• Works with minimal data augmentation
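A hedged sketch of only the random-masking step described above (the ViT encoder, decoder, and reconstruction loss are omitted); `tokens` are assumed to be patch embeddings of shape (N, L, D), and the encoder would see only the returned visible subset.

```python
import torch

def random_masking(tokens, mask_ratio=0.75):
    """tokens: (N, L, D) patch embeddings. Returns visible tokens and their indices."""
    n, l, d = tokens.shape
    n_keep = int(l * (1 - mask_ratio))
    noise = torch.rand(n, l, device=tokens.device)   # random score per patch
    ids_shuffle = noise.argsort(dim=1)               # random permutation of patches
    ids_keep = ids_shuffle[:, :n_keep]               # the ~25% the encoder will see
    visible = torch.gather(tokens, 1,
                           ids_keep.unsqueeze(-1).expand(-1, -1, d))
    return visible, ids_keep, ids_shuffle
```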
Contrastive Language-Image Pre-training (CLIP)
https://openai.com/blog/clip/
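A hedged sketch of the contrastive image-text pre-training objective the name refers to (not OpenAI’s actual code): encode a batch of matched (image, text) pairs with separate encoders and apply a symmetric cross-entropy so each image matches its own caption and vice versa. `image_encoder` and `text_encoder` are placeholder modules producing same-dimensional embeddings.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_encoder, text_encoder, images, texts, temperature=0.07):
    img = F.normalize(image_encoder(images), dim=1)        # (N, d) image embeddings
    txt = F.normalize(text_encoder(texts), dim=1)          # (N, d) text embeddings
    logits = img @ txt.t() / temperature                   # (N, N) similarity matrix
    targets = torch.arange(images.size(0), device=logits.device)  # i-th image <-> i-th text
    return 0.5 * (F.cross_entropy(logits, targets) +       # image-to-text direction
                  F.cross_entropy(logits.t(), targets))    # text-to-image direction
```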
CLIP is highly data-efficient, flexible and general
Some Limitations:
• struggles on abstract or systematic tasks
• struggles on very fine-grained classification
• sometimes sensitive to wording/phrasing, needing “prompt engineering”
https://openai.com/blog/clip/
General Message about Self-Supervised Learning
• MAE has won most CV downstream tasks (from 2D to 3D, sparse to dense)
• MoCo/SimCLR still offer more competitive performance in the few-shot regime
• Maybe we should “hybridize” the two?
• Lots of open problems remain about when and why an SSL representation works (or not)