Bag-of-Words
Image matching
• Brute force approach:
• 250,000 images → ~ 31 billion image pairs
– 2 pairs per second → 1 year on 500 machines
• 1,000,000 images → 500 billion pairs
– 15 years on 500 machines
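For scale, the pair counts come from the number of unordered image pairs:
$$\binom{n}{2} = \frac{n(n-1)}{2}, \qquad \binom{250{,}000}{2} \approx 3.1\times 10^{10}, \qquad \binom{10^{6}}{2} \approx 5\times 10^{11}$$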
Image matching
• For city-sized datasets, fewer than 0.1% of
image pairs actually match
• Key idea: only consider likely matches
• How do we know if a match is likely?
• Solution: use fast global similarity measures
– For example, a bag-of-words representation
Object Bag of ‘words’
Origin 1: Texture recognition
[Figure: texture images represented as histograms over a universal dictionary of textons]
Julesz 1981; Cula & Dana 2001; Leung & Malik 2001; Mori, Belongie & Malik 2001;
Schmid 2001; Varma & Zisserman 2002, 2003; Lazebnik, Schmid & Ponce 2003
Origin 2: Bag-of-words models
• Orderless document representation: frequencies of words
from a dictionary Salton & McGill (1983)
US Presidential Speeches Tag Cloud
http://chir.ag/phernalia/preztags/
Origin 2: Bag-of-words models
John likes to watch movies. Mary likes too.
John also likes to watch football games.
{"John": 1, "likes": 2, "to": 3, "watch": 4, "movies": 5,
"also": 6, "football": 7, "games": 8, "Mary": 9, "too": 10}
[1, 2, 1, 1, 1, 0, 0, 0, 1, 1]
[1, 1, 1, 1, 0, 1, 1, 1, 0, 0]
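A minimal Python sketch of the two count vectors above, with the vocabulary order taken from the slide's dictionary:

```python
# Bag-of-words counts for the two example sentences.
from collections import Counter

vocab = ["John", "likes", "to", "watch", "movies",
         "also", "football", "games", "Mary", "too"]

def bow_vector(sentence, vocab):
    # Count word occurrences, ignoring punctuation and word order.
    counts = Counter(sentence.replace(".", "").split())
    return [counts[w] for w in vocab]

print(bow_vector("John likes to watch movies. Mary likes too.", vocab))
# -> [1, 2, 1, 1, 1, 0, 0, 0, 1, 1]
print(bow_vector("John also likes to watch football games.", vocab))
# -> [1, 1, 1, 1, 0, 1, 1, 1, 0, 0]
```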
Bag of words
face, flowers, building
Works pretty well for image retrieval, recognition, and matching
Images as histograms of visual words
• Inspired by ideas from text retrieval
– [Sivic and Zisserman, ICCV 2003]
[Figure: bag-of-words histogram; x-axis: visual words, y-axis: frequency]
Quiz: What is BoW for one image?
• A histogram of local feature vectors in an
image
• A visual dictionary
• The feature vector of a local image patch
• A histogram of local features in the collection
of images
Bag of features: outline
1. Extract features
2. Learn “visual vocabulary”
3. Quantize features using visual vocabulary
4. Represent images by frequencies of
“visual words”
Quantize: approximate a signal by one whose amplitude is
restricted to a prescribed set of values.
1. Feature extraction
Detect patches [Mikolajczyk and Schmid ’02; Matas, Chum, Urban & Pajdla ’02; Sivic & Zisserman ’03]
→ Normalize patch
→ Compute SIFT descriptor [Lowe ’99]
Slide credit: Josef Sivic
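A minimal extraction sketch using OpenCV's SIFT implementation (the slide cites Lowe ’99; the image path here is a placeholder):

```python
# Detect patches and compute SIFT descriptors with OpenCV.
import cv2

img = cv2.imread("image.jpg", cv2.IMREAD_GRAYSCALE)  # placeholder path
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)
# descriptors: N x 128 array, one SIFT descriptor per detected patch
print(len(keypoints), descriptors.shape)
```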
1. Feature extraction
[Figure: local features extracted from every image in the collection]
2. Learning the visual vocabulary
[Figure: descriptors from all images pooled and clustered in descriptor space; the clusters form the visual vocabulary]
Slide credit: Josef Sivic
K-means clustering
• Want to minimize the sum of squared Euclidean
distances between points $x_i$ and their
nearest cluster centers $m_k$:
$$D(X, M) = \sum_{k=1}^{K} \sum_{i \in \text{cluster } k} (x_i - m_k)^2$$
Algorithm:
• Randomly initialize K cluster centers
• Iterate until convergence:
– Assign each data point to the nearest center
– Recompute each cluster center as the mean of all points
assigned to it
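A compact NumPy sketch of this loop; for vocabularies of k ≈ 100,000 one would use an approximate or mini-batch variant in practice:

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), K, replace=False)]  # random init
    for _ in range(n_iters):
        # Assign each point to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each center as the mean of its assigned points.
        new_centers = np.array([X[labels == k].mean(axis=0) if (labels == k).any()
                                else centers[k] for k in range(K)])
        if np.allclose(new_centers, centers):  # converged
            break
        centers = new_centers
    return centers, labels
```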
K-means clustering
https://en.wikipedia.org/wiki/
File:K-means_convergence.gif
From clustering to vector quantization
• Clustering is a common method for learning a
visual vocabulary or codebook
• Unsupervised learning process
• Each cluster center produced by k-means becomes a
codevector
• Codebook can be learned on a separate training set
• Provided the training set is sufficiently representative, the
codebook will be “universal”
• The codebook is used for quantizing features
• A vector quantizer takes a feature vector and maps it to the
index of the nearest codevector in a codebook
• Codebook = visual vocabulary
• Codevector = visual word
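A minimal quantizer sketch, assuming `codebook` is the K × d array of cluster centers from the previous step:

```python
import numpy as np

def quantize(features, codebook):
    # Map each feature vector to the index of its nearest codevector
    # (i.e., its visual word).
    dists = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
    return dists.argmin(axis=1)
```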
Example visual vocabulary
Fei-Fei et al. 2005
Image patch examples of visual words
Sivic et al. 2005
3. Image representation
[Figure: image represented as a histogram over codewords; x-axis: codewords, y-axis: frequency]
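A sketch of the representation itself, assuming `word_indices` holds one visual-word index per feature (as produced by the quantizer above):

```python
import numpy as np

def bow_histogram(word_indices, vocab_size):
    # Count how often each visual word occurs in the image.
    hist = np.bincount(word_indices, minlength=vocab_size).astype(float)
    # Normalize so images with different feature counts are comparable.
    return hist / max(hist.sum(), 1.0)
```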
Large-scale image matching
• Bag-of-words models have
been useful in matching an
image to a large database of
object instances
[Figure: 11,400 images of game covers (Caltech games dataset); query: how do I find this image in the database?]
Large-scale image search
• Build the database:
– Extract features from the
database images
– Learn a vocabulary using k-means (typical k: 100,000)
– Compute weights for each
word
– Create an inverted file
mapping words → images
Weighting the words
• Just as with text, some visual words are more
discriminative than others
the, and, or vs. cow, AT&T, Cher
• The larger the fraction of documents a word
appears in, the less useful it is for matching
– e.g., a word that appears in all documents does not
help us discriminate
TF-IDF (term frequency - inverse document frequency) weighting
• Instead of computing a regular histogram
distance, we’ll weight each word by its
inverse document frequency
• Inverse document frequency (IDF) of word j:
$$\text{IDF}(j) = \log\frac{\text{number of documents}}{\text{number of documents in which } j \text{ appears}}$$
TF-IDF weighting
• To compute the value of bin j in image I:
(term frequency of j in I) × (inverse document frequency of j)
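A sketch of this weighting over a matrix of BoW histograms (rows are images, columns are visual words); clamping the document frequency to at least 1 is just a guard against empty bins:

```python
import numpy as np

def tfidf(histograms):
    n_docs = histograms.shape[0]
    df = np.count_nonzero(histograms, axis=0)   # documents containing each word
    idf = np.log(n_docs / np.maximum(df, 1))    # words in all docs get weight 0
    return histograms * idf                     # tf x idf, bin by bin
```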
Inverted file
• Each image has ~1,000 features
• We have ~1,000,000 visual words
→ each histogram is extremely sparse (mostly zeros)
• Inverted file
– mapping from words to documents
Inverted file
• Can quickly use the inverted file to compute
similarity between a new image and all the
images in the database
– Only consider database images whose bins
overlap the query image
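A toy inverted-file sketch; real systems store compressed, weighted posting lists, but the point is the same: only database images sharing at least one word with the query are ever touched.

```python
from collections import defaultdict

def build_inverted_file(db_histograms):
    inv = defaultdict(list)  # word index -> [(image id, weight), ...]
    for img_id, hist in enumerate(db_histograms):
        for word, w in enumerate(hist):
            if w > 0:
                inv[word].append((img_id, w))
    return inv

def score_query(query_hist, inv):
    scores = defaultdict(float)
    for word, qw in enumerate(query_hist):
        if qw > 0:
            for img_id, dw in inv[word]:   # overlapping bins only
                scores[img_id] += qw * dw  # dot-product similarity
    return sorted(scores.items(), key=lambda s: -s[1])
```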
Spatial pyramid: BoW disregards all information
about the spatial layout of the features
Compute a histogram in each spatial bin
Slide credit: D. Hoiem
Spatial pyramid
[Lazebnik et al. CVPR 2006]
Slide credit: D. Hoiem
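A minimal spatial-pyramid sketch along the lines of Lazebnik et al. 2006, assuming feature coordinates normalized to [0, 1); the per-level weighting of the original paper is omitted for brevity:

```python
import numpy as np

def spatial_pyramid(xs, ys, word_indices, vocab_size, grids=(1, 2, 4)):
    # Concatenate one BoW histogram per cell of each g x g grid.
    # xs, ys, word_indices are NumPy arrays of equal length.
    parts = []
    for g in grids:
        col = (xs * g).astype(int)   # grid cell of each feature
        row = (ys * g).astype(int)
        for r in range(g):
            for c in range(g):
                in_cell = (row == r) & (col == c)
                parts.append(np.bincount(word_indices[in_cell],
                                         minlength=vocab_size))
    return np.concatenate(parts)
```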