05 Greedy Clustering

The document outlines a clustering algorithm for organizing photos based on similarity measures. It describes the steps to develop a formal problem specification, evaluate potential solutions, and design a better algorithm to minimize categorization costs. The goal is to group similar photos into non-overlapping categories while minimizing the similarity between categories.


CPSC 320: Clustering (part 1) ∗

You’re working on software to organize people’s photos. Your algorithm receives as input:

• A bunch of uncategorized photos.

• A similarity measure for each pair of photos, where a 0 similarity indicates two photos are nothing
like each other; a 1 indicates two photos are exactly the same. All other similarities are in between.

• The number of categories to group them into.

Your algorithm should create a categorization: the requested number of categories, where a category is a
non-empty set of photos. Every photo belongs to some category, and no photo belongs to more than one
category. So, a categorization is a partition. We’d like similar photos to be in the same category.

Step 1: Build intuition through examples.

1. Write down small and trivial instances of the problem. What data structure is useful to represent a
problem instance? Write down also potential solutions for your instances. Are some solutions better
than others? How so?
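One possible representation, offered as a sketch rather than the intended answer: treat the photos as vertices of a complete graph and store one similarity per unordered pair. The photo names, the similarity values, and the helper name `is_categorization` are all made up for illustration. The helper checks the partition requirement stated above: the requested number of non-empty categories, with every photo in exactly one of them.

```python
# Hypothetical instance: photo names and similarity values are invented.
photos = ["a", "b", "c", "d"]

# One entry per unordered pair of photos; values lie between 0 and 1.
similarity = {
    frozenset({"a", "b"}): 0.9,
    frozenset({"c", "d"}): 0.8,
    frozenset({"b", "d"}): 0.4,
    frozenset({"b", "c"}): 0.3,
    frozenset({"a", "c"}): 0.2,
    frozenset({"a", "d"}): 0.1,
}
num_categories = 2


def is_categorization(photos, categories, num_categories):
    """Return True if `categories` partitions `photos` into exactly
    `num_categories` non-empty sets."""
    if len(categories) != num_categories:
        return False
    if any(len(category) == 0 for category in categories):
        return False
    placed = [photo for category in categories for photo in category]
    # No photo appears twice, and the photos placed are exactly the input.
    return len(placed) == len(set(placed)) and set(placed) == set(photos)


print(is_categorization(photos, [{"a", "b"}, {"c", "d"}], num_categories))
```

A dictionary keyed by `frozenset` pairs is only one choice; an n-by-n matrix indexed by photo number would serve equally well.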

∗ Copyright Notice: UBC retains the rights to this document. You may not distribute this document without permission.

Step 2: Develop a formal problem specification.

1. Develop notation for describing a problem instance.

2. Use your notation to flesh out the following group of photos into an instance.

3. Develop notation for describing a potential solution. Describe what you think makes a solution good.
Can you come up with a reasonable criterion for deciding if one solution is better than another?

4. From here on, we’ll all use the same definition of “good solution”.
First, we define the similarity between two categories C1 and C2 to be the maximum similarity between
any pair of photos p1, p2 such that p1 ∈ C1 and p2 ∈ C2.
Then, the cost of a categorization is the maximum similarity between any two of its categories. The
lower the cost, the better the categorization, since we don’t want categories to be similar. So, we
want to find a solution with minimum cost. We’ll use the term “optimal solution” (rather than “good
solution”) to refer to solutions that have minimum cost.
Write down optimal solutions and their costs for your previous examples.
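The two definitions above translate directly into code. The following is a minimal sketch, assuming similarities are stored in a dictionary keyed by unordered pairs of photos; the function names and the sample values are hypothetical. The worksheet does not define the cost of a categorization with a single category, so the sketch returns 0.0 in that case as an assumption.

```python
from itertools import combinations


def category_similarity(sim, c1, c2):
    """Maximum similarity over all pairs p1 in c1, p2 in c2."""
    return max(sim[frozenset({p1, p2})] for p1 in c1 for p2 in c2)


def cost(sim, categories):
    """Maximum similarity between any two categories.

    Assumption: with fewer than two categories there is no pair to
    compare, so the cost is taken to be 0.0.
    """
    return max(
        (category_similarity(sim, c1, c2)
         for c1, c2 in combinations(categories, 2)),
        default=0.0,
    )


# Hypothetical similarity values for four photos a, b, c, d.
sim = {
    frozenset({"a", "b"}): 0.9,
    frozenset({"c", "d"}): 0.8,
    frozenset({"b", "d"}): 0.4,
    frozenset({"b", "c"}): 0.3,
    frozenset({"a", "c"}): 0.2,
    frozenset({"a", "d"}): 0.1,
}

print(cost(sim, [{"a", "b"}, {"c", "d"}]))  # 0.4
print(cost(sim, [{"a", "c"}, {"b", "d"}]))  # 0.9
```

On these sample values, keeping the two most similar pairs together gives the lower cost, which is the behaviour the definition is meant to reward.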

Step 3: Identify similar problems. What are the similarities?

Step 4: Evaluate brute force.

1. The potential solutions are the partitions of the n photos into c subsets (where c is the requested
number of categories). Suppose that c = 2. Roughly, how does the number of potential solutions
grow asymptotically with n? Polynomially? Exponentially?

2. Given a potential solution, how can you determine how good it is, i.e., what is its cost? Asymptotically,
how long will this take?
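To make the brute-force approach concrete, here is a sketch for the case c = 2, assuming the same dictionary-of-pairs representation as before (the function name and sample values are hypothetical). It fixes the first photo in one category so that each partition is enumerated once, which yields 2^(n-1) - 1 partitions into two non-empty sets. Each candidate is scored by scanning the pairs that cross the two categories, which takes O(n^2) time per candidate.

```python
def brute_force_two_categories(photos, sim):
    """Try every split of `photos` into two non-empty categories.

    Returns (best_cost, best_partition, number_of_partitions_tried).
    """
    n = len(photos)
    if n < 2:
        raise ValueError("need at least two photos for two categories")

    best_cost, best_partition, tried = None, None, 0
    # photos[0] always sits in `first`; bit i-1 of mask places photos[i].
    # The all-ones mask is skipped because it would leave `second` empty.
    for mask in range(2 ** (n - 1) - 1):
        first, second = {photos[0]}, set()
        for i in range(1, n):
            if (mask >> (i - 1)) & 1:
                first.add(photos[i])
            else:
                second.add(photos[i])
        # Cost for two categories: the most similar cross pair.
        current = max(sim[frozenset({p, q})] for p in first for q in second)
        tried += 1
        if best_cost is None or current < best_cost:
            best_cost, best_partition = current, (first, second)
    return best_cost, best_partition, tried


# Hypothetical similarity values for four photos.
photos = ["a", "b", "c", "d"]
sim = {
    frozenset({"a", "b"}): 0.9,
    frozenset({"c", "d"}): 0.8,
    frozenset({"b", "d"}): 0.4,
    frozenset({"b", "c"}): 0.3,
    frozenset({"a", "c"}): 0.2,
    frozenset({"a", "d"}): 0.1,
}

best_cost, best_partition, tried = brute_force_two_categories(photos, sim)
print(best_cost, best_partition, tried)
```

With four photos the loop examines 7 partitions; each extra photo roughly doubles that number, so this is only practical for very small instances.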

Step 5: Design a better algorithm.

There is a much better approach.

1. Find the edge in each of your instances with the highest similarity. Should the two photos incident
on that edge go in the same category? Prove a more general result.

2. Based on this insight, come up with algorithmic ideas for creating a categorization.
