How Shazam Works
+
Intro to machine learning for multimedia analysis
Perception & Multimedia Computing
Lecture 19
Rebecca Fiebrink
Lecturer, Department of Computing
Goldsmiths, University of London
1
Last time
• Perceptually-motivated features for IR of audio and music
• Evaluation of IR systems
2
Today
• Shazam
• Machine learning intro
• Supervised learning introduction
• Wekinator demo
3
Shazam
4
Using Shazam
1. Open the app
2. Record a few seconds of audio
3. Find out immediately which song you’re listening to (and have the opportunity to buy it)
5
Fingerprinting
Goal: find a function f(song) such that
f(song1) = f(song2)
IF AND ONLY IF
song1 = song2
(*** meaning: song1 and song2 are the same recording, but possibly with different compression, background noise, etc.)
6
Challenge:
Requires the ability to quickly find the matching song in a database of millions!
Can’t just compare f(query) to f(song1), f(song2), … f(songN) for all N songs in the database.
Solution: use hashing
7
Typical hash functions in
computing
Hash table: each data item is mapped onto a fixed-length (often shorter) key.
Facilitates rapid search (only need to search among collisions).
8
http://en.wikipedia.org/wiki/Hash_table
Review: Perceptual hash
• A hash is a function computed on a song: h(song)
• Perceptually identical (or possibly “very similar”) items end up in the same “bucket”, e.g., MP3 and WAV versions of the same song
• A bucket is a specific location in memory / on disk
• When we receive a new query, compute h(query) and look in this “bucket” for possible matches
9
Creating a Shazam hash function
Requirements:
1. Ability to use a short snippet of a song as a query (rather than having to use the whole song)
• Hash should be computable from just a small part of a song, and shouldn’t require knowledge of where in the song the query lies
2. Robust to degradation (compression, background noise, etc.)
• Hash values for the clean DB version of a song should exactly match hash values computed on degraded versions
3. Hash should be “high entropy”: don’t map too many songs to the same bucket! (This makes search take longer.)
10
High-level solution
• For each song in the DB, compute many hash values using slightly different local segments of the song.
• Create on disk a series of “buckets” (hash values), each of which contains a list of songs whose hashes fall into that bucket.
• For each song, also store a list of hashes and the song times associated with them.
• For a query, compute several hash values using slightly different local segments.
• For each hash value, go to the corresponding bucket.
• For all songs in the bucket, determine whether the query matches that song. (*** some magic here)
11
Hash Function
1. Compute the spectrogram
12
Hash Function
2. Identify peaks in spectrogram
Peak = a time-frequency point with higher energy than its
neighbors in a small region
13
Pretty robust to encoding scheme, background noise, EQ
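As a sketch of the peak-picking step (a naive local-maximum test over a magnitude spectrogram stored as a 2-D list; illustrative only, not Shazam’s actual implementation):

```python
def find_peaks(spec, radius=2):
    """Return (time, freq) indices whose magnitude exceeds all
    neighbors within `radius` bins (a simple local-maximum test)."""
    peaks = []
    n_t, n_f = len(spec), len(spec[0])
    for t in range(n_t):
        for f in range(n_f):
            v = spec[t][f]
            neighbors = (
                spec[tt][ff]
                for tt in range(max(0, t - radius), min(n_t, t + radius + 1))
                for ff in range(max(0, f - radius), min(n_f, f + radius + 1))
                if (tt, ff) != (t, f)
            )
            if all(v > n for n in neighbors):
                peaks.append((t, f))
    return peaks
```

A real system would work on dB magnitudes and tune the neighborhood size to control peak density, but the idea is the same.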
Hash Function
3. Find pairs of nearby peaks.
Hash value = (peak1_freq, peak2_freq, time2-time1)
Can pack the three values together into a single 32-bit uint
14
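The 32-bit packing can be sketched as follows; the bit widths here (12 bits per frequency bin, 8 bits for the time delta) are assumed for illustration, not Shazam’s actual allocation:

```python
def pack_hash(f1, f2, dt):
    """Pack (peak1_freq, peak2_freq, time2 - time1) into one 32-bit int.
    Assumed field widths: 12 bits per frequency bin, 8 bits for the delta."""
    assert 0 <= f1 < 4096 and 0 <= f2 < 4096 and 0 <= dt < 256
    return (f1 << 20) | (f2 << 8) | dt
```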
Storing hashes
4. Store within each hash bucket:
(song number, time point of peak1)
e.g., contents of bucket “3235293”:
3235293:
(32572, 39280)
(209371, 94830)
(32572, 3927)
3235294:
(2324, 2323)
15
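The bucket storage above can be sketched as a Python dictionary mapping each hash value to a list of (song number, time) pairs; the numbers mirror the example bucket, and the function names are illustrative:

```python
from collections import defaultdict

index = defaultdict(list)  # hash value -> list of (song_id, time_of_peak1)

def add_song(index, song_id, hashes):
    """hashes: list of (hash_value, time_of_peak1) for one song."""
    for h, t in hashes:
        index[h].append((song_id, t))

# Mirroring the contents of bucket 3235293 above:
add_song(index, 32572, [(3235293, 39280), (3235293, 3927)])
add_song(index, 209371, [(3235293, 94830)])
```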
Receiving a query
Repeat hash process for query:
1. Compute spectral peaks
2. Find pairs of nearby peaks
3. Compute list of hashes: (freq1, freq2, time2-time1)
4. For each hash value:
a. Find hash bucket.
b. For each song in bucket:
Check if query matches song
Want this to be VERY FAST!
16
Checking for a match
Does query match song?
Create two lists:
1. List of (hash values, times) for song
2. List of (hash values, times) for query
Assumption: if the query is a snippet of the song, then
1. they’ll share lots of hash values, and
2. the relative timing of different hashes will be the same!
17
Relative hash timings
If query matches song, we expect to see something like
this:
Song hashes (value, time): (235, 4), (23958, 10), (9384, 23), (39582, 28), (273, 35)
Query hashes (value, time): (23958, 5), (9384, 17), (39582, 23)
18
Query begins 40 seconds into song
19
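The “magic” in the matching step is, in essence, a vote over time offsets: if the query really is a snippet of the song, many of its matching hashes will share the same offset (song time − query time). A minimal sketch (illustrative, not Shazam’s code):

```python
from collections import Counter

def match_score(song_hashes, query_hashes):
    """song_hashes, query_hashes: dicts mapping hash_value -> list of times.
    Returns the size of the largest group of matching hashes that agree
    on a single time offset (song_time - query_time)."""
    offsets = Counter()
    for h, q_times in query_hashes.items():
        for s_time in song_hashes.get(h, []):
            for q_time in q_times:
                offsets[s_time - q_time] += 1
    return max(offsets.values()) if offsets else 0
```

A large score at one offset indicates a true match; random hash collisions scatter across many different offsets and never pile up.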
Results
Robust to noise:
Only 1-2% of hash values
must survive in order to enable
identification
21
Results
Very fast:
On a regular PC, search through
20,000 tracks takes 5-500ms
Very robust: few false positives
24
For more info
Read the academic paper with
details:
http://www.ee.columbia.edu/~dpwe/papers/Wang03-shazam.pdf
25
Intro to machine
learning
26
Feature values give a data item (e.g., song)
a point in feature space
(Figure: songs plotted in feature space; Feature 1 = Average RMS, Feature 2 = Average centroid)
27
Retrieval: Find items closest to a query
(Figure: songs plotted in feature space; Feature 1 = Average RMS, Feature 2 = Average centroid)
28
Classification: Assigning a data point a label
based on its position in feature space
(Figure: the same feature space, with one cluster labelled “David Bowie songs” and another labelled “Lady Gaga songs”; Feature 1 = Average RMS, Feature 2 = Average centroid)
29
You be the classifier: Bowie or Gaga?
(Figure: the same feature space, with the “Lady Gaga songs” cluster labelled)
30
K-nearest-neighbor (kNN) classifier:
Take a vote of k nearest points
K=3
(Figure: the same feature space, with the “Lady Gaga songs” cluster labelled)
32
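The kNN vote can be sketched in a few lines (Euclidean distance, majority vote; purely illustrative):

```python
from collections import Counter
import math

def knn_classify(train, query, k=3):
    """train: list of (feature_vector, label) pairs; query: feature vector.
    Returns the majority label among the k nearest training points."""
    dists = sorted((math.dist(x, query), label) for x, label in train)
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]
```

For example, with k = 3 a query point surrounded by two Gaga songs and one Bowie song is labelled Gaga.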
Supervised learning builds models.
Models are functions.
(Diagram — Training: training data → algorithm → model; Running: inputs → model → outputs)
33
Supervised learning algorithms
build models from data.
Each example is represented as a feature vector, paired with a label:
(.01, .59, .03, 32) → “C Major”
(.05, 1.2, 3.2, 31) → “F minor”
(-.1, .34, .20, 8.2) → “G7”
Training: these labelled examples are the training data given to the algorithm, which builds the model.
Running: a new input (.01, .64, .02, 20) is given to the model, which outputs “C Major”.
34
Classification: Assign 1 of N discrete labels
to each point in feature space
This model: a separating line or hyperplane (a decision boundary) in feature space (axes: feature1, feature2)
35
Regression
This model: a real-valued function of the input features (plot: feature on the x-axis, output on the y-axis)
36
Unsupervised learning
Dataset includes examples, but no labels
Example: Infer clusters from data:
(Figure: clusters in feature space; axes: feature1, feature2)
37
How supervised learning
algorithms work
(the basics)
38
The learning problem
Goal: Build the best** model given the training data
Definition of “best” depends on context,
assumptions…
39
Which classifier is best?
“Underfit” vs. “overfit” (figure)
Competing goals:
• Accurately model the training data
• **Accurately classify unseen data points**
40
Image from Andrew Ng
A simple classifier: nearest neighbor
(Figure: labelled points and an unlabelled “?” query point in feature space; axes: feature1, feature2)
41
Another simple classifier: Decision tree
Images: http://ai.cs.umbc.edu/~oates/classes/2009/ML/homework1.html,
42
http://nghiaho.com/?p=1300
AdaBoost: Iteratively train a “weak” learner
Image from http://www.cc.gatech.edu/~kihwan23/
imageCV/Final2005/FinalProject_KH.htm
43
Support vector machine
Re-map the input space into a higher-dimensional space and find a separating hyperplane there
44
Supervised learning and music
Create classifiers for:
• Genre (rock, pop, …)
• Artist, album
• Pitch (apply frame by frame in audio)
• Mood
• Tag/semantic descriptor (e.g., last.fm tags)
• Instrument identification
• Beatbox vocalization identification &
remapping
• … and many others
45
Practical tips
May have to explore many feature sets
Normalize features! (e.g., all between [0,1])
Using too many features can hurt
performance: “curse of dimensionality”
May have to explore many learning algorithms
(and many parameter settings for each)
Compare using “test accuracy” or cross-
validation, NOT performance on training data
Usually, more examples are better
46
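As an illustration of the normalization tip, min-max scaling maps each feature into [0, 1] using minima and maxima computed from the training set (a common choice; constant features are mapped to 0 in this sketch):

```python
def minmax_fit(data):
    """data: list of feature vectors. Returns per-feature (min, max)."""
    cols = list(zip(*data))
    return [(min(c), max(c)) for c in cols]

def minmax_apply(params, vector):
    """Rescale one feature vector using the fitted (min, max) per feature."""
    return [
        (v - lo) / (hi - lo) if hi > lo else 0.0
        for v, (lo, hi) in zip(vector, params)
    ]
```

Note that the same fitted parameters must be applied to test/query data, never re-fitted on it.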
Choosing a classifier: Practical considerations
k-Nearest Neighbor
+ Can tune k to adjust smoothness of decision boundaries
- Sensitive to noisy, redundant, irrelevant features; prone to overfitting;
weird in high dimensions
Decision tree:
+ Can prune to reduce overfitting, produces human-understandable
model
- Can still overfit
AdaBoost
+ Theoretical benefits, less prone to overfitting
+ Can tune by changing base learner, number of training rounds
Support Vector Machine
+ Theoretical benefits similar to AdaBoost
- Many parameters to tune; training can take a long time
Other considerations…
47
For more info
Wekinator:
http://wekinator.cs.princeton.edu/
For non-realtime ML: Weka –
http://www.cs.waikato.ac.nz/ml/weka/
A good intro book on machine learning:
Data Mining by
Witten, Frank, and Hall
48
More detailed and technical textbooks
Bishop, 2006: Pattern Recognition and Machine Learning. Springer
Duda, Hart, and Stork, 2001: Pattern Classification. Wiley-Interscience
49
Next week
Visualization
50