Q: Consider the following documents.
D1: I went to the park to play
D2: park is nearby to play
D3: going to the park is fun
Q:  park nearby play
Apply the vector model to rank the above documents.
Ans:
Step 1: Represent Documents and Query by the index terms
   •   D1: I, went, to, the, park, to, play
   •   D2: park, is, nearby, to, play
   •   D3: going, to, the, park, is, fun
   •   Q: park, nearby, play
Step 2: Create the Vocabulary
Combine unique terms from all documents and the query.
Vocabulary:
{ i, went, to, the, park, play, is, nearby, going, fun }
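As a quick cross-check of Steps 1 and 2, the tokenization and vocabulary can be reproduced in Python. This is only a sketch; the names docs, tokens and vocabulary are illustrative (not part of the original answer), and it assumes simple lower-cased whitespace tokenization.

    # Sketch only: lower-cased whitespace tokenization, then the vocabulary
    # collected in first-seen order (matching the order used in this answer).
    docs = {
        "D1": "I went to the park to play",
        "D2": "park is nearby to play",
        "D3": "going to the park is fun",
        "Q":  "park nearby play",
    }

    tokens = {name: text.lower().split() for name, text in docs.items()}

    vocabulary = []
    for toks in tokens.values():
        for term in toks:
            if term not in vocabulary:
                vocabulary.append(term)

    print(vocabulary)
    # ['i', 'went', 'to', 'the', 'park', 'play', 'is', 'nearby', 'going', 'fun']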
Step 3: Construct Term Frequency (TF) Matrix
Count the frequency of each term from the vocabulary in each document and query.
Term       D1 (f_ij)   D2 (f_ij)   D3 (f_ij)   Q (f_ij)
i              1           0           0          0
went           1           0           0          0
to             2           1           1          0
the            1           0           1          0
park           1           1           1          1
play           1           1           0          1
is             0           1           1          0
nearby         0           1           0          1
going          0           0           1          0
fun            0           0           1          0
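Continuing the sketch above (it reuses tokens and vocabulary), the raw term frequencies can be tabulated with collections.Counter; again, this is only an illustrative check.

    from collections import Counter

    # Continues the earlier sketch: raw term frequencies f_ij, printed in
    # vocabulary order for D1, D2, D3 and the query Q.
    tf = {name: Counter(toks) for name, toks in tokens.items()}

    for term in vocabulary:
        print(f"{term:>8}", [tf[name][term] for name in ("D1", "D2", "D3", "Q")])
    # e.g.  to   [2, 1, 1, 0]
    #       park [1, 1, 1, 1]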
Step 4: Compute Inverse Document Frequency (IDF)
IDF is calculated as IDF_i = log(N / n_i), using the base-10 logarithm, where:
   •   N: total number of documents (N = 3)
   •   n_i: number of documents containing term i
Term      n_i (documents containing term)      IDF_i = log(3 / n_i)
i                        1                     log(3/1) = log(3)   ≈ 0.477
went                     1                     log(3/1) = log(3)   ≈ 0.477
to                       3                     log(3/3) = log(1)   = 0.000
the                      2                     log(3/2) = log(1.5) ≈ 0.176
park                     3                     log(3/3) = log(1)   = 0.000
play                     2                     log(3/2) = log(1.5) ≈ 0.176
is                       2                     log(3/2) = log(1.5) ≈ 0.176
nearby                   1                     log(3/1) = log(3)   ≈ 0.477
going                    1                     log(3/1) = log(3)   ≈ 0.477
fun                      1                     log(3/1) = log(3)   ≈ 0.477
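The IDF column can be recomputed the same way (a continuation of the sketch above; note that only D1–D3 count as documents, the query does not).

    import math

    # Continues the sketch: n_i = number of documents (D1-D3 only) containing
    # term i, and IDF_i = log10(N / n_i) with N = 3.
    N = 3
    doc_names = ("D1", "D2", "D3")
    idf = {term: math.log10(N / sum(1 for d in doc_names if tf[d][term] > 0))
           for term in vocabulary}

    print(round(idf["nearby"], 3), round(idf["the"], 3), round(idf["to"], 3))
    # 0.477 0.176 0.0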
Step 5: Calculate TF-IDF Weights (w_ij)
Using w_ij = (1 + log f_ij) * IDF_i when f_ij > 0, and w_ij = 0 otherwise. The only term with
f_ij = 2 is "to" in D1, giving a factor (1 + log 2) = 1.301, but its IDF is 0, so the weight is
still 0; every other non-zero frequency is 1, giving a factor of exactly 1.

Term       D1 (w_ij)   D2 (w_ij)   D3 (w_ij)   Q (w_ij)
i            0.477       0           0           0
went         0.477       0           0           0
to           0           0           0           0
the          0.176       0           0.176       0
park         0           0           0           0
play         0.176       0.176       0           0.176
is           0           0.176       0.176       0
nearby       0           0.477       0           0.477
going        0           0           0.477       0
fun          0           0           0.477       0
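The weighting w_ij = (1 + log f_ij) * IDF_i for f_ij > 0 (and 0 otherwise) can be checked with a small helper; weight is an illustrative name rather than something from the answer, and the snippet continues the sketch above.

    # Continues the sketch: (1 + log10 f_ij) * IDF_i when f_ij > 0, else 0.
    def weight(f_ij, idf_i):
        return (1 + math.log10(f_ij)) * idf_i if f_ij > 0 else 0.0

    weights = {name: {term: weight(tf[name][term], idf[term]) for term in vocabulary}
               for name in ("D1", "D2", "D3", "Q")}

    print(round(weights["D1"]["i"], 3))      # 0.477
    print(round(weights["D1"]["to"], 3))     # 0.0  (factor 1 + log 2 = 1.301, but IDF = 0)
    print(round(weights["Q"]["nearby"], 3))  # 0.477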
Step 6: Represent Documents and Query as Vectors
Using the TF-IDF weights, we form vectors for each document and the query, in the order of
our vocabulary: {i, went, to, the, park, play, is, nearby, going, fun}.
    • D1 Vector: [0.477, 0.477, 0, 0.176, 0, 0.176, 0, 0, 0, 0]
    • D2 Vector: [0, 0, 0, 0, 0, 0.176, 0.176, 0.477, 0, 0]
    • D3 Vector: [0, 0, 0, 0.176, 0, 0, 0.176, 0, 0.477, 0.477]
    • Q Vector: [0, 0, 0, 0, 0, 0.176, 0, 0.477, 0, 0]
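In code (continuing the sketch), these vectors are simply the weights laid out in vocabulary order.

    # Continues the sketch: each document/query as a list of weights in
    # vocabulary order {i, went, to, the, park, play, is, nearby, going, fun}.
    vectors = {name: [weights[name][t] for t in vocabulary]
               for name in ("D1", "D2", "D3", "Q")}

    print([round(w, 3) for w in vectors["Q"]])
    # [0.0, 0.0, 0.0, 0.0, 0.0, 0.176, 0.0, 0.477, 0.0, 0.0]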
Step 7: Calculate Cosine Similarity
Cosine similarity is given by:
   Sim(Q, Dj) = (Q · Dj) / (||Q|| * ||Dj||)
Calculate the dot products (Q · Dj):
   • Q · D1 = (0 * 0.477) + (0 * 0.477) + (0 * 0) + (0 * 0.176) + (0 * 0) + (0.176 * 0.176) +
     (0 * 0) + (0.477 * 0) + (0 * 0) + (0 * 0) = 0.030976
   • Q · D2 = (0 * 0) + (0 * 0) + (0 * 0) + (0 * 0) + (0 * 0) + (0.176 * 0.176) + (0 * 0.176) +
     (0.477 * 0.477) + (0 * 0) + (0 * 0) = 0.030976 + 0.227529 = 0.258505
   • Q · D3 = (0 * 0) + (0 * 0) + (0 * 0) + (0 * 0.176) + (0 * 0) + (0.176 * 0) + (0 * 0.176) +
     (0.477 * 0) + (0 * 0.477) + (0 * 0.477) = 0
Calculate Magnitudes:
   • ||D1|| = sqrt(0.477^2 + 0.477^2 + 0.176^2 + 0.176^2) = sqrt(0.227529 + 0.227529 +
      0.030976 + 0.030976) = sqrt(0.51701) ≈ 0.7190
   • ||D2|| = sqrt(0.176^2 + 0.176^2 + 0.477^2) = sqrt(0.030976 + 0.030976 + 0.227529) =
      sqrt(0.289481) ≈ 0.5380
   • ||D3|| = sqrt(0.176^2 + 0.176^2 + 0.477^2 + 0.477^2) = sqrt(0.030976 + 0.030976 +
      0.227529 + 0.227529) = sqrt(0.51701) ≈ 0.7190
   • ||Q|| = sqrt(0.176^2 + 0.477^2) = sqrt(0.030976 + 0.227529) = sqrt(0.258505) ≈ 0.5084
Calculate Cosine Similarities:
   • Sim(Q, D1) = 0.030976 / (0.5084 * 0.7190) = 0.030976 / 0.36554 ≈ 0.0847
   • Sim(Q, D2) = 0.258505 / (0.5084 * 0.5380) = 0.258505 / 0.27352 ≈ 0.9451
   • Sim(Q, D3) = 0 / (0.5084 * 0.7190) = 0
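The same similarities can be verified numerically (continuing the sketch). Because the hand calculation rounds the IDF values to three decimals, the last decimal place may differ slightly.

    # Continues the sketch: cosine similarity between the query and each document.
    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(b * b for b in v))
        return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

    sims = {name: cosine(vectors["Q"], vectors[name]) for name in ("D1", "D2", "D3")}
    print({name: round(s, 3) for name, s in sims.items()})
    # {'D1': 0.085, 'D2': 0.945, 'D3': 0.0}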
Step 8: Rank Documents
Based on the cosine similarity scores (higher score means more similar):
   1. D2 (Cosine Similarity ≈ 0.9451)
   2. D1 (Cosine Similarity ≈ 0.0847)
   3. D3 (Cosine Similarity = 0)
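Finally, the ranking is just a descending sort of the similarity scores (continuing the sketch).

    # Continues the sketch: rank documents by decreasing cosine similarity.
    ranking = sorted(sims, key=sims.get, reverse=True)
    print(ranking)   # ['D2', 'D1', 'D3']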