Vector Space Model

The document contains information about 3 documents - "New York Times", "New York Post", and "Los Angeles Times". It lists the terms contained in each document and their document frequency. It then calculates the inverse document frequency (IDF) for each term. It shows the term frequency-inverse document frequency (TF-IDF) weighting for each term in each document. Finally, it performs a cosine similarity calculation to compare a query "new new york" to each document.

Uploaded by

Pramod Kumar

3 documents:
d1: “new york times”
d2: “new york post”
d3: “los angeles times”

    angeles  los  new  post  times  york
d1  0        0    1    0     1      1
d2  0        0    1    1     0      1
d3  1        1    0    0     1      0
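
The term-document matrix above can be reproduced with a few lines of Python. This is a minimal sketch, assuming whitespace tokenization and an alphabetically sorted vocabulary (which matches the column order of the table); the names `docs`, `vocab`, and `counts` are illustrative and not from the original slides.

```python
# Build the term-document count matrix for the three example documents.
docs = {
    "d1": "new york times",
    "d2": "new york post",
    "d3": "los angeles times",
}

# Vocabulary in alphabetical order, matching the column order of the table.
vocab = sorted({term for text in docs.values() for term in text.split()})

# One row of raw term counts per document, one column per vocabulary term.
counts = {
    name: [text.split().count(term) for term in vocab]
    for name, text in docs.items()
}

print(vocab)
for name, row in counts.items():
    print(name, row)
```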

TERM     DOC-FREQUENCY  IDF
angeles  1              log2(3/1) = 1.584
los      1              log2(3/1) = 1.584
new      2              log2(3/2) = 0.584
post     1              log2(3/1) = 1.584
times    2              log2(3/2) = 0.584
york     2              log2(3/2) = 0.584
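
The IDF column follows idf(t) = log2(N / df(t)), with N = 3 documents. A short sketch of that computation is given here, assuming whitespace tokenization; the variable names are illustrative. Since log2(3) ≈ 1.58496, the table's 1.584 and 0.584 appear to be truncated rather than rounded, so values printed at full precision can differ in the third decimal.

```python
import math

docs = ["new york times", "new york post", "los angeles times"]
n_docs = len(docs)
vocab = sorted({t for d in docs for t in d.split()})

# Document frequency: number of documents containing the term at least once.
df = {t: sum(t in d.split() for d in docs) for t in vocab}

# IDF with a base-2 logarithm, as in the table: idf(t) = log2(N / df(t)).
idf = {t: math.log2(n_docs / df[t]) for t in vocab}

for t in vocab:
    print(f"{t:8s} df={df[t]} idf={idf[t]:.3f}")
```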

    angeles  los    new    post   times  york   Length
d1  0        0      0.584  0      0.584  0.584  1.011
d2  0        0      0.584  1.584  0      0.584  1.786
d3  1.584    1.584  0      0      0.584  0      2.316

Length of d1 = sqrt(0.584^2 + 0.584^2 + 0.584^2) = 1.011
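
The TF-IDF weights and vector lengths can be checked with the sketch below. It assumes the document weight is the raw term count times the IDF (every term occurs once per document here, so normalizing by the maximum count would give the same result) and that "Length" means the Euclidean norm. The helper names `tfidf_vector` and `length` are hypothetical. Because the slides work from truncated three-decimal IDF values, full-precision results may differ by a few thousandths.

```python
import math

docs = {
    "d1": "new york times",
    "d2": "new york post",
    "d3": "los angeles times",
}
vocab = sorted({t for text in docs.values() for t in text.split()})
n_docs = len(docs)
idf = {
    t: math.log2(n_docs / sum(t in text.split() for text in docs.values()))
    for t in vocab
}

def tfidf_vector(text):
    """Raw term count times IDF, one entry per vocabulary term."""
    tokens = text.split()
    return [tokens.count(t) * idf[t] for t in vocab]

def length(vec):
    """Euclidean (L2) norm of a vector."""
    return math.sqrt(sum(x * x for x in vec))

vectors = {name: tfidf_vector(text) for name, text in docs.items()}
for name, vec in vectors.items():
    print(name, [round(x, 3) for x in vec], round(length(vec), 3))
```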
query q: new new york

Query weights are (tf / max tf) * idf, in the term order angeles, los, new, post, times, york:

Q [0 0 (2/2)*0.584=0.584 0 0 (1/2)*0.584=0.292] length(q)=0.652

cosSim(d1,q) = (0.584*0.584+0.584*0.292) / (1.011*0.652) = 0.776
cosSim(d2,q) = (0.584*0.584+0.584*0.292) / (1.786*0.652) = 0.439
cosSim(d3,q) = 0 / (2.316*0.652) = 0
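
The query weighting and cosine-similarity step can be sketched as follows. The assumptions: query term frequency is normalized by the largest count in the query, query terms outside the vocabulary are ignored, and documents use raw count times IDF. The function names `doc_vector`, `query_vector`, and `cos_sim` are hypothetical. The code keeps full precision, so its scores can differ slightly from figures worked by hand with truncated three-decimal values.

```python
import math

docs = {
    "d1": "new york times",
    "d2": "new york post",
    "d3": "los angeles times",
}
vocab = sorted({t for text in docs.values() for t in text.split()})
idf = {
    t: math.log2(len(docs) / sum(t in text.split() for text in docs.values()))
    for t in vocab
}

def doc_vector(text):
    """Document weights: raw term count times IDF."""
    tokens = text.split()
    return [tokens.count(t) * idf[t] for t in vocab]

def query_vector(text):
    """Query weights: (tf / max tf) * idf; unknown terms are ignored."""
    tokens = text.split()
    counts = [tokens.count(t) for t in vocab]
    max_tf = max(counts) or 1  # avoid division by zero for an empty match
    return [(c / max_tf) * idf[t] for c, t in zip(counts, vocab)]

def cos_sim(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

q = query_vector("new new york")
scores = {name: cos_sim(doc_vector(text), q) for name, text in docs.items()}

# Rank documents from most to least similar to the query.
ranking = sorted(scores, key=scores.get, reverse=True)
for name in ranking:
    print(name, round(scores[name], 3))
```

Documents that share no term with the query get a zero dot product and therefore a similarity of 0, regardless of their length.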







       0-31  32-63  64-95  96-127  128-159  160-191  192-223  224-255
Red    0.04  0.12   0.23   0.06    0.24     0.12     0.13     0.06
Green  0.05  0.07   0.11   0.07    0.26     0.24     0.17     0.03
Blue   0.08  0.13   0.16   0.08    0.03     0.12     0.19     0.21
