3 documents:
d1: “new york times”
d2: “new york post”
d3: “los angeles times”
      angeles  los  new  post  times  york
d1    0        0    1    0     1      1
d2    0        0    1    1     0      1
d3    1        1    0    0     1      0
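
The term-count matrix above can be rebuilt mechanically from the three documents. The following is a minimal sketch, assuming whitespace tokenization and an alphabetically sorted vocabulary; the names `docs`, `vocab`, and `matrix` are illustrative and not part of the original material.

```python
# The three example documents from the slide.
docs = {
    "d1": "new york times",
    "d2": "new york post",
    "d3": "los angeles times",
}

# Vocabulary: every distinct term, sorted alphabetically
# (the same column order as the matrix above).
vocab = sorted({term for text in docs.values() for term in text.split()})

# One row per document: how often each vocabulary term occurs in it.
matrix = {
    name: [text.split().count(term) for term in vocab]
    for name, text in docs.items()
}

print(vocab)
for name, row in matrix.items():
    print(name, row)
```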
TERM      DOC-FREQUENCY   IDF
angeles   1               log2(3/1) = 1.584
los       1               log2(3/1) = 1.584
new       2               log2(3/2) = 0.584
post      1               log2(3/1) = 1.584
times     2               log2(3/2) = 0.584
york      2               log2(3/2) = 0.584
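
The IDF column follows the pattern log2(N / df), with N = 3 documents. Here is a small sketch of that calculation, assuming whitespace tokenization; `idf_table` is a hypothetical helper name introduced only for illustration.

```python
import math

docs = {
    "d1": "new york times",
    "d2": "new york post",
    "d3": "los angeles times",
}

def idf_table(docs):
    """Return {term: (document frequency, log2(N / document frequency))}."""
    n_docs = len(docs)
    df = {}
    for text in docs.values():
        # set() so a term is counted at most once per document
        for term in set(text.split()):
            df[term] = df.get(term, 0) + 1
    return {t: (df[t], math.log2(n_docs / df[t])) for t in sorted(df)}

table = idf_table(docs)
for term, (doc_freq, idf) in table.items():
    print(f"{term:8s} {doc_freq}  {idf:.4f}")
```

Note that log2(3) is about 1.58496 and log2(3/2) is about 0.58496, so the table's 1.584 and 0.584 are these values cut off after three decimals.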
      angeles  los    new    post   times  york   Length
d1    0        0      0.584  0      0.584  0.584  1.011
d2    0        0      0.584  1.584  0      0.584  1.786
d3    1.584    1.584  0      0      0.584  0      2.316
Length of d1 = sqrt(0.584^2 + 0.584^2 + 0.584^2) = 1.011
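
Each document weight is the term count multiplied by the term's IDF, and the length is the Euclidean norm of the resulting row. The sketch below assumes the three-decimal IDF values from the table; `tfidf_vector` and `length` are illustrative names, not from the original.

```python
import math

TERMS = ["angeles", "los", "new", "post", "times", "york"]
# IDF values as printed in the table (three decimals).
IDF = {"angeles": 1.584, "los": 1.584, "new": 0.584,
       "post": 1.584, "times": 0.584, "york": 0.584}
DOCS = {
    "d1": "new york times",
    "d2": "new york post",
    "d3": "los angeles times",
}

def tfidf_vector(text):
    """Weight per term: (count of the term in the text) * IDF."""
    words = text.split()
    return [words.count(t) * IDF[t] for t in TERMS]

def length(vec):
    """Euclidean length of a weight vector."""
    return math.sqrt(sum(w * w for w in vec))

vectors = {name: tfidf_vector(text) for name, text in DOCS.items()}
lengths = {name: length(vec) for name, vec in vectors.items()}
for name in DOCS:
    print(name, vectors[name], round(lengths[name], 3))
```

The computed lengths agree with the Length column to within about 0.001; the last digit depends on how the intermediate values are rounded.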
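query q: new new times
Query weights: (term frequency / maximum term frequency in the query) * IDF
      angeles  los  new                  post  times                york
Q     0        0    (2/2)*0.584 = 0.584  0     (1/2)*0.584 = 0.292  0
length(q) = sqrt(0.584^2 + 0.292^2) = 0.652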
cosSim(d1,q) = (0.584*0.584+0.584*0.292) / (1.011*0.652) = 0.776
cosSim(d2,q) = (0.584*0.584) / (1.786*0.652) = 0.292
cosSim(d3,q) = (0.584*0.292) / (2.316*0.652) = 0.112
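
The three similarities can be checked with a short script. This is a sketch under stated assumptions: the document vectors are copied from the tf-idf table, and the query counts (`new` twice, `times` once) are read off the (2/2) and (1/2) factors in the Q vector. The names `cos_sim`, `query_tf`, and `scores` are hypothetical.

```python
import math

TERMS = ["angeles", "los", "new", "post", "times", "york"]
IDF = {"angeles": 1.584, "los": 1.584, "new": 0.584,
       "post": 1.584, "times": 0.584, "york": 0.584}

# Document tf-idf vectors, copied from the table.
doc_vectors = {
    "d1": [0, 0, 0.584, 0, 0.584, 0.584],
    "d2": [0, 0, 0.584, 1.584, 0, 0.584],
    "d3": [1.584, 1.584, 0, 0, 0.584, 0],
}

# Query term counts implied by the Q vector: (2/2) on "new", (1/2) on "times".
query_tf = {"new": 2, "times": 1}
max_tf = max(query_tf.values())
q = [(query_tf.get(t, 0) / max_tf) * IDF[t] for t in TERMS]

def cos_sim(a, b):
    """Cosine similarity: dot product divided by the product of the lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

scores = {name: cos_sim(vec, q) for name, vec in doc_vectors.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
for name in ranking:
    print(name, round(scores[name], 3))
```

The script ranks the documents d1, d2, d3, the same order as the values above. The scores match to roughly two decimals; small differences in the third decimal come from rounding the intermediate lengths.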
        0-31   32-63  64-95  96-127  128-159  160-191  192-223  224-255
Red     0.04   0.12   0.23   0.06    0.24     0.12     0.13     0.06
Green   0.05   0.07   0.11   0.07    0.26     0.24     0.17     0.03
Blue    0.08   0.13   0.16   0.08    0.03     0.12     0.19     0.21
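
Each row of the table sums to 1, which suggests a normalized histogram: the fraction of pixels whose intensity falls in each of eight bins of width 32. The following sketch shows how one such row could be computed, assuming 8-bit channel values; `channel_histogram` and the sample pixel values are hypothetical and not taken from the original.

```python
def channel_histogram(values, n_bins=8, max_value=255):
    """Fraction of values falling in each equal-width bin over 0..max_value."""
    width = (max_value + 1) // n_bins  # 32 for eight bins over 0..255
    counts = [0] * n_bins
    for v in values:
        counts[v // width] += 1
    total = len(values)
    return [c / total for c in counts]

# Hypothetical red-channel intensities of a tiny 4-pixel image.
red_values = [0, 31, 32, 255]
red_hist = channel_histogram(red_values)
print(red_hist)
```

A histogram of this kind turns an image into a fixed-length numeric vector per channel, in the same spirit as the term-weight vectors used for documents.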