Skip to content

jackleg/simrank

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 

History

25 Commits
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 

Repository files navigation

๊ฐœ์š”

  • SimRank++: Query Rewriting through Link Analysis of the Click Graph ๊ตฌํ˜„
    • bipartite ์ผ€์ด์Šค์— ๋Œ€ํ•ด์„œ๋งŒ ๊ตฌํ˜„
  • http://www.vldb.org/pvldb/1/1453903.pdf

prerequisites

  • numpy

์‚ฌ์šฉ๋ฒ•

  1. BipartiteGraph() ์ƒ์„ฑ
G = BipartiteGraph()
  1. BipartiteGraph.add_edge()๋กœ edge๋ฅผ ์ถ”๊ฐ€ํ•ด ๊ทธ๋ž˜ํ”„ ์™„์„ฑ
G.add_edge("camera", "hp.com", 1.0)
G.add_edge("camera", "bestbuy.com", 1.0)
G.add_edge("digital_camera", "hp.com", 1.0)
G.add_edge("digital_camera", "bestbuy.com", 1.0)
  1. (Optional) sub graph๋กœ splitํ•˜๊ธฐ
  • ํ•„์š”ํ•˜๋‹ค๋ฉด sub graph๋กœ graph๋ฅผ ๋ถ„๋ฆฌํ•ด์„œ ๊ณ„์‚ฐ ์‹œ๊ฐ„์ด๋‚˜ ๋ฉ”๋ชจ๋ฆฌ ์‚ฌ์šฉ๋Ÿ‰์„ ์ค„์ผ ์ˆ˜ ์žˆ๋‹ค.
subgraph_list = G.split_subgraphs()
  1. simrank_double_plus_bipartite(BipartiteGraph)๋กœ ์œ ์‚ฌ๋„ ๊ณ„์‚ฐ. lns, rns ์œ ์‚ฌ๋„๋ฅผ ๋”ฐ๋กœ ๋ฐ›์Œ.
lns_sim, rns_sim = simrank_double_plus_bipartite(G)

Helper

  • Similarty๋ฅผ ๋…ธ๋“œ ์‹ค์ œ ๊ฐ’๋“ค๋กœ ์ถœ๋ ฅํ•  ์ˆ˜ ์žˆ๋„๋ก ์‚ฌ์ „(dict)ํ˜•์œผ๋กœ ์ •๋ฆฌํ•ด์ฃผ๋Š” convert_sim_to_dict
    • ์•„๋ž˜์™€ ๊ฐ™์ด graph์™€ ๊ณ„์‚ฐํ•œ lns_sim, rns_sim matrix๋ฅผ ๋„˜๊ฒจ์ฃผ๋ฉด, ๊ฐ๊ฐ ์‚ฌ์ „ํ˜•์„ ๋ฐ˜ํ™˜
...
> lns_sim_dict, rns_sim_dict = convert_sim_to_dict(G3, lns_sim, rns_sim)
> print json.dumps(lns_sim_dict, sort_keys=True, indent=4, ensure_ascii=False)
{
    "๊ธˆ๋ฐ˜์ง€24K": {
        "์ˆœ๊ธˆ๋ฐ˜์ง€": 9.405719204286108e-05
    },
    "์ˆœ๊ธˆ๋ฐ˜์ง€": {
        "๊ธˆ๋ฐ˜์ง€24K": 9.405719204286108e-05
    }
}

๊ฒฐ๊ณผ ์˜ˆ์‹œ

> ./simrank.py
...
example 1
camera --> hp.com, bestbuy.com
digital_camera --> hp.com, bestbuy.com
Converge after 10 iterations (eps=0.000100).
sim
0 | camera
1 | digital_camera
[[ 1.          0.49994757]
 [ 0.49994757  1.        ]]
0 | bestbuy.com
1 | hp.com
[[ 1.          0.49994757]
 [ 0.49994757  1.        ]]
example2
pc --> hp.com
camera --> hp.com
Converge after 2 iterations (eps=0.000100).
sim
0 | pc
1 | camera
[[ 1.   0.4]
 [ 0.4  1. ]]
0 | hp.com
[[ 1.]]
example3, ()์•ˆ์€ weight
์ˆœ๊ธˆ๋ฐ˜์ง€ -> 1656770532(8), 1967201158(6), 1218341923(10), 886857215(5)
๊ธˆ๋ฐ˜์ง€24K -> 1218341923(6)
Converge after 2 iterations (eps=0.000100).
sim
0 | ์ˆœ๊ธˆ๋ฐ˜์ง€
1 | ๊ธˆ๋ฐ˜์ง€24K
[[  1.00000000e+00   4.77748390e-05]
 [  4.77748390e-05   1.00000000e+00]]
0 | 1656770532
2 | 1967201158
1 | 1218341923
3 | 886857215
[[  1.00000000e+00   1.57029184e-04   2.50690679e-04   2.50690679e-04]
 [  1.57029184e-04   1.00000000e+00   1.57029184e-04   1.57029184e-04]
 [  2.50690679e-04   1.57029184e-04   1.00000000e+00   2.50690679e-04]
 [  2.50690679e-04   1.57029184e-04   2.50690679e-04   1.00000000e+00]]
example4
split into subgraphs
A -> 1, 2 | B -> 1, 3 | C -> 3 | D -> 4, 5 | E -> 5
two subgraphs: (A, B, C) and (D, E)
1-subgraph has 3-lns.
['A', 'C', 'B']
2-subgraph has 2-lns.
['E', 'D']

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages