- SimRank++: Query Rewriting through Link Analysis of the Click Graph ๊ตฌํ
- bipartite ์ผ์ด์ค์ ๋ํด์๋ง ๊ตฌํ
- http://www.vldb.org/pvldb/1/1453903.pdf
- numpy
- BipartiteGraph() ์์ฑ
G = BipartiteGraph()
- BipartiteGraph.add_edge()๋ก edge๋ฅผ ์ถ๊ฐํด ๊ทธ๋ํ ์์ฑ
G.add_edge("camera", "hp.com", 1.0)
G.add_edge("camera", "bestbuy.com", 1.0)
G.add_edge("digital_camera", "hp.com", 1.0)
G.add_edge("digital_camera", "bestbuy.com", 1.0)
- (Optional) sub graph๋ก splitํ๊ธฐ
- ํ์ํ๋ค๋ฉด sub graph๋ก graph๋ฅผ ๋ถ๋ฆฌํด์ ๊ณ์ฐ ์๊ฐ์ด๋ ๋ฉ๋ชจ๋ฆฌ ์ฌ์ฉ๋์ ์ค์ผ ์ ์๋ค.
subgraph_list = G.split_subgraphs()
- simrank_double_plus_bipartite(BipartiteGraph)๋ก ์ ์ฌ๋ ๊ณ์ฐ. lns, rns ์ ์ฌ๋๋ฅผ ๋ฐ๋ก ๋ฐ์.
lns_sim, rns_sim = simrank_double_plus_bipartite(G)
- Similarty๋ฅผ ๋
ธ๋ ์ค์ ๊ฐ๋ค๋ก ์ถ๋ ฅํ ์ ์๋๋ก ์ฌ์ (dict)ํ์ผ๋ก ์ ๋ฆฌํด์ฃผ๋ convert_sim_to_dict
- ์๋์ ๊ฐ์ด graph์ ๊ณ์ฐํ lns_sim, rns_sim matrix๋ฅผ ๋๊ฒจ์ฃผ๋ฉด, ๊ฐ๊ฐ ์ฌ์ ํ์ ๋ฐํ
...
> lns_sim_dict, rns_sim_dict = convert_sim_to_dict(G3, lns_sim, rns_sim)
> print json.dumps(lns_sim_dict, sort_keys=True, indent=4, ensure_ascii=False)
{
"๊ธ๋ฐ์ง24K": {
"์๊ธ๋ฐ์ง": 9.405719204286108e-05
},
"์๊ธ๋ฐ์ง": {
"๊ธ๋ฐ์ง24K": 9.405719204286108e-05
}
}
> ./simrank.py
...
example 1
camera --> hp.com, bestbuy.com
digital_camera --> hp.com, bestbuy.com
Converge after 10 iterations (eps=0.000100).
sim
0 | camera
1 | digital_camera
[[ 1. 0.49994757]
[ 0.49994757 1. ]]
0 | bestbuy.com
1 | hp.com
[[ 1. 0.49994757]
[ 0.49994757 1. ]]
example2
pc --> hp.com
camera --> hp.com
Converge after 2 iterations (eps=0.000100).
sim
0 | pc
1 | camera
[[ 1. 0.4]
[ 0.4 1. ]]
0 | hp.com
[[ 1.]]
example3, ()์์ weight
์๊ธ๋ฐ์ง -> 1656770532(8), 1967201158(6), 1218341923(10), 886857215(5)
๊ธ๋ฐ์ง24K -> 1218341923(6)
Converge after 2 iterations (eps=0.000100).
sim
0 | ์๊ธ๋ฐ์ง
1 | ๊ธ๋ฐ์ง24K
[[ 1.00000000e+00 4.77748390e-05]
[ 4.77748390e-05 1.00000000e+00]]
0 | 1656770532
2 | 1967201158
1 | 1218341923
3 | 886857215
[[ 1.00000000e+00 1.57029184e-04 2.50690679e-04 2.50690679e-04]
[ 1.57029184e-04 1.00000000e+00 1.57029184e-04 1.57029184e-04]
[ 2.50690679e-04 1.57029184e-04 1.00000000e+00 2.50690679e-04]
[ 2.50690679e-04 1.57029184e-04 2.50690679e-04 1.00000000e+00]]
example4
split into subgraphs
A -> 1, 2 | B -> 1, 3 | C -> 3 | D -> 4, 5 | E -> 5
two subgraphs: (A, B, C) and (D, E)
1-subgraph has 3-lns.
['A', 'C', 'B']
2-subgraph has 2-lns.
['E', 'D']