
# Testing native vs custom hashmap algorithms

The idea is to benchmark the performance of `<byte[32], int32>` hash maps in different languages, where `byte[32]` is itself the SHA-256 hash of some data.
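As a minimal illustration (using Go and its native map; the test data here is an assumption, not taken from the repository), the mapping under test looks roughly like this:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

func main() {
	// The mapping under test: a 32-byte SHA-256 digest used directly as the key,
	// mapped to an int32 value. Go's native map is shown here; the other languages
	// use their own built-in maps or the custom hash tree.
	m := make(map[[32]byte]int32)

	key := sha256.Sum256([]byte("some data")) // sha256.Sum256 returns a [32]byte digest
	m[key] = 42

	fmt.Println(m[key]) // 42
}
```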

RAM is assumed not to be an issue, so some tests have consumed up to 16 GB of RAM.

The custom algorithm may well be imperfect, so take these results with a grain of salt; better alternatives are, of course, very welcome.

The algorithm doesn't hash the key itself; instead, it expects a SHA-256 digest as the key and stores each key-value pair in its hash tree.

Release x64 configurations have been used everywhere.

"Tree size" means size of pointer arrays in the top level of hash map and in the consequent layers.

So `3*8+7 / 1*8-4` means that 3 bytes plus 7 bits (31 bits) of key data are used to address the top pointer array of the hash tree, while 1 byte minus 4 bits (4 bits) of the subsequent key data are used to address the pointer arrays at the lower levels of the tree.

Basically, the size in bytes of a pointer array at any level of the hash tree on an x64 system can be calculated as `(1 << bits) * 8`, and these arrays are the most RAM-consuming parts of the algorithm.
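As a rough sketch of how such a tree is addressed and how its memory footprint follows from this formula (the layout below is an assumption based on the description above, not the repository's actual implementation), take the `3*8+2 / 1*8-6` configuration: the first 26 bits of the key index the root pointer array, and each further 2-bit slice indexes a lower-level array, so the root array alone occupies `(1 << 26) * 8` bytes = 512 MiB on x64.

```go
package main

import "fmt"

// Hypothetical two-level addressing for a hash tree keyed by a SHA-256 digest:
// topBits of the key select a slot in the root pointer array, and each further
// subBits select a slot in a lower-level pointer array.
const (
	topBits = 3*8 + 2 // 26 bits for the root array
	subBits = 1*8 - 6 // 2 bits per lower level
)

// bitSlice extracts `count` bits of the key starting at bit offset `start`,
// treating the digest as a big-endian bit string.
func bitSlice(key [32]byte, start, count int) uint64 {
	var v uint64
	for i := 0; i < count; i++ {
		bit := start + i
		b := (key[bit/8] >> uint(7-bit%8)) & 1
		v = v<<1 | uint64(b)
	}
	return v
}

func main() {
	// The pointer arrays are the most RAM-consuming parts:
	// (1 << bits) * 8 bytes each on an x64 system.
	fmt.Printf("root array: %d bytes\n", (1<<topBits)*8)        // 536870912 B = 512 MiB
	fmt.Printf("lower-level array: %d bytes\n", (1<<subBits)*8) // 32 B

	var key [32]byte // some SHA-256 digest
	rootIdx := bitSlice(key, 0, topBits)
	nextIdx := bitSlice(key, topBits, subBits)
	fmt.Println("root slot:", rootIdx, "next-level slot:", nextIdx)
}
```

By the same arithmetic, a `3*8+7` (31-bit) root array needs `(1 << 31) * 8` bytes = 16 GiB on its own, which is consistent with the "mem over" results for the largest tree sizes below.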

The results are in milliseconds and consist of two tests, reported in the form T1/T2 (a minimal timing sketch follows the list):

- T1: adding pairs
- T2: getting and validating values
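A minimal sketch of how the two phases might be timed, using Go's native map and hypothetical test data (the key-generation scheme is an assumption for illustration; the results below follow the same T1/T2 format):

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"time"
)

const records = 1 << 15 // 32'768 pairs for a quick run

func main() {
	// Pre-generate the keys: each key is the SHA-256 digest of the record index.
	keys := make([][32]byte, records)
	var buf [8]byte
	for i := range keys {
		binary.LittleEndian.PutUint64(buf[:], uint64(i))
		keys[i] = sha256.Sum256(buf[:])
	}

	m := make(map[[32]byte]int32, records)

	// T1: adding pairs
	start := time.Now()
	for i, k := range keys {
		m[k] = int32(i)
	}
	t1 := time.Since(start).Milliseconds()

	// T2: getting and validating values
	start = time.Now()
	for i, k := range keys {
		if v, ok := m[k]; !ok || v != int32(i) {
			panic("validation failed")
		}
	}
	t2 := time.Since(start).Milliseconds()

	fmt.Printf("%d/%d\n", t1, t2) // e.g. "6/2" for the native map in the 32'768-record table below
}
```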

The testing has been performed on Windows 7 x64.

- CPU: Intel Core i7-4790, 3.6 GHz, 4 cores / 8 threads
- RAM: 32 GB

## 32'768 records with test hash 2D array on the heap in GoLang

| Standard map | `1*8+3 / 1*8-2` | Postgres | MongoDB no-index | MongoDB index | Redis | Memcached |
|---|---|---|---|---|---|---|
| 6/2 | 9/4 | 15380/9861 | 9893/245567 | 10611/11382 | 3389/3398 | 3180/3328 |

## 8'000'000 records with test hash 2D array on the stack

| Tree size | c++ |
|---|---|
| `1*8 / 1*8` | 3751/2230 |
| `2*8 / 1*8` | 3771/2265 |
| `2*8 / 2*8` | mem over |
| `3*8 / 1*8` | 2911/1559 |
| `1*8 / 2*8` | mem over |
| `3*8+1 / 1*8` | 2715/1429 |
| `3*8+2 / 1*8` | 2025/1108 |
| `3*8+3 / 1*8` | 2222/1215 |
| `3*8+4 / 1*8` | 2394/1246 |
| `3*8+5 / 1*8` | 3019/1363 |
| `3*8+2 / 1*8+1` | 2211/1127 |
| `3*8+2 / 1*8-1` | 1996/1136 |
| `3*8+2 / 1*8-2` | 1901/1060 |
| `3*8+2 / 1*8-3` | 1900/1049 |
| `3*8+2 / 1*8-4` | 1878/1070 |
| `3*8+2 / 1*8-5` | 1901/1045 |
| `3*8+2 / 1*8-6` | 1865/1056 |
| `3*8+2 / 1*8-7` | 1864/1079 |
| `3*8+3 / 1*8-6` | 2152/1179 |
| `3*8+1 / 1*8-6` | 1921/1285 |
| `3*8-23 / 1*8-7` | stack ovr |
| `3*8-22 / 1*8-6` | 8072/8726 |
| `3*8-21 / 1*8-5` | 6381/6659 |
| `3*8-20 / 1*8-4` | 4756/4725 |
| `3*8-19 / 1*8-3` | 4022/3695 |
| `3*8-18 / 1*8-2` | 3543/3102 |
| `3*8-17 / 1*8-1` | 3830/3060 |

## 8'000'000 records with test hash 2D array on the heap

| Tree size | c | c++ | c# | go | rust |
|---|---|---|---|---|---|
| `3*8+2 / 1*8-6` | 2341/1223 | 1831/1016 | 29558/38409 | 2472/830 | 2990/1919 |

## 16'000'000 records with test hash 2D array on the heap

Tree size c++ rust
3*8+2 / 1*8-6 4234/2700
3*8+7 / 1*8-4 mem over

## 32'000'000 records with test hash 2D array on the heap (native hash maps)

| Native | c++ | c# | go | rust |
|---|---|---|---|---|
| algorithms | 28591/12456 | 9861/15098 | 10925/5155 | 16700/9936 |

## 32'000'000 records with test hash 2D array on the heap

Tree size c c++ c# go rust
3*8+2 / 1*8-6 22717/22665
3*8+1 / 1*8-6 13557/8156 9353/11247
3*8+2 / 1*8-6 10470/6103 22844/22823 9044/9615
3*8+3 / 1*8-7 10173/5845 9429/8911
3*8+3 / 1*8-6 10204/5875 19207/18580 12316/4920 9461/8662
3*8+3 / 1*8-5 10155/5862 9507/8496
3*8+3 / 1*8-4 10181/5825 10269/8460
3*8+3 / 1*8-3 11149/8435
3*8+3 / 1*8-0 10296/5841
3*8+4 / 1*8-3 10988/4198 12460/8008
3*8+4 / 1*8-4 10940/4020 11895/8008
3*8+4 / 1*8-5 10956/4049 11597/8011
3*8+4 / 1*8-6 10909/6640 18378/17201 155978/200232 11021/4064 12074/8205
3*8+4 / 1*8-7 10920/4487 11438/8218
3*8+5 / 1*8-6 11487/6969 14940/13481 10359/4252
3*8+6 / 1*8-6 12371/7156 13441/9476 11398/4096
3*8+7 / 1*8-6 13706/7146 13887/8002 13486/3960
3*8+7 / 1*8-5 13774/7176 13435/7748
3*8+7 / 1*8-4 13707/7138 13634/7650 mem over mem over mem over
3*8+7 / 1*8-3 13717/7140 13629/7660
3*8+7 / 1*8-2 mem over
3*8+7 / 1*8 mem over

## 64'000'000 records with test hash 2D array on the heap

Tree size c++ go
3*8+2 / 1*8-6 stack over
3*8+2 / 1*8 mem over
3*8+3 / 1*8 mem over
3*8+4 / 1*8 mem over
3*8+4 / 1*8-4 mem over
3*8+5 / 1*8 mem over

## Conclusion

I have highlighted the best key lookup performance for each language on the 32'000'000-record dataset.

I'm amazed at how well Go performs, even with its native map implementation, and I would give it 1st place in these tests, while C would take 2nd.

That said, it is quite possible that the implementations here are far from well optimized, so the real ranking could be entirely different.