Skip to content

Conversation

@taniabogatsch
Copy link
Contributor

After repeated issues with the index join, we decided that removing it is the best choice for duckdb as of its current state. Currently, the index join is slow, and the results are often incorrect. Making it both performant and correct requires significant effort and with the current state and all the past changes to our indexes, it might be faster to reimplement the index join than attempting to fix it.

I did more benchmarking as suggested by @pdet. Looking at the profiler, we primarily see a lot of idle time spent waiting to acquire locks. From this, I concluded that we should use both write and read locks for indexes to support fast index joins. If we only use indexes for constraints, we always need to acquire a write lock, as we cannot allow concurrent changes to the index while validating constraints. In this scenario, we do not need to distinguish between read and write locks.

Benchmarks

Here is an example of a scenario where an index join should beat a hash join. However, duckdb performs significantly worse when using index joins.

name Index Join ART High Selectivity
group art

load
SET threads=1;
PRAGMA force_index_join;
CREATE TABLE pk_tbl (j VARCHAR, i VARCHAR PRIMARY KEY, h INTEGER, k INTEGER, n VARCHAR);
INSERT INTO pk_tbl SELECT (range || 'abcdef' || range || 'abcdef' || range) AS j, j AS i, range, range, j AS n FROM range(800000);
CREATE TABLE fk_tbl (i VARCHAR REFERENCES pk_tbl(i));
INSERT INTO fk_tbl SELECT (range * 80000) || 'abcdef' || (range * 80000) || 'abcdef' || (range * 80000) FROM range(10);

run
SELECT * FROM pk_tbl JOIN fk_tbl USING (i);

Index Join

tania@motorbook duckdb % build/reldebug/benchmark/benchmark_runner benchmark/micro/index/join/high_selectivity_index_join.benchmark 
name	run	timing
benchmark/micro/index/join/high_selectivity_index_join.benchmark	1	0.047384
benchmark/micro/index/join/high_selectivity_index_join.benchmark	2	0.047003
benchmark/micro/index/join/high_selectivity_index_join.benchmark	3	0.046800
benchmark/micro/index/join/high_selectivity_index_join.benchmark	4	0.047088
benchmark/micro/index/join/high_selectivity_index_join.benchmark	5	0.046856

Hash Join

tania@motorbook duckdb % build/reldebug/benchmark/benchmark_runner benchmark/micro/index/join/high_selectivity_index_join.benchmark
name	run	timing
benchmark/micro/index/join/high_selectivity_index_join.benchmark	1	0.009318
benchmark/micro/index/join/high_selectivity_index_join.benchmark	2	0.009221
benchmark/micro/index/join/high_selectivity_index_join.benchmark	3	0.009251
benchmark/micro/index/join/high_selectivity_index_join.benchmark	4	0.009372
benchmark/micro/index/join/high_selectivity_index_join.benchmark	5	0.009366

@Mytherin
Copy link
Collaborator

Thanks!

@taniabogatsch taniabogatsch deleted the remove-idx-join branch November 22, 2023 09:36
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request Dec 11, 2023
Merge pull request duckdb/duckdb#9751 from taniabogatsch/remove-idx-join
Merge pull request duckdb/duckdb#9735 from carlopi/wasm_threads
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants