Remove index joins #9751

taniabogatsch · 2023-11-21T15:13:02Z

After repeated issues with the index join, we decided that removing it is the best choice for duckdb as of its current state. Currently, the index join is slow, and the results are often incorrect. Making it both performant and correct requires significant effort and with the current state and all the past changes to our indexes, it might be faster to reimplement the index join than attempting to fix it.

I did more benchmarking as suggested by @pdet. Looking at the profiler, we primarily see a lot of idle time spent waiting to acquire locks. From this, I concluded that we should use both write and read locks for indexes to support fast index joins. If we only use indexes for constraints, we always need to acquire a write lock, as we cannot allow concurrent changes to the index while validating constraints. In this scenario, we do not need to distinguish between read and write locks.

Benchmarks

Here is an example of a scenario where an index join should beat a hash join. However, duckdb performs significantly worse when using index joins.

name Index Join ART High Selectivity
group art

load
SET threads=1;
PRAGMA force_index_join;
CREATE TABLE pk_tbl (j VARCHAR, i VARCHAR PRIMARY KEY, h INTEGER, k INTEGER, n VARCHAR);
INSERT INTO pk_tbl SELECT (range || 'abcdef' || range || 'abcdef' || range) AS j, j AS i, range, range, j AS n FROM range(800000);
CREATE TABLE fk_tbl (i VARCHAR REFERENCES pk_tbl(i));
INSERT INTO fk_tbl SELECT (range * 80000) || 'abcdef' || (range * 80000) || 'abcdef' || (range * 80000) FROM range(10);

run
SELECT * FROM pk_tbl JOIN fk_tbl USING (i);

Index Join

tania@motorbook duckdb % build/reldebug/benchmark/benchmark_runner benchmark/micro/index/join/high_selectivity_index_join.benchmark 
name	run	timing
benchmark/micro/index/join/high_selectivity_index_join.benchmark	1	0.047384
benchmark/micro/index/join/high_selectivity_index_join.benchmark	2	0.047003
benchmark/micro/index/join/high_selectivity_index_join.benchmark	3	0.046800
benchmark/micro/index/join/high_selectivity_index_join.benchmark	4	0.047088
benchmark/micro/index/join/high_selectivity_index_join.benchmark	5	0.046856

Hash Join

tania@motorbook duckdb % build/reldebug/benchmark/benchmark_runner benchmark/micro/index/join/high_selectivity_index_join.benchmark
name	run	timing
benchmark/micro/index/join/high_selectivity_index_join.benchmark	1	0.009318
benchmark/micro/index/join/high_selectivity_index_join.benchmark	2	0.009221
benchmark/micro/index/join/high_selectivity_index_join.benchmark	3	0.009251
benchmark/micro/index/join/high_selectivity_index_join.benchmark	4	0.009372
benchmark/micro/index/join/high_selectivity_index_join.benchmark	5	0.009366

Mytherin · 2023-11-22T08:01:46Z

Thanks!

Merge pull request duckdb/duckdb#9751 from taniabogatsch/remove-idx-join Merge pull request duckdb/duckdb#9735 from carlopi/wasm_threads

taniabogatsch added 5 commits November 20, 2023 16:55

remove index join

f5e8e9e

more index join removal

cfca5fc

Merge branch 'main' into remove-idx-join

c1d2920

adjusted more tests

6ff9320

Merge branch 'main' into remove-idx-join

4f3ccdb

taniabogatsch mentioned this pull request Nov 21, 2023

unit test "Concurrent appends during index join" fails when run on a loaded system #8634

Closed

1 task

Mytherin merged commit 0a18eb0 into duckdb:main Nov 22, 2023

taniabogatsch deleted the remove-idx-join branch November 22, 2023 09:36

krlmlr added a commit to duckdb/duckdb-r that referenced this pull request Dec 11, 2023

chore: Update vendored sources to duckdb/duckdb@0a18eb0

f44717e

Merge pull request duckdb/duckdb#9751 from taniabogatsch/remove-idx-join Merge pull request duckdb/duckdb#9735 from carlopi/wasm_threads

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Remove index joins #9751

Remove index joins #9751

Uh oh!

taniabogatsch commented Nov 21, 2023

Uh oh!

Mytherin commented Nov 22, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Remove index joins #9751

Remove index joins #9751

Uh oh!

Conversation

taniabogatsch commented Nov 21, 2023

Benchmarks

Index Join

Hash Join

Uh oh!

Mytherin commented Nov 22, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants