-
Notifications
You must be signed in to change notification settings - Fork 2.8k
Fix mark join decorrelation #20033
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix mark join decorrelation #20033
Conversation
Mark joins without correlation on the left side could not be handled correctly without this PR. Information regarding the correlated columns was simply lost, which could lead to wrong query behavior. This PR fixes this and introduces correct decorrelation of mark joins without correlation on the left side. ---- fix: duckdb#19504
8e752a0 to
55ae961
Compare
… side have correlation Kind of a weird case, maybe we should leave the join decorrelation early here. Will have to investigate some time.
lnkuiper
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! Looks good to me. I just have one question, then this can be merged.
| } | ||
| } else if (join.join_type == JoinType::MARK) { | ||
| if (!left_has_correlation && right_has_correlation) { | ||
| // found a MARK join where the left side has no correlation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This covers the left/right {0, 1} correlation case. Are we sure all other cases are covered by the existing code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be quite honest—I'm not 100% sure if we will need another special case, when the left and right sides are correlated, like is the case here:
SELECT query,
(WITH t AS MATERIALIZED (SELECT query)
SELECT *
FROM (SELECT query) AS _(x)
WHERE x IN (SELECT * FROM t)) AS broken
FROM (VALUES (('cat')), (('dog')), (('duck'))) AS queries(query);However, the query does return the correct result 🤔 So, we may be fine, I think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, that's good enough for me, for now. v1.4 is LTS, so if we run into another issue we can fix it, and we can reference this discussion.
Fix mark join decorrelation (duckdb/duckdb#20033) Enable windows_arm64 arch for main extensions (duckdb/duckdb#20004)
Fix mark join decorrelation (duckdb/duckdb#20033) Enable windows_arm64 arch for main extensions (duckdb/duckdb#20004) Co-authored-by: krlmlr <krlmlr@users.noreply.github.com>
Mark joins without correlation on the left side could not be handled correctly without this PR. Information regarding the correlated columns was simply lost, which could lead to wrong query behavior. This PR fixes this and introduces correct decorrelation of mark joins without correlation on the left side.
Fix: #19504