-
Notifications
You must be signed in to change notification settings - Fork 740
Detect distribution column joins when RLS policies split equivalence … #8357
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…classes When a table with Row Level Security (RLS) policies is joined with another table on a column protected by an RLS policy using volatile functions (e.g., current_setting()), PostgreSQL's planner sometimes splits what would normally be a single equivalence class into separate equivalence classes for each table. Problem: For a query like: When table_a has an RLS policy on tenant_id using a volatile function, PostgreSQL creates separate equivalence classes: EC1: [table_a.tenant_id, current_setting(...)] EC2: [table_b.tenant_id, current_setting(...)] Instead of a single equivalence class that would contain both columns: EC: [table_a.tenant_id, table_b.tenant_id, current_setting(...)] This prevents Citus from recognizing that the tables are joined on their distribution columns, causing the query to fail with: "complex joins are only supported when all distributed tables are co-located and joined on their distribution columns". Solution is to detect when multiple equivalence classes share identical non-Var expressions (volatile RLS functions) and merge the Var members from these classes to reconstruct the implicit join condition. This allows Citus to recognize that table_a.tenant_id = table_b.tenant_id for query planning purposes. regression test multi_rls_join_distribution_key that verifies joins on distribution columns is also part of the commit Fixes: #7969
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #8357 +/- ##
==========================================
- Coverage 88.79% 88.78% -0.01%
==========================================
Files 287 287
Lines 63144 63237 +93
Branches 7912 7926 +14
==========================================
+ Hits 56070 56148 +78
- Misses 4749 4756 +7
- Partials 2325 2333 +8 🚀 New features to boost your workflow:
|
Simplify the multi_rls_join_distribution_key regression test by removing the EXPLAIN query. The EXPLAIN output is not essential for validating the RLS join fix and can vary across PostgreSQL major versions, potentially causing test instability.
naisila
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice fix, thanks. Looks good but I would like to test it more extensively, also probably extend the test file a bit. Will review again in detail
| FROM table_a AS c | ||
| LEFT OUTER JOIN table_b AS t ON c.tenant_id = t.tenant_id | ||
| ORDER BY c.id, t.id; | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets add some 3- and 4-way joins, to exercise the fix. Along the lines of:
SELECT c.id, b.id, a.id
FROM table_a AS a
LEFT JOIN table_b AS b ON a.tenant_id = b.tenant_id
JOIN table_c AS c ON b.tenant_id = c.tenant_id
ORDER BY c.id, b.id, a.id;
where table_c has similar ddl to the other tables. And something that can be transformed to a join (at least by recent PG versions)
SELECT a.id FROM table_a a
WHERE a.tenant_id IN (SELECT b.tenant_id FROM table_b JOIN table_c USING (tenant_id));
| bool skipRLSProcessing = !restrictionContext->hasRowSecurity; | ||
| foreach(relationRestrictionCell, restrictionContext->relationRestrictionList) | ||
| { | ||
| RelationRestriction *relationRestriction = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about picking up the row security property from the restriction's plannerInfo->parse ? And avoid the need to carry it in the restriction context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing this out. I didn't realize restrictions carry plannerInfo->parse as well. Fixed.
| { | ||
| hasVar = true; | ||
| } | ||
| else if (!IsA(expr, Param) && !IsA(expr, Const)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering if it should also check that member ->em_is_const is set here? That flag means (I think) the expression has no Var nodes. Just to minimize the chance of incorrectly merging, though I don't have an actual example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed it totally makes sense to be extra cautious, done!
| { | ||
| hasVar = true; | ||
| } | ||
| else if (!IsA(expr, Param) && !IsA(expr, Const)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar comment to above - also check that member's em_is_const is set ?
|
|
||
| /* Check if this Var belongs to this planner's range table */ | ||
| if (var->varno < root->simple_rel_array_size && | ||
| root->simple_rte_array[var->varno] != NULL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the intent is to match the Var to its RelationRestriction, why not match var->varno against the relation restriction's index field ? That gets set (in the relation restriction hook) to the position of the relation in the planner's range table array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
|
+1 nice fix, had some comments/questions on some of the details. And more tests please. |
This commit includes the following 1. Enhanced test coverage with multi-way joins (3-way, 4-way), IN subquery, and additional RLS policies across multiple tables 2. Removed hasRowSecurity field from RelationRestrictionContext, now accessing parse->hasRowSecurity directly from each relation's PlannerInfo. 3. Added em_is_const safety checks in both RLS pattern detection and expression collection phases to specifically target pseudoconstant expressions (RLS functions with no Vars) 4. Changed from array bounds checking to direct index matching (varno == relRestriction->index) for clarity and added DEBUG2 logging for unmatched Vars during Phase 2 processing
colm-mchugh
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the detailed tests
Overview
This PR fixes a bug where Citus incorrectly rejects valid colocated joins when Row Level Security (RLS) policies use volatile functions like current_setting(). The issue occurs because PostgreSQL's planner splits equivalence classes in the presence of volatile RLS expressions, preventing Citus from recognizing that tables are joined on their distribution columns.
Problem
When an RLS policy uses a volatile function, PostgreSQL creates separate equivalence classes for each table instead of a unified equivalence class:
This split prevents Citus's GenerateAttributeEquivalencesForRelationRestrictions() from detecting that the tables are joined on tenant_id.
Solution
Core Algorithm:
Implements a two-pass approach in GenerateAttributeEquivalencesForRelationRestrictions():
First Pass: Standard EC processing + RLS pattern detection (EC with both Var and non-Var members)
Second Pass (conditional):
PR includes also includes a test case to verify the fix