Skip to content

Conversation

@codeforall
Copy link
Contributor

Overview

This PR fixes a bug where Citus incorrectly rejects valid colocated joins when Row Level Security (RLS) policies use volatile functions like current_setting(). The issue occurs because PostgreSQL's planner splits equivalence classes in the presence of volatile RLS expressions, preventing Citus from recognizing that tables are joined on their distribution columns.

Problem

When an RLS policy uses a volatile function, PostgreSQL creates separate equivalence classes for each table instead of a unified equivalence class:

-- Query
SELECT * FROM table_a 
JOIN table_b ON table_a.tenant_id = table_b.tenant_id;

-- With RLS policy: USING (tenant_id = current_setting('app.tenant_id')::uuid)
-- PostgreSQL creates:
EC1: [table_a.tenant_id, current_setting('app.tenant_id')::uuid]
EC2: [table_b.tenant_id, current_setting('app.tenant_id')::uuid]

-- Instead of:
EC: [table_a.tenant_id, table_b.tenant_id, current_setting('app.tenant_id')::uuid]

This split prevents Citus's GenerateAttributeEquivalencesForRelationRestrictions() from detecting that the tables are joined on tenant_id.

Solution

Core Algorithm:
Implements a two-pass approach in GenerateAttributeEquivalencesForRelationRestrictions():

First Pass: Standard EC processing + RLS pattern detection (EC with both Var and non-Var members)
Second Pass (conditional):

  • Groups ECs by their shared volatile expressions using hash-based comparison
  • Merges Var members from ECs sharing identical expressions
  • Creates new AttributeEquivalenceClass objects representing the implicit joins

PR includes also includes a test case to verify the fix

…classes

When a table with Row Level Security (RLS) policies is joined with another table
on a column protected by an RLS policy using volatile functions
(e.g., current_setting()), PostgreSQL's planner sometimes splits what would
normally be a single equivalence class into separate equivalence
classes for each table.

Problem:
For a query like:

When table_a has an RLS policy on tenant_id using a volatile function,
PostgreSQL creates separate equivalence classes:

EC1: [table_a.tenant_id, current_setting(...)]
EC2: [table_b.tenant_id, current_setting(...)]

Instead of a single equivalence class that would contain both columns:

EC: [table_a.tenant_id, table_b.tenant_id, current_setting(...)]

This prevents Citus from recognizing that the tables are joined on their
distribution columns, causing the query to fail with: "complex joins are
only supported when all distributed tables are co-located and joined on
their distribution columns".

Solution is to detect when multiple equivalence classes share identical
non-Var expressions (volatile RLS functions) and merge the Var members from
these classes to reconstruct the implicit join condition.
This allows Citus to recognize that table_a.tenant_id = table_b.tenant_id
for query planning purposes.

regression test multi_rls_join_distribution_key that verifies joins on
distribution columns is also part of the commit

Fixes: #7969
@codecov
Copy link

codecov bot commented Nov 23, 2025

Codecov Report

❌ Patch coverage is 90.32258% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.78%. Comparing base (b8ab8aa) to head (8464b5f).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8357      +/-   ##
==========================================
- Coverage   88.79%   88.78%   -0.01%     
==========================================
  Files         287      287              
  Lines       63144    63237      +93     
  Branches     7912     7926      +14     
==========================================
+ Hits        56070    56148      +78     
- Misses       4749     4756       +7     
- Partials     2325     2333       +8     
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Simplify the multi_rls_join_distribution_key regression test by removing the
EXPLAIN query. The EXPLAIN output is not essential for validating the RLS join
fix and can vary across PostgreSQL major versions, potentially
causing test instability.
@codeforall codeforall marked this pull request as ready for review November 25, 2025 15:57
Copy link
Member

@naisila naisila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice fix, thanks. Looks good but I would like to test it more extensively, also probably extend the test file a bit. Will review again in detail

FROM table_a AS c
LEFT OUTER JOIN table_b AS t ON c.tenant_id = t.tenant_id
ORDER BY c.id, t.id;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets add some 3- and 4-way joins, to exercise the fix. Along the lines of:

SELECT c.id, b.id, a.id
FROM table_a AS a
LEFT JOIN table_b AS b ON a.tenant_id = b.tenant_id
JOIN table_c AS c ON b.tenant_id = c.tenant_id
ORDER BY c.id, b.id, a.id;

where table_c has similar ddl to the other tables. And something that can be transformed to a join (at least by recent PG versions)

SELECT a.id FROM table_a a
 WHERE a.tenant_id IN (SELECT b.tenant_id FROM table_b JOIN table_c USING (tenant_id));

bool skipRLSProcessing = !restrictionContext->hasRowSecurity;
foreach(relationRestrictionCell, restrictionContext->relationRestrictionList)
{
RelationRestriction *relationRestriction =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about picking up the row security property from the restriction's plannerInfo->parse ? And avoid the need to carry it in the restriction context.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out. I didn't realize restrictions carry plannerInfo->parse as well. Fixed.

{
hasVar = true;
}
else if (!IsA(expr, Param) && !IsA(expr, Const))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering if it should also check that member ->em_is_const is set here? That flag means (I think) the expression has no Var nodes. Just to minimize the chance of incorrectly merging, though I don't have an actual example.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed it totally makes sense to be extra cautious, done!

{
hasVar = true;
}
else if (!IsA(expr, Param) && !IsA(expr, Const))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar comment to above - also check that member's em_is_const is set ?


/* Check if this Var belongs to this planner's range table */
if (var->varno < root->simple_rel_array_size &&
root->simple_rte_array[var->varno] != NULL)
Copy link
Contributor

@colm-mchugh colm-mchugh Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the intent is to match the Var to its RelationRestriction, why not match var->varno against the relation restriction's index field ? That gets set (in the relation restriction hook) to the position of the relation in the planner's range table array.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

@colm-mchugh
Copy link
Contributor

+1 nice fix, had some comments/questions on some of the details. And more tests please.

@codeforall codeforall linked an issue Nov 29, 2025 that may be closed by this pull request
This commit includes the following

1. Enhanced test coverage with multi-way joins (3-way, 4-way), IN subquery,
   and additional RLS policies across multiple tables

2. Removed hasRowSecurity field from RelationRestrictionContext, now
   accessing parse->hasRowSecurity directly from each relation's PlannerInfo.

3. Added em_is_const safety checks in both RLS pattern detection and
   expression collection phases to specifically target pseudoconstant
   expressions (RLS functions with no Vars)

4. Changed from array bounds checking to direct index matching (varno ==
   relRestriction->index) for clarity and added DEBUG2 logging for
   unmatched Vars during Phase 2 processing
Copy link
Contributor

@colm-mchugh colm-mchugh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed tests

@codeforall codeforall merged commit cec2ab8 into main Dec 18, 2025
403 of 406 checks passed
@codeforall codeforall deleted the muusama/7969 branch December 18, 2025 06:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

RLS changes attributeEquivalenceList, misleading error message

3 participants