-
Notifications
You must be signed in to change notification settings - Fork 0
predicate expression def for search filter #35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Caution Review failedThe pull request is closed. WalkthroughAdds a JSON-driven predicate expression system for vector metadata filtering (declarations and implementation), switches vector index APIs and internal usearch types from raw uint64_t to Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant Parser as PredicateExpression::Parse
participant Builder as ParseJSONObject/ParseSingleField
participant Evaluator as PredicateExpression::Evaluate
participant Comparator as CompareValues
Client->>Parser: Parse(JSON, schema)
Parser->>Builder: build predicate AST ($and/$or/$not/fields)
Builder-->>Parser: root_node_ (AST)
Parser-->>Client: parse result
Client->>Evaluator: Evaluate(record_bytes, offsets, schema)
Evaluator->>Evaluator: EvaluateNode(root_node_)
alt Parent (AND / OR)
Evaluator->>Evaluator: evaluate children (short-circuit)
else NOT
Evaluator->>Evaluator: invert child result
else Leaf
Evaluator->>Evaluator: resolve field bytes via offsets/schema
Evaluator->>Comparator: CompareValues(field_bytes, literal, type, op)
Comparator-->>Evaluator: bool
end
Evaluator-->>Client: bool (match/no-match)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (6)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
0251b9b to
442c3b2
Compare
56d704f to
fac9f68
Compare
442c3b2 to
5a22544
Compare
fac9f68 to
8756ee5
Compare
9da6cfc to
f7f97d7
Compare
8756ee5 to
cc18639
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (1)
src/vector/predicate.cpp (1)
354-389: Allow single-child logical nodes to evaluate correctly.The current size checks force any
$and/$ornode with a single child to returnfalse.$inlowers to anOR, so a filter like{"tags": {"$in": ["foo"]}}always fails. Please treat “no children” as invalid, but allow one child to evaluate normally.- if (and_node.children_.size() < 2) + if (and_node.children_.empty()) { return false; } … - if (or_node.children_.size() < 2) + if (or_node.children_.empty()) { return false; }That restores the expected semantics for single-term
$and,$or, and$in.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
CMakeLists.txt(1 hunks)include/vector/predicate.h(1 hunks)src/vector/predicate.cpp(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: liunyl
Repo: liunyl/eloqkv PR: 7
File: src/vector/vector_handler.cpp:457-466
Timestamp: 2025-09-16T05:13:52.366Z
Learning: In the EloqVec codebase, VectorIndex::search method should be extended to accept search_params parameter to support per-request search parameters, and the class should support default search/update parameters stored in VectorMetadata and applied during index initialization.
🧬 Code graph analysis (2)
src/vector/predicate.cpp (2)
include/vector/vector_util.h (1)
ParseJSONFieldValue(31-100)include/vector/predicate.h (1)
CompareValues(202-209)
include/vector/predicate.h (1)
src/vector/predicate.cpp (8)
Parse(317-336)Parse(317-318)Evaluate(338-344)Evaluate(338-340)EvaluateNode(346-422)EvaluateNode(346-349)CompareValues(424-490)CompareValues(424-429)
f7f97d7 to
378bf77
Compare
cc18639 to
c43b5b2
Compare
| */ | ||
| bool EvaluateNode(const PredicateNode &node, | ||
| const std::vector<char> &metadata, | ||
| const std::vector<size_t> &offsets, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[cpplint] reported by reviewdog 🐶
Add #include for vector<> [build/include_what_you_use] [4]
| PredicateNode::Uptr root_node_{nullptr}; | ||
| }; | ||
|
|
||
| } // namespace EloqVec No newline at end of file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[cpplint] reported by reviewdog 🐶
Could not find a newline character at the end of the file. [whitespace/ending_newline] [5]
| size_t rhs_size, | ||
| PredicateOp op) | ||
| { | ||
| size_t cmp_size = rhs_size == 0 ? lhs_size : std::min(lhs_size, rhs_size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[cpplint] reported by reviewdog 🐶
Add #include for min [build/include_what_you_use] [4]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
♻️ Duplicate comments (3)
src/vector/predicate.cpp (1)
22-28: Add missing standard library includes.The code uses standard library features without including their headers:
std::memcmprequires<cstring>std::make_uniquerequires<memory>std::moverequires<utility>std::vectorrequires<vector>std::stringrequires<string>These missing includes were already flagged by cpplint in past reviews.
include/vector/predicate.h (2)
24-28: Add missing vector include.The header uses
std::vectoron lines 139, 166-167, 188-189 but doesn't include<vector>. This was already flagged by cpplint in past reviews.
195-195: Add newline at end of file.Missing newline character at the end of the file, as flagged by cpplint in previous reviews.
🧹 Nitpick comments (4)
src/vector/predicate.cpp (3)
147-148: Validate single key-value pair before access.The code assumes
field_valuecontains exactly one key-value pair and directly accesses.begin()without validation. If a malformed filter provides multiple operators for a field (e.g.,{"age": {"$gt": 18, "$lt": 65}}), only the first is processed silently.Add validation:
+ if (field_value.size() != 1) + { + return nullptr; + } + const std::string &op_str = field_value.begin().key(); const nlohmann::json &value_json = field_value.begin().value();
198-217: Validate non-empty arrays for logical operators.Empty
$andarrays create nodes with no children. While evaluation will returnfalsefor such nodes (lines 323-326), it's better to reject invalid filters at parse time.if (obj.contains("$and")) { - if (!obj["$and"].is_array()) + if (!obj["$and"].is_array() || obj["$and"].empty()) { return nullptr; }Apply the same check to
$orat line 221.
319-336: Move node structure validation to parse time.The check
and_node.children_.size() < 2validates tree structure at evaluation time, but structure is fixed after parsing. This check should be enforced when building the tree (inParseJSONObject) and converted to an assert here for internal consistency.Same issue applies to OR (lines 341-344) and NOT (lines 359-362) operators.
include/vector/predicate.h (1)
87-87: Consider making op_ protected for encapsulation.The public
op_member exposes internal state. While this may be intentional for simplicity (similar tochildren_inParentPredNode), a getter would better encapsulate the implementation.+protected: PredicateOp op_; + +public: + PredicateOp GetOp() const { return op_; }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
CMakeLists.txt(1 hunks)include/vector/predicate.h(1 hunks)src/vector/predicate.cpp(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- CMakeLists.txt
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: liunyl
Repo: liunyl/eloqkv PR: 7
File: src/vector/vector_handler.cpp:457-466
Timestamp: 2025-09-16T05:13:52.366Z
Learning: In the EloqVec codebase, VectorIndex::search method should be extended to accept search_params parameter to support per-request search parameters, and the class should support default search/update parameters stored in VectorMetadata and applied during index initialization.
🧬 Code graph analysis (2)
include/vector/predicate.h (1)
src/vector/predicate.cpp (6)
Parse(282-301)Parse(282-283)Evaluate(303-309)Evaluate(303-305)EvaluateNode(311-395)EvaluateNode(311-314)
src/vector/predicate.cpp (1)
include/vector/vector_util.h (1)
ParseJSONFieldValue(105-195)
🔇 Additional comments (5)
src/vector/predicate.cpp (3)
36-86: LGTM: Byte-wise comparison is correct and safe.The implementation uses
std::memcmpfor byte-wise comparison, avoiding the unaligned read issues that were flagged in previous reviews. The size-aware logic correctly handles comparisons where buffer sizes differ.
89-120: LGTM: Operator parsing is correct.The if-else chain is appropriate for this small set of operators and the code correctly returns
UNKNOWNfor unrecognized operators.
282-301: LGTM: Parse entry point with proper error handling.The function correctly validates input, handles JSON parsing exceptions, and returns clear success/failure status.
include/vector/predicate.h (2)
36-62: LGTM: PredicateOp enum is well-defined.The operator enumeration is clear and uses an appropriate underlying type.
90-140: LGTM: Node hierarchy is well-designed.The predicate node hierarchy correctly uses inheritance, enables move semantics, disables copying, and provides appropriate virtual methods.
| const PredicateNode &Root() const | ||
| { | ||
| return *root_node_; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Root() dereferences potentially null pointer.
The Root() method dereferences root_node_ without checking if it's null. If Parse() hasn't been called or failed, this causes undefined behavior.
Consider returning an optional reference or asserting with a clear error message:
const PredicateNode &Root() const
{
+ assert(root_node_ != nullptr && "Root() called before successful Parse()");
return *root_node_;
}Or provide a safer API:
- const PredicateNode &Root() const
+ const PredicateNode *Root() const
{
- return *root_node_;
+ return root_node_.get();
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| const PredicateNode &Root() const | |
| { | |
| return *root_node_; | |
| } | |
| const PredicateNode &Root() const | |
| { | |
| assert(root_node_ != nullptr && "Root() called before successful Parse()"); | |
| return *root_node_; | |
| } |
🤖 Prompt for AI Agents
In include/vector/predicate.h around lines 173-176, Root() currently
dereferences root_node_ without null-checking; change the API to avoid UB by
either (A) returning a nullable type such as
std::optional<std::reference_wrapper<const PredicateNode>> (preferred) and
return std::nullopt when root_node_ is null, updating all callers to handle the
optional, or (B) keep the reference signature but add a defensive check and fail
fast: assert(root_node_ && "Predicate::Root() called before successful Parse()")
or throw a clear std::runtime_error when root_node_ is null; pick one approach
and update callers and tests accordingly.
| bool PredicateExpression::Evaluate(const std::vector<char> &metadata, | ||
| const std::vector<size_t> &offsets, | ||
| const VectorRecordMetadata &schema) const | ||
| { | ||
| assert(root_node_ != nullptr); | ||
| return EvaluateNode(*root_node_, metadata, offsets, schema); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replace assert with runtime check in public API.
The assert on line 307 is removed in release builds, leading to undefined behavior if Evaluate is called before Parse or after a failed parse. Public APIs must validate preconditions at runtime.
bool PredicateExpression::Evaluate(const std::vector<char> &metadata,
const std::vector<size_t> &offsets,
const VectorRecordMetadata &schema) const
{
- assert(root_node_ != nullptr);
+ if (root_node_ == nullptr)
+ {
+ return false;
+ }
return EvaluateNode(*root_node_, metadata, offsets, schema);
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| bool PredicateExpression::Evaluate(const std::vector<char> &metadata, | |
| const std::vector<size_t> &offsets, | |
| const VectorRecordMetadata &schema) const | |
| { | |
| assert(root_node_ != nullptr); | |
| return EvaluateNode(*root_node_, metadata, offsets, schema); | |
| } | |
| bool PredicateExpression::Evaluate(const std::vector<char> &metadata, | |
| const std::vector<size_t> &offsets, | |
| const VectorRecordMetadata &schema) const | |
| { | |
| if (root_node_ == nullptr) | |
| { | |
| return false; | |
| } | |
| return EvaluateNode(*root_node_, metadata, offsets, schema); | |
| } |
🤖 Prompt for AI Agents
In src/vector/predicate.cpp around lines 303-309, replace the assert(root_node_
!= nullptr) with a runtime precondition check that throws a descriptive
exception when root_node_ is null (e.g., throw
std::runtime_error("PredicateExpression::Evaluate called before Parse or after
failed parse")); then proceed to call EvaluateNode. Ensure <stdexcept> is
included if not already. This enforces the public API contract at runtime
instead of relying on asserts which are disabled in release builds.
c43b5b2 to
b3a9aab
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (4)
include/vector/predicate.h (2)
173-176: GuardRoot()against an empty expression.If
Parse()has not been called (or failed),root_node_remainsnullptr, soRoot()dereferences a null pointer and triggers UB. Returning a pointer avoids the crash and forces callers to check for a parsed tree.- const PredicateNode &Root() const - { - return *root_node_; - } + const PredicateNode *Root() const + { + return root_node_.get(); + }
24-29: Include the headers you use.
PredicateNoderelies onuint8_tandLeafPredNodestores astd::vector<char>, but this header does not include<cstdint>or<vector>. Any translation unit that includespredicate.hwithout pulling those headers first will fail to compile. Please add the missing includes.#include <list> #include <memory> #include <string> #include <string_view> +#include <cstdint> +#include <vector>src/vector/predicate.cpp (2)
303-309: Don’t crash in release when Evaluate is called too early.
Evaluatestill dereferencesroot_node_after only anassert, so in release builds (where asserts vanish) callingEvaluatebefore a successfulParseleads to a null dereference. Returnfalse(or throw) when no predicate has been parsed.- assert(root_node_ != nullptr); - return EvaluateNode(*root_node_, metadata, offsets, schema); + if (root_node_ == nullptr) + { + return false; + } + return EvaluateNode(*root_node_, metadata, offsets, schema);
24-27: Add the standard headers needed by this TU.
CompareValuescallsstd::minandstd::memcmp, but<algorithm>and<cstring>are not included. Depending on transitive includes makes the build brittle; please include them explicitly.#include "predicate.h" #include <cassert> +#include <algorithm> +#include <cstring> #include <nlohmann/json.hpp>
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
CMakeLists.txt(1 hunks)include/vector/predicate.h(1 hunks)src/vector/predicate.cpp(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: liunyl
Repo: liunyl/eloqkv PR: 7
File: src/vector/vector_handler.cpp:457-466
Timestamp: 2025-09-16T05:13:52.366Z
Learning: In the EloqVec codebase, VectorIndex::search method should be extended to accept search_params parameter to support per-request search parameters, and the class should support default search/update parameters stored in VectorMetadata and applied during index initialization.
🧬 Code graph analysis (2)
src/vector/predicate.cpp (1)
include/vector/vector_util.h (1)
ParseJSONFieldValue(105-195)
include/vector/predicate.h (1)
src/vector/predicate.cpp (6)
Parse(282-301)Parse(282-283)Evaluate(303-309)Evaluate(303-305)EvaluateNode(311-399)EvaluateNode(311-314)
b3a9aab to
04364dd
Compare
| field_name, field_value, field_type); | ||
| } | ||
|
|
||
| const std::string &op_str = field_value.begin().key(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[cpplint] reported by reviewdog 🐶
Add #include for string [build/include_what_you_use] [4]
| if (obj.size() > 1) | ||
| { | ||
| ParentPredNode::Uptr and_node = | ||
| std::make_unique<ParentPredNode>(PredicateOp::AND); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[cpplint] reported by reviewdog 🐶
Add #include for make_unique<> [build/include_what_you_use] [4]
| { | ||
| return nullptr; | ||
| } | ||
| and_node->children_.emplace_back(std::move(child_node)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[cpplint] reported by reviewdog 🐶
Add #include for move [build/include_what_you_use] [4]
|
|
||
| bool PredicateExpression::EvaluateNode(const PredicateNode &node, | ||
| const std::vector<char> &metadata, | ||
| const std::vector<size_t> &offsets, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[cpplint] reported by reviewdog 🐶
Add #include for vector<> [build/include_what_you_use] [4]
| metadata_ptr, metadata_size, value_ptr, value_size, leaf_node.op_); | ||
| } | ||
|
|
||
| } // namespace EloqVec No newline at end of file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[cpplint] reported by reviewdog 🐶
Could not find a newline character at the end of the file. [whitespace/ending_newline] [5]
378bf77 to
7148c7f
Compare
update predicate update predicate evalue value fix predicate evalue update predi evalua update predicate fix predicate parse
04364dd to
3ffa604
Compare
Summary by CodeRabbit
New Features
Refactor
Chores