Skip to content

Conversation

@yi-xmu
Copy link
Collaborator

@yi-xmu yi-xmu commented Oct 24, 2025

Summary by CodeRabbit

  • New Features

    • Predicate expression system for filtering vector metadata via JSON queries, supporting comparisons, membership (IN → OR), and logical ops (AND/OR/NOT); type-aware parsing and efficient evaluation.
  • Refactor

    • Vector index API switched from raw integer IDs to a typed identifier, and search filters now receive that typed ID (may affect integrations).
  • Chores

    • Vector-index build now includes the predicate component when vector indexing is enabled.

@yi-xmu yi-xmu self-assigned this Oct 24, 2025
@coderabbitai
Copy link

coderabbitai bot commented Oct 24, 2025

Caution

Review failed

The pull request is closed.

Walkthrough

Adds a JSON-driven predicate expression system for vector metadata filtering (declarations and implementation), switches vector index APIs and internal usearch types from raw uint64_t to VectorId, and wires src/vector/predicate.cpp into the VECTOR_INDEX_ENABLED build resources.

Changes

Cohort / File(s) Summary
Build Configuration
CMakeLists.txt
Added src/vector/predicate.cpp to SET(RESELOQ_RESOURCES and to the VECTOR_INDEX_ENABLED resource list.
Predicate API (header)
include/vector/predicate.h
New predicate AST and API: PredicateOp enum, PredicateNode (move-only), ParentPredNode, LeafPredNode, and PredicateExpression with Parse(...), Evaluate(...), Root() and internal helpers.
Predicate Implementation
src/vector/predicate.cpp
Implements JSON→predicate-tree parsing (supports $and, $or, $not, field ops, $in expansion), operator mapping, multi-value handling, byte/type-aware comparisons, PredicateExpression::Parse and Evaluate with short-circuit evaluation.
Vector index API (headers)
include/vector/hnsw_vector_index.h, include/vector/vector_index.h
Public API changes: replace uint64_t id params and filter std::function<bool(uint64_t)> with VectorId and std::function<bool(const VectorId &)>; update method signatures (add, add_batch, remove, update, get, search).
Vector index implementation
src/vector/hnsw_vector_index.cpp
Internal switch from index_dense_t to index_dense_gt<VectorId>, update function implementations to accept/propagate VectorId, add usearch specializations (hash_gt<VectorId>, misaligned_ref/store/load, comparisons) and adapt search filter lambda usage.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Parser as PredicateExpression::Parse
    participant Builder as ParseJSONObject/ParseSingleField
    participant Evaluator as PredicateExpression::Evaluate
    participant Comparator as CompareValues

    Client->>Parser: Parse(JSON, schema)
    Parser->>Builder: build predicate AST ($and/$or/$not/fields)
    Builder-->>Parser: root_node_ (AST)
    Parser-->>Client: parse result

    Client->>Evaluator: Evaluate(record_bytes, offsets, schema)
    Evaluator->>Evaluator: EvaluateNode(root_node_)
    alt Parent (AND / OR)
        Evaluator->>Evaluator: evaluate children (short-circuit)
    else NOT
        Evaluator->>Evaluator: invert child result
    else Leaf
        Evaluator->>Evaluator: resolve field bytes via offsets/schema
        Evaluator->>Comparator: CompareValues(field_bytes, literal, type, op)
        Comparator-->>Evaluator: bool
    end
    Evaluator-->>Client: bool (match/no-match)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Extra attention: src/vector/hnsw_vector_index.cpp (usearch type adaptations, misaligned store/load, hashing and equality specializations).
  • Validate src/vector/predicate.cpp JSON parsing edge cases, type resolution against VectorRecordMetadata, and CompareValues correctness for numeric/string/boolean.
  • Check public API changes in headers for ABI/compatibility and update call-sites accordingly.
  • Review ownership/lifetime and move-only semantics in include/vector/predicate.h.

Possibly related PRs

Suggested reviewers

  • liunyl

Poem

🐰 I parsed the JSON, hopped through each branch,
Built leaves and parents in a nimble quick dance,
I compare bytes and IDs with a twitch of my nose,
Short-circuit a hop — then curl up in my woes. 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 38.89% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: introducing a predicate expression definition system for search filtering, which is reflected in the new header, implementation, and CMake configuration changes.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 04364dd and 3ffa604.

📒 Files selected for processing (6)
  • CMakeLists.txt (1 hunks)
  • include/vector/hnsw_vector_index.h (3 hunks)
  • include/vector/predicate.h (1 hunks)
  • include/vector/vector_index.h (5 hunks)
  • src/vector/hnsw_vector_index.cpp (9 hunks)
  • src/vector/predicate.cpp (1 hunks)

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@yi-xmu yi-xmu force-pushed the predicate_def_for_search branch from 56d704f to fac9f68 Compare October 27, 2025 10:39
@yi-xmu yi-xmu force-pushed the predicate_def_for_search branch from fac9f68 to 8756ee5 Compare October 30, 2025 11:11
@yi-xmu yi-xmu changed the base branch from json_parse_funcs to vector_index_interface_update October 30, 2025 11:11
@yi-xmu yi-xmu force-pushed the vector_index_interface_update branch from 9da6cfc to f7f97d7 Compare October 31, 2025 10:05
@yi-xmu yi-xmu force-pushed the predicate_def_for_search branch from 8756ee5 to cc18639 Compare October 31, 2025 10:08
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
src/vector/predicate.cpp (1)

354-389: Allow single-child logical nodes to evaluate correctly.

The current size checks force any $and/$or node with a single child to return false. $in lowers to an OR, so a filter like {"tags": {"$in": ["foo"]}} always fails. Please treat “no children” as invalid, but allow one child to evaluate normally.

-        if (and_node.children_.size() < 2)
+        if (and_node.children_.empty())
         {
             return false;
         }
…
-        if (or_node.children_.size() < 2)
+        if (or_node.children_.empty())
         {
             return false;
         }

That restores the expected semantics for single-term $and, $or, and $in.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f7f97d7 and cc18639.

📒 Files selected for processing (3)
  • CMakeLists.txt (1 hunks)
  • include/vector/predicate.h (1 hunks)
  • src/vector/predicate.cpp (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: liunyl
Repo: liunyl/eloqkv PR: 7
File: src/vector/vector_handler.cpp:457-466
Timestamp: 2025-09-16T05:13:52.366Z
Learning: In the EloqVec codebase, VectorIndex::search method should be extended to accept search_params parameter to support per-request search parameters, and the class should support default search/update parameters stored in VectorMetadata and applied during index initialization.
🧬 Code graph analysis (2)
src/vector/predicate.cpp (2)
include/vector/vector_util.h (1)
  • ParseJSONFieldValue (31-100)
include/vector/predicate.h (1)
  • CompareValues (202-209)
include/vector/predicate.h (1)
src/vector/predicate.cpp (8)
  • Parse (317-336)
  • Parse (317-318)
  • Evaluate (338-344)
  • Evaluate (338-340)
  • EvaluateNode (346-422)
  • EvaluateNode (346-349)
  • CompareValues (424-490)
  • CompareValues (424-429)

@yi-xmu yi-xmu force-pushed the vector_index_interface_update branch from f7f97d7 to 378bf77 Compare November 7, 2025 06:05
@yi-xmu yi-xmu force-pushed the predicate_def_for_search branch from cc18639 to c43b5b2 Compare November 7, 2025 06:06
*/
bool EvaluateNode(const PredicateNode &node,
const std::vector<char> &metadata,
const std::vector<size_t> &offsets,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[cpplint] reported by reviewdog 🐶
Add #include for vector<> [build/include_what_you_use] [4]

PredicateNode::Uptr root_node_{nullptr};
};

} // namespace EloqVec No newline at end of file
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[cpplint] reported by reviewdog 🐶
Could not find a newline character at the end of the file. [whitespace/ending_newline] [5]

size_t rhs_size,
PredicateOp op)
{
size_t cmp_size = rhs_size == 0 ? lhs_size : std::min(lhs_size, rhs_size);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[cpplint] reported by reviewdog 🐶
Add #include for min [build/include_what_you_use] [4]

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

♻️ Duplicate comments (3)
src/vector/predicate.cpp (1)

22-28: Add missing standard library includes.

The code uses standard library features without including their headers:

  • std::memcmp requires <cstring>
  • std::make_unique requires <memory>
  • std::move requires <utility>
  • std::vector requires <vector>
  • std::string requires <string>

These missing includes were already flagged by cpplint in past reviews.

include/vector/predicate.h (2)

24-28: Add missing vector include.

The header uses std::vector on lines 139, 166-167, 188-189 but doesn't include <vector>. This was already flagged by cpplint in past reviews.


195-195: Add newline at end of file.

Missing newline character at the end of the file, as flagged by cpplint in previous reviews.

🧹 Nitpick comments (4)
src/vector/predicate.cpp (3)

147-148: Validate single key-value pair before access.

The code assumes field_value contains exactly one key-value pair and directly accesses .begin() without validation. If a malformed filter provides multiple operators for a field (e.g., {"age": {"$gt": 18, "$lt": 65}}), only the first is processed silently.

Add validation:

+    if (field_value.size() != 1)
+    {
+        return nullptr;
+    }
+
     const std::string &op_str = field_value.begin().key();
     const nlohmann::json &value_json = field_value.begin().value();

198-217: Validate non-empty arrays for logical operators.

Empty $and arrays create nodes with no children. While evaluation will return false for such nodes (lines 323-326), it's better to reject invalid filters at parse time.

     if (obj.contains("$and"))
     {
-        if (!obj["$and"].is_array())
+        if (!obj["$and"].is_array() || obj["$and"].empty())
         {
             return nullptr;
         }

Apply the same check to $or at line 221.


319-336: Move node structure validation to parse time.

The check and_node.children_.size() < 2 validates tree structure at evaluation time, but structure is fixed after parsing. This check should be enforced when building the tree (in ParseJSONObject) and converted to an assert here for internal consistency.

Same issue applies to OR (lines 341-344) and NOT (lines 359-362) operators.

include/vector/predicate.h (1)

87-87: Consider making op_ protected for encapsulation.

The public op_ member exposes internal state. While this may be intentional for simplicity (similar to children_ in ParentPredNode), a getter would better encapsulate the implementation.

+protected:
     PredicateOp op_;
+
+public:
+    PredicateOp GetOp() const { return op_; }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cc18639 and c43b5b2.

📒 Files selected for processing (3)
  • CMakeLists.txt (1 hunks)
  • include/vector/predicate.h (1 hunks)
  • src/vector/predicate.cpp (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • CMakeLists.txt
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: liunyl
Repo: liunyl/eloqkv PR: 7
File: src/vector/vector_handler.cpp:457-466
Timestamp: 2025-09-16T05:13:52.366Z
Learning: In the EloqVec codebase, VectorIndex::search method should be extended to accept search_params parameter to support per-request search parameters, and the class should support default search/update parameters stored in VectorMetadata and applied during index initialization.
🧬 Code graph analysis (2)
include/vector/predicate.h (1)
src/vector/predicate.cpp (6)
  • Parse (282-301)
  • Parse (282-283)
  • Evaluate (303-309)
  • Evaluate (303-305)
  • EvaluateNode (311-395)
  • EvaluateNode (311-314)
src/vector/predicate.cpp (1)
include/vector/vector_util.h (1)
  • ParseJSONFieldValue (105-195)
🔇 Additional comments (5)
src/vector/predicate.cpp (3)

36-86: LGTM: Byte-wise comparison is correct and safe.

The implementation uses std::memcmp for byte-wise comparison, avoiding the unaligned read issues that were flagged in previous reviews. The size-aware logic correctly handles comparisons where buffer sizes differ.


89-120: LGTM: Operator parsing is correct.

The if-else chain is appropriate for this small set of operators and the code correctly returns UNKNOWN for unrecognized operators.


282-301: LGTM: Parse entry point with proper error handling.

The function correctly validates input, handles JSON parsing exceptions, and returns clear success/failure status.

include/vector/predicate.h (2)

36-62: LGTM: PredicateOp enum is well-defined.

The operator enumeration is clear and uses an appropriate underlying type.


90-140: LGTM: Node hierarchy is well-designed.

The predicate node hierarchy correctly uses inheritance, enables move semantics, disables copying, and provides appropriate virtual methods.

Comment on lines +173 to +176
const PredicateNode &Root() const
{
return *root_node_;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Root() dereferences potentially null pointer.

The Root() method dereferences root_node_ without checking if it's null. If Parse() hasn't been called or failed, this causes undefined behavior.

Consider returning an optional reference or asserting with a clear error message:

     const PredicateNode &Root() const
     {
+        assert(root_node_ != nullptr && "Root() called before successful Parse()");
         return *root_node_;
     }

Or provide a safer API:

-    const PredicateNode &Root() const
+    const PredicateNode *Root() const
     {
-        return *root_node_;
+        return root_node_.get();
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const PredicateNode &Root() const
{
return *root_node_;
}
const PredicateNode &Root() const
{
assert(root_node_ != nullptr && "Root() called before successful Parse()");
return *root_node_;
}
🤖 Prompt for AI Agents
In include/vector/predicate.h around lines 173-176, Root() currently
dereferences root_node_ without null-checking; change the API to avoid UB by
either (A) returning a nullable type such as
std::optional<std::reference_wrapper<const PredicateNode>> (preferred) and
return std::nullopt when root_node_ is null, updating all callers to handle the
optional, or (B) keep the reference signature but add a defensive check and fail
fast: assert(root_node_ && "Predicate::Root() called before successful Parse()")
or throw a clear std::runtime_error when root_node_ is null; pick one approach
and update callers and tests accordingly.

Comment on lines +303 to +364
bool PredicateExpression::Evaluate(const std::vector<char> &metadata,
const std::vector<size_t> &offsets,
const VectorRecordMetadata &schema) const
{
assert(root_node_ != nullptr);
return EvaluateNode(*root_node_, metadata, offsets, schema);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Replace assert with runtime check in public API.

The assert on line 307 is removed in release builds, leading to undefined behavior if Evaluate is called before Parse or after a failed parse. Public APIs must validate preconditions at runtime.

 bool PredicateExpression::Evaluate(const std::vector<char> &metadata,
                                    const std::vector<size_t> &offsets,
                                    const VectorRecordMetadata &schema) const
 {
-    assert(root_node_ != nullptr);
+    if (root_node_ == nullptr)
+    {
+        return false;
+    }
     return EvaluateNode(*root_node_, metadata, offsets, schema);
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
bool PredicateExpression::Evaluate(const std::vector<char> &metadata,
const std::vector<size_t> &offsets,
const VectorRecordMetadata &schema) const
{
assert(root_node_ != nullptr);
return EvaluateNode(*root_node_, metadata, offsets, schema);
}
bool PredicateExpression::Evaluate(const std::vector<char> &metadata,
const std::vector<size_t> &offsets,
const VectorRecordMetadata &schema) const
{
if (root_node_ == nullptr)
{
return false;
}
return EvaluateNode(*root_node_, metadata, offsets, schema);
}
🤖 Prompt for AI Agents
In src/vector/predicate.cpp around lines 303-309, replace the assert(root_node_
!= nullptr) with a runtime precondition check that throws a descriptive
exception when root_node_ is null (e.g., throw
std::runtime_error("PredicateExpression::Evaluate called before Parse or after
failed parse")); then proceed to call EvaluateNode. Ensure <stdexcept> is
included if not already. This enforces the public API contract at runtime
instead of relying on asserts which are disabled in release builds.

@yi-xmu yi-xmu force-pushed the predicate_def_for_search branch from c43b5b2 to b3a9aab Compare November 7, 2025 06:34
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (4)
include/vector/predicate.h (2)

173-176: Guard Root() against an empty expression.

If Parse() has not been called (or failed), root_node_ remains nullptr, so Root() dereferences a null pointer and triggers UB. Returning a pointer avoids the crash and forces callers to check for a parsed tree.

-    const PredicateNode &Root() const
-    {
-        return *root_node_;
-    }
+    const PredicateNode *Root() const
+    {
+        return root_node_.get();
+    }

24-29: Include the headers you use.

PredicateNode relies on uint8_t and LeafPredNode stores a std::vector<char>, but this header does not include <cstdint> or <vector>. Any translation unit that includes predicate.h without pulling those headers first will fail to compile. Please add the missing includes.

 #include <list>
 #include <memory>
 #include <string>
 #include <string_view>
+#include <cstdint>
+#include <vector>
src/vector/predicate.cpp (2)

303-309: Don’t crash in release when Evaluate is called too early.

Evaluate still dereferences root_node_ after only an assert, so in release builds (where asserts vanish) calling Evaluate before a successful Parse leads to a null dereference. Return false (or throw) when no predicate has been parsed.

-    assert(root_node_ != nullptr);
-    return EvaluateNode(*root_node_, metadata, offsets, schema);
+    if (root_node_ == nullptr)
+    {
+        return false;
+    }
+    return EvaluateNode(*root_node_, metadata, offsets, schema);

24-27: Add the standard headers needed by this TU.

CompareValues calls std::min and std::memcmp, but <algorithm> and <cstring> are not included. Depending on transitive includes makes the build brittle; please include them explicitly.

 #include "predicate.h"

 #include <cassert>
+#include <algorithm>
+#include <cstring>
 #include <nlohmann/json.hpp>
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c43b5b2 and b3a9aab.

📒 Files selected for processing (3)
  • CMakeLists.txt (1 hunks)
  • include/vector/predicate.h (1 hunks)
  • src/vector/predicate.cpp (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: liunyl
Repo: liunyl/eloqkv PR: 7
File: src/vector/vector_handler.cpp:457-466
Timestamp: 2025-09-16T05:13:52.366Z
Learning: In the EloqVec codebase, VectorIndex::search method should be extended to accept search_params parameter to support per-request search parameters, and the class should support default search/update parameters stored in VectorMetadata and applied during index initialization.
🧬 Code graph analysis (2)
src/vector/predicate.cpp (1)
include/vector/vector_util.h (1)
  • ParseJSONFieldValue (105-195)
include/vector/predicate.h (1)
src/vector/predicate.cpp (6)
  • Parse (282-301)
  • Parse (282-283)
  • Evaluate (303-309)
  • Evaluate (303-305)
  • EvaluateNode (311-399)
  • EvaluateNode (311-314)

@yi-xmu yi-xmu force-pushed the predicate_def_for_search branch from b3a9aab to 04364dd Compare November 7, 2025 07:43
field_name, field_value, field_type);
}

const std::string &op_str = field_value.begin().key();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[cpplint] reported by reviewdog 🐶
Add #include for string [build/include_what_you_use] [4]

if (obj.size() > 1)
{
ParentPredNode::Uptr and_node =
std::make_unique<ParentPredNode>(PredicateOp::AND);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[cpplint] reported by reviewdog 🐶
Add #include for make_unique<> [build/include_what_you_use] [4]

{
return nullptr;
}
and_node->children_.emplace_back(std::move(child_node));
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[cpplint] reported by reviewdog 🐶
Add #include for move [build/include_what_you_use] [4]


bool PredicateExpression::EvaluateNode(const PredicateNode &node,
const std::vector<char> &metadata,
const std::vector<size_t> &offsets,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[cpplint] reported by reviewdog 🐶
Add #include for vector<> [build/include_what_you_use] [4]

metadata_ptr, metadata_size, value_ptr, value_size, leaf_node.op_);
}

} // namespace EloqVec No newline at end of file
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[cpplint] reported by reviewdog 🐶
Could not find a newline character at the end of the file. [whitespace/ending_newline] [5]

@yi-xmu yi-xmu force-pushed the vector_index_interface_update branch from 378bf77 to 7148c7f Compare November 18, 2025 05:49
update predicate

update predicate evalue value

fix predicate evalue

update predi evalua

update predicate

fix predicate parse
@yi-xmu yi-xmu force-pushed the predicate_def_for_search branch from 04364dd to 3ffa604 Compare November 18, 2025 05:53
@yi-xmu yi-xmu changed the base branch from vector_index_interface_update to vector_v1 November 18, 2025 05:53
@yi-xmu yi-xmu merged commit ec19d26 into vector_v1 Nov 18, 2025
1 of 2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants