Skip to content

feat: partition-level reopen for buffered commands on standby#484

Open
thweetkomputer wants to merge 1 commit into
mainfrom
feat-committs-gap-trigger-reopen-zc
Open

feat: partition-level reopen for buffered commands on standby#484
thweetkomputer wants to merge 1 commit into
mainfrom
feat-committs-gap-trigger-reopen-zc

Conversation

@thweetkomputer

@thweetkomputer thweetkomputer commented Jun 1, 2026

Copy link
Copy Markdown
Collaborator
  • Add delayed reopen queue in eloqstore with priority heap
  • Add ReopenPartition virtual chain through DataStore/DataStoreService
  • Add CcShard::RequestPartitionReopen with per-partition dedup
  • Trigger reopen from FetchRecordCc, UploadTxCommandsCc, KeyObjectStandbyForwardCc, and ReplayLogCc when buffered commands remain after BackFill
  • Fix BackFill condition (&& -> ||) to allow CCE update when KV has a newer version, needed when reopen loads fresh cloud data

Signed-off-by: Jerry Zhao

Here are some reminders before you submit the pull request

  • Add tests for the change
  • Document changes
  • Reference the link of issue using fixes eloqdb/tx_service#issue_id
  • Reference the link of RFC if exists
  • Pass ./mtr --suite=mono_main,mono_multi,mono_basic

Summary by CodeRabbit

  • New Features

    • Read operations can request partition "reopen" before fetching, improving correctness during concurrent/standby workflows.
    • After committing buffered commands, the system may trigger an additional standby record fetch to ensure up-to-date data.
  • Chores

    • New configurable auto-reopen pending timeout (default 10s) to tune reopen timing.

@coderabbitai

coderabbitai Bot commented Jun 1, 2026

Copy link
Copy Markdown

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

The PR threads a new boolean reopen flag from clients through RPCs into the data store and the underlying KV layer, adds reopen-aware local request handling and config, and propagates reopen through TX service FetchRecord request creation and buffered-command follow-ups.

Changes

Partition Reopen API and Orchestration

Layer / File(s) Summary
Data store reopen API contracts
store_handler/eloq_data_store_service/data_store_service.h, store_handler/eloq_data_store_service/eloq_store_data_store.h, store_handler/data_store_service_client.h
DataStoreService::Read and DataStoreServiceClient::Read gain reopen parameter. EloqStoreDataStore declares ReopenPartition and OnReopenPartition.
RPC proto and internal request reopen support
store_handler/eloq_data_store_service/ds_request.proto, store_handler/eloq_data_store_service/internal_request.h
ReadRequest proto adds reopen field. ReadRequest interface adds GetReopen; ReadRpcRequest and ReadLocalRequest implement and manage reopen state.
ReadClosure reopen flag support
store_handler/data_store_service_client_closure.h
ReadClosure::Reset accepts reopen, stores it, Clear() resets it, PrepareRequest sets RPC reopen, and Reopen() exposes the value.
Data store service and client forwarding
store_handler/eloq_data_store_service/data_store_service.cpp, store_handler/data_store_service_client.cpp, store_handler/eloq_data_store_service/eloq_store_data_store.cpp
DataStoreService::Read forwards reopen to ReadLocalRequest. DataStoreServiceClient::Read threads reopen into ReadClosure and ReadInternal. FetchRecord forwards closure reopen flag. EloqStoreDataStore sets reopen on underlying KV read.
TX service FetchRecordCc and CcShard reopen support
tx_service/include/cc/cc_req_misc.h, tx_service/include/cc/cc_shard.h, tx_service/src/cc/cc_req_misc.cpp, tx_service/src/cc/cc_shard.cpp
FetchRecordCc constructor and member add reopen. CcShard::FetchRecord signature accepts reopen and passes it into request coalescing/FetchRecordCc creation.
Buffered command reopen invocation
tx_service/src/cc/cc_req_misc.cpp, tx_service/include/cc/object_cc_map.h
FetchRecordCc::Execute conditionally issues FetchRecord with reopen=true when buffered commands exist and prior read succeeded. ObjectCcMap Execute paths call FetchRecord from standby when buffered_cmd_list remains. BackFill condition broadened to use OR.
EloqStore config and submodule
store_handler/eloq_data_store_service/eloq_store_config.cpp, store_handler/eloq_data_store_service/eloqstore
Adds gflag eloq_store_auto_reopen_pending_time_sec and wires microsecond conversion; advances eloqstore submodule commit reference.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • liunyl

Poem

🐰 I hopped a flag from client to store,
Through proto and closure it hopped some more.
TX caught the cue, on buffered demand,
Reopen the partition — fresh data at hand.
A tiny hop, a distributed chore!

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 6.06% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check ❓ Inconclusive The description covers the technical changes but lacks issue/RFC references and does not confirm test execution or documentation completion. Reference the issue number using 'fixes eloqdb/tx_service#issue_id' and verify completion of tests, documentation, and test suite execution before merging.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: adding partition-level reopen functionality for buffered commands on standby nodes.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat-committs-gap-trigger-reopen-zc

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tx_service/include/cc/object_cc_map.h (1)

2650-2775: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Preserve normal_obj_sz_ when BackFill overwrites an already-loaded entry.

After changing this guard to Unknown || older-version, the block now runs for CCEs that were already counted in normal_obj_sz_. The tail still unconditionally increments on Normal and never decrements when a previously normal object is replaced by Deleted, so reopen/backfill will drift the object count over time.

💡 Track the previous payload state before overwriting it
-        if (cce->PayloadStatus() == RecordStatus::Unknown ||
+        RecordStatus prev_status = cce->PayloadStatus();
+        if (prev_status == RecordStatus::Unknown ||
             cce->CommitTs() < commit_ts)
         {
             cce->SetCommitTsPayloadStatus(commit_ts, status);
             cce->SetCkptTs(commit_ts);
             DLOG(INFO) << "BackFill key: " << cce->KeyString()
@@
-            if (cce->PayloadStatus() == RecordStatus::Normal)
+            if (cce->PayloadStatus() == RecordStatus::Normal)
             {
-                TemplateCcMap<KeyT, ValueT, false, false>::normal_obj_sz_++;
+                if (prev_status != RecordStatus::Normal)
+                {
+                    TemplateCcMap<KeyT, ValueT, false, false>::normal_obj_sz_++;
+                }
                 if (cce->payload_.cur_payload_ &&
                     cce->payload_.cur_payload_->HasTTL() &&
                     ccp->smallest_ttl_ > cce->payload_.cur_payload_->GetTTL())
                 {
                     ccp->smallest_ttl_ = cce->payload_.cur_payload_->GetTTL();
                 }
             }
             else
             {
+                if (prev_status == RecordStatus::Normal)
+                {
+                    TemplateCcMap<KeyT, ValueT, false, false>::normal_obj_sz_--;
+                }
                 assert(cce->PayloadStatus() == RecordStatus::Deleted);
                 ccp->smallest_ttl_ = 0;
             }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tx_service/include/cc/object_cc_map.h` around lines 2650 - 2775, The BackFill
block (where cce->SetCommitTsPayloadStatus is called) can double-count objects
because TemplateCcMap::normal_obj_sz_ is incremented whenever
cce->PayloadStatus() == RecordStatus::Normal after the overwrite, but it never
decrements if the prior payload was Normal and becomes Deleted; before mutating
the CCE (in the BackFill path in object_cc_map code handling cce and payload_),
capture the previous payload status (e.g., prev_status = cce->PayloadStatus() or
inspect cce->payload_.cur_payload_), then after applying
DeserializeCurrentPayload/SetCommitTsPayloadStatus compute the transition
(prev_status -> new_status) and adjust
TemplateCcMap<KeyT,ValueT,false,false>::normal_obj_sz_ accordingly (increment
only on Deleted->Normal, decrement on Normal->Deleted, no-op otherwise),
ensuring to use the same cce, commit_ts, and commit_status branches so counts
remain consistent.
🧹 Nitpick comments (2)
store_handler/eloq_data_store_service/data_store.h (1)

118-129: ⚡ Quick win

Document callback-vs-return contract for ReopenPartition no-op path

  • store_handler/eloq_data_store_service/data_store.h default calls callback() and returns false.
  • The current caller (tx_service/src/cc/cc_shard.cpp::CcShard::RequestPartitionReopen) ignores the return value and relies on the callback; EloqStoreDataStore::OnReopenPartition moves and invokes the callback once.
  • The differing behavior from tx_service/include/store/data_store_handler.h default (returns Error without invoking the callback) should be captured in the interface contract so callers implement only one completion mechanism.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@store_handler/eloq_data_store_service/data_store.h` around lines 118 - 129,
The default in DataStore::ReopenPartition currently calls callback() and returns
false, which conflicts with the other interface (store/data_store_handler.h) and
caller patterns (CcShard::RequestPartitionReopen expects the callback;
EloqStoreDataStore::OnReopenPartition moves+invokes the callback once). Update
the interface contract: document in the ReopenPartition comment that
implementations must choose one completion mechanism—either (A) handle
synchronously and return false without invoking the callback, or (B) handle
asynchronously, return true to indicate they will invoke the provided callback
exactly once—and adjust the default DataStore::ReopenPartition implementation to
NOT call callback and return false to match the error-return default in
data_store_handler.h; ensure callers (CcShard::RequestPartitionReopen,
EloqStoreDataStore::OnReopenPartition) follow the documented contract.
store_handler/eloq_data_store_service/eloq_store_data_store.cpp (1)

1082-1087: 💤 Low value

Unused parameter is_hash_partitioned.

This parameter is declared but never used. Consider adding [[maybe_unused]] or casting to void to suppress warnings and document intent.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@store_handler/eloq_data_store_service/eloq_store_data_store.cpp` around lines
1082 - 1087, The parameter is_hash_partitioned in the
EloqStoreDataStore::ReopenPartition signature is unused and triggers warnings;
to fix, mark it explicitly unused by either adding the [[maybe_unused]]
attribute to the parameter in the function declaration/definition or add a
single-line cast (void)is_hash_partitioned; at the start of the
EloqStoreDataStore::ReopenPartition body to document intent and suppress
warnings while leaving behavior unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@store_handler/eloq_data_store_service/eloq_store_data_store.cpp`:
- Around line 1122-1133: The local callback is moved into op via
op->Reset(std::move(callback)) so the function-scoped callback is empty when
ExecAsyn fails; ensure you preserve and invoke the original callback on async
submission failure by saving a copy before the move (e.g., auto saved_cb =
callback) or by changing the call order so you call op->Reset(...) with a copy,
then on ExecAsyn failure call saved_cb() (or invoke the saved std::function)
before op->Clear(); touch the ExecAsyn call site, op->Reset, and the failure
branch that currently checks callback to guarantee the callback is invoked.

In `@tx_service/include/cc/cc_shard.h`:
- Around line 1295-1305: The map pending_partition_reopens_ uses std::string
keys which loses full TableName identity and can cause cross-table aliasing;
change its key type to the project's TableName type (the same key used by
CcShard for other table-scoped state) so reopen state is isolated per TableName,
update the declaration of pending_partition_reopens_ and any places that
insert/lookup to use TableName instead of std::string, and ensure
PartitionReopenState and its members (in_flight, cces) remain unchanged while
compilation adjustments are made where the map is referenced.

In `@tx_service/src/cc/cc_shard.cpp`:
- Around line 2090-2114: The code erases the pending_partition_reopens_ entry
before calling FetchRecord, so if FetchRecord returns Retry the reopen is lost
and some CCE pins remain; fix by deferring removal of the
pending_partition_reopens_ entry until after processing cces (do not call
it->second.erase(part_it) / ccs.pending_partition_reopens_.erase(it) up-front),
iterate the cces and call FetchRecord for each, always call
cce_ptr->GetKeyGapLockAndExtraData()->ReleasePin() to balance the earlier pin
regardless of FetchRecord result, and if any FetchRecord returns Retry
re-enqueue or keep the pending_partition_reopens_ entry (only remove it when all
FetchRecord calls complete without Retry) so buffered commands are retried
correctly.

---

Outside diff comments:
In `@tx_service/include/cc/object_cc_map.h`:
- Around line 2650-2775: The BackFill block (where cce->SetCommitTsPayloadStatus
is called) can double-count objects because TemplateCcMap::normal_obj_sz_ is
incremented whenever cce->PayloadStatus() == RecordStatus::Normal after the
overwrite, but it never decrements if the prior payload was Normal and becomes
Deleted; before mutating the CCE (in the BackFill path in object_cc_map code
handling cce and payload_), capture the previous payload status (e.g.,
prev_status = cce->PayloadStatus() or inspect cce->payload_.cur_payload_), then
after applying DeserializeCurrentPayload/SetCommitTsPayloadStatus compute the
transition (prev_status -> new_status) and adjust
TemplateCcMap<KeyT,ValueT,false,false>::normal_obj_sz_ accordingly (increment
only on Deleted->Normal, decrement on Normal->Deleted, no-op otherwise),
ensuring to use the same cce, commit_ts, and commit_status branches so counts
remain consistent.

---

Nitpick comments:
In `@store_handler/eloq_data_store_service/data_store.h`:
- Around line 118-129: The default in DataStore::ReopenPartition currently calls
callback() and returns false, which conflicts with the other interface
(store/data_store_handler.h) and caller patterns
(CcShard::RequestPartitionReopen expects the callback;
EloqStoreDataStore::OnReopenPartition moves+invokes the callback once). Update
the interface contract: document in the ReopenPartition comment that
implementations must choose one completion mechanism—either (A) handle
synchronously and return false without invoking the callback, or (B) handle
asynchronously, return true to indicate they will invoke the provided callback
exactly once—and adjust the default DataStore::ReopenPartition implementation to
NOT call callback and return false to match the error-return default in
data_store_handler.h; ensure callers (CcShard::RequestPartitionReopen,
EloqStoreDataStore::OnReopenPartition) follow the documented contract.

In `@store_handler/eloq_data_store_service/eloq_store_data_store.cpp`:
- Around line 1082-1087: The parameter is_hash_partitioned in the
EloqStoreDataStore::ReopenPartition signature is unused and triggers warnings;
to fix, mark it explicitly unused by either adding the [[maybe_unused]]
attribute to the parameter in the function declaration/definition or add a
single-line cast (void)is_hash_partitioned; at the start of the
EloqStoreDataStore::ReopenPartition body to document intent and suppress
warnings while leaving behavior unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f95c744a-e1b9-4241-b6a7-18c312a47b63

📥 Commits

Reviewing files that changed from the base of the PR and between f25308c and 017e6e0.

📒 Files selected for processing (13)
  • store_handler/data_store_service_client.cpp
  • store_handler/data_store_service_client.h
  • store_handler/eloq_data_store_service/data_store.h
  • store_handler/eloq_data_store_service/data_store_service.cpp
  • store_handler/eloq_data_store_service/data_store_service.h
  • store_handler/eloq_data_store_service/eloq_store_data_store.cpp
  • store_handler/eloq_data_store_service/eloq_store_data_store.h
  • store_handler/eloq_data_store_service/eloqstore
  • tx_service/include/cc/cc_shard.h
  • tx_service/include/cc/object_cc_map.h
  • tx_service/include/store/data_store_handler.h
  • tx_service/src/cc/cc_req_misc.cpp
  • tx_service/src/cc/cc_shard.cpp

Comment thread store_handler/data_store_service_client.cpp Outdated
Comment thread store_handler/eloq_data_store_service/eloq_store_data_store.cpp Outdated
Comment thread tx_service/include/cc/cc_shard.h Outdated
Comment thread tx_service/src/cc/cc_shard.cpp Outdated
@thweetkomputer thweetkomputer force-pushed the feat-committs-gap-trigger-reopen-zc branch 2 times, most recently from e16f770 to 4059f79 Compare June 2, 2026 02:04
@thweetkomputer thweetkomputer requested a review from liangjchen June 2, 2026 02:11
@thweetkomputer thweetkomputer force-pushed the feat-committs-gap-trigger-reopen-zc branch from 6d8fc96 to 19fa12b Compare June 3, 2026 04:32

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (2)
store_handler/data_store_service_client.cpp (1)

689-699: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Route reopen through the owning DSS shard.

shard_id is computed on Lines 686-687, but this path still always calls the local data_store_service_. Unlike Read, DeleteRange, ScanNext, and BatchWriteRecords in this file, there is no IsLocalShard(shard_id) / RPC split here, so a reopen for a remote-owned partition will be sent to the wrong node and never reach the shard that can actually reopen it.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@store_handler/data_store_service_client.cpp` around lines 689 - 699, The
ReopenPartition call is always sent to the local data_store_service_ even when
shard_id is owned remotely; add the same local/remote split used by
Read/BatchWriteRecords/DeleteRange/ScanNext: check IsLocalShard(shard_id) and if
true call data_store_service_->ReopenPartition(..., std::move(callback));
otherwise route the request to the shard owner via the RPC path (e.g. invoke the
existing RPC client used by other methods or a ReopenPartition RPC that forwards
the same arguments and bridges the callback), ensuring the shard ownership
resolution (shard_id → owner node) and callback forwarding match the patterns in
Read/BatchWriteRecords to deliver results back to the caller.
store_handler/eloq_data_store_service/eloq_store_data_store.cpp (1)

66-70: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Don't read op after Clear().

ReopenOperationData::Clear() on Lines 66-70 nulls callback_ and frees the pooled object. The failure branch then reads op->callback_ after op->Clear(), so the callback is lost and Line 1130 becomes a use-after-free. That can leave the reopen completion path stuck when ExecAsyn() submission fails.

🐛 Minimal fix
     if (!eloq_store_service_->ExecAsyn(
             &reopen_req, user_data, OnReopenPartition))
     {
         LOG(ERROR) << "ReopenPartition: failed to submit reopen for table "
                    << table_name << " partition " << partition_id;
-        op->Clear();
-        // Note: callback was moved into op by Reset() above.
-        // Access it through op->callback_ to avoid using the
-        // moved-from local.
-        if (op->callback_)
+        auto cb = std::move(op->callback_);
+        op->Clear();
+        if (cb)
         {
-            op->callback_();
-            op->callback_ = nullptr;
+            cb();
         }
         return false;
     }
#!/bin/bash
sed -n '66,70p' store_handler/eloq_data_store_service/eloq_store_data_store.cpp
printf '\n---\n'
sed -n '1123,1134p' store_handler/eloq_data_store_service/eloq_store_data_store.cpp

Also applies to: 1123-1134

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@store_handler/eloq_data_store_service/eloq_store_data_store.cpp` around lines
66 - 70, ReopenOperationData::Clear() currently nulls callback_ and calls
Free(), so callers that inspect op->callback_ after op->Clear() get a
use-after-free; fix the caller that handles ExecAsyn() failure by capturing the
callback before clearing the op (e.g., store op->callback_ into a local
variable), then call op->Clear()/Free(), and finally invoke the saved callback;
alternatively change ReopenOperationData::Clear() to not null callback_ (but
prefer copying the callback in the caller to avoid changing object lifecycle
semantics).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@store_handler/data_store_service_client.cpp`:
- Around line 689-699: The ReopenPartition call is always sent to the local
data_store_service_ even when shard_id is owned remotely; add the same
local/remote split used by Read/BatchWriteRecords/DeleteRange/ScanNext: check
IsLocalShard(shard_id) and if true call
data_store_service_->ReopenPartition(..., std::move(callback)); otherwise route
the request to the shard owner via the RPC path (e.g. invoke the existing RPC
client used by other methods or a ReopenPartition RPC that forwards the same
arguments and bridges the callback), ensuring the shard ownership resolution
(shard_id → owner node) and callback forwarding match the patterns in
Read/BatchWriteRecords to deliver results back to the caller.

In `@store_handler/eloq_data_store_service/eloq_store_data_store.cpp`:
- Around line 66-70: ReopenOperationData::Clear() currently nulls callback_ and
calls Free(), so callers that inspect op->callback_ after op->Clear() get a
use-after-free; fix the caller that handles ExecAsyn() failure by capturing the
callback before clearing the op (e.g., store op->callback_ into a local
variable), then call op->Clear()/Free(), and finally invoke the saved callback;
alternatively change ReopenOperationData::Clear() to not null callback_ (but
prefer copying the callback in the caller to avoid changing object lifecycle
semantics).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 629044a6-4569-4e65-8bcc-deda24b8533e

📥 Commits

Reviewing files that changed from the base of the PR and between 017e6e0 and 19fa12b.

📒 Files selected for processing (13)
  • store_handler/data_store_service_client.cpp
  • store_handler/data_store_service_client.h
  • store_handler/eloq_data_store_service/data_store.h
  • store_handler/eloq_data_store_service/data_store_service.cpp
  • store_handler/eloq_data_store_service/data_store_service.h
  • store_handler/eloq_data_store_service/eloq_store_data_store.cpp
  • store_handler/eloq_data_store_service/eloq_store_data_store.h
  • store_handler/eloq_data_store_service/eloqstore
  • tx_service/include/cc/cc_shard.h
  • tx_service/include/cc/object_cc_map.h
  • tx_service/include/store/data_store_handler.h
  • tx_service/src/cc/cc_req_misc.cpp
  • tx_service/src/cc/cc_shard.cpp
🚧 Files skipped from review as they are similar to previous changes (11)
  • store_handler/eloq_data_store_service/eloqstore
  • store_handler/eloq_data_store_service/data_store.h
  • tx_service/include/store/data_store_handler.h
  • store_handler/eloq_data_store_service/eloq_store_data_store.h
  • store_handler/eloq_data_store_service/data_store_service.h
  • tx_service/src/cc/cc_req_misc.cpp
  • store_handler/eloq_data_store_service/data_store_service.cpp
  • tx_service/include/cc/cc_shard.h
  • store_handler/data_store_service_client.h
  • tx_service/src/cc/cc_shard.cpp
  • tx_service/include/cc/object_cc_map.h

@thweetkomputer thweetkomputer force-pushed the feat-committs-gap-trigger-reopen-zc branch from 19fa12b to 6d248a8 Compare June 10, 2026 06:58

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@store_handler/eloq_data_store_service/eloqstore`:
- Line 1: The CI/dev environment is missing the eloqstore submodule commit
cfbace01d6e45655b48dc540fe02b1ce79a38137 so we cannot validate that
::eloqstore::ReopenRequest, the delayed reopen priority-heap queue, and
ExecAsyn/KvRequest compatibility exist; fetch and initialize the submodule at
that exact SHA (git submodule update --init --recursive and fetch the missing
commit or add the commit to the submodule remote), then re-run the build/tests
and re-inspect the eloqstore API to confirm ReopenRequest and the delayed reopen
queue are present and that EloqStoreDataStore::ReopenPartition still compiles
against KvRequest/ExecAsyn without API breaks.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b3f8f556-2762-4270-8f68-3c4c07dd2e2a

📥 Commits

Reviewing files that changed from the base of the PR and between 19fa12b and 6d248a8.

📒 Files selected for processing (13)
  • store_handler/data_store_service_client.cpp
  • store_handler/data_store_service_client.h
  • store_handler/eloq_data_store_service/data_store.h
  • store_handler/eloq_data_store_service/data_store_service.cpp
  • store_handler/eloq_data_store_service/data_store_service.h
  • store_handler/eloq_data_store_service/eloq_store_data_store.cpp
  • store_handler/eloq_data_store_service/eloq_store_data_store.h
  • store_handler/eloq_data_store_service/eloqstore
  • tx_service/include/cc/cc_shard.h
  • tx_service/include/cc/object_cc_map.h
  • tx_service/include/store/data_store_handler.h
  • tx_service/src/cc/cc_req_misc.cpp
  • tx_service/src/cc/cc_shard.cpp
🚧 Files skipped from review as they are similar to previous changes (12)
  • store_handler/eloq_data_store_service/data_store_service.h
  • tx_service/include/cc/cc_shard.h
  • store_handler/eloq_data_store_service/eloq_store_data_store.h
  • tx_service/src/cc/cc_req_misc.cpp
  • store_handler/data_store_service_client.cpp
  • store_handler/eloq_data_store_service/data_store.h
  • store_handler/eloq_data_store_service/data_store_service.cpp
  • store_handler/eloq_data_store_service/eloq_store_data_store.cpp
  • tx_service/src/cc/cc_shard.cpp
  • tx_service/include/cc/object_cc_map.h
  • tx_service/include/store/data_store_handler.h
  • store_handler/data_store_service_client.h

Comment thread store_handler/eloq_data_store_service/eloqstore Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tx_service/src/cc/cc_req_misc.cpp`:
- Around line 893-908: The reopen fetch (ccs.FetchRecord(..., reopen=true)) is
invoked while the current in-flight entry cce_ is still present in
fetch_record_reqs_, causing the call to be coalesced into the existing request;
move the reopen FetchRecord call so it occurs after you remove/erase cce_ from
fetch_record_reqs_ (the code that removes the LruEntry pointer), ensuring
FetchRecord sees no existing entry and will dispatch a new reopen fetch. Keep
the same parameters (table_name_, table_schema_, tx_key_.Clone(), cce_,
cc_ng_id_, cc_ng_term_, requester, partition_id_, fetch flags) but invoke
FetchRecord only after the removal of cce_ from fetch_record_reqs_.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3ed91a71-b5de-41d6-b645-b8abe9b1f5e0

📥 Commits

Reviewing files that changed from the base of the PR and between 6d248a8 and 1615629.

📒 Files selected for processing (15)
  • store_handler/data_store_service_client.cpp
  • store_handler/data_store_service_client.h
  • store_handler/data_store_service_client_closure.h
  • store_handler/eloq_data_store_service/data_store_service.cpp
  • store_handler/eloq_data_store_service/data_store_service.h
  • store_handler/eloq_data_store_service/ds_request.proto
  • store_handler/eloq_data_store_service/eloq_store_data_store.cpp
  • store_handler/eloq_data_store_service/eloq_store_data_store.h
  • store_handler/eloq_data_store_service/eloqstore
  • store_handler/eloq_data_store_service/internal_request.h
  • tx_service/include/cc/cc_req_misc.h
  • tx_service/include/cc/cc_shard.h
  • tx_service/include/cc/object_cc_map.h
  • tx_service/src/cc/cc_req_misc.cpp
  • tx_service/src/cc/cc_shard.cpp
💤 Files with no reviewable changes (1)
  • store_handler/eloq_data_store_service/eloq_store_data_store.h
✅ Files skipped from review due to trivial changes (1)
  • store_handler/eloq_data_store_service/eloqstore
🚧 Files skipped from review as they are similar to previous changes (1)
  • tx_service/include/cc/object_cc_map.h

Comment thread tx_service/src/cc/cc_req_misc.cpp Outdated
@thweetkomputer thweetkomputer force-pushed the feat-committs-gap-trigger-reopen-zc branch from 986bc1e to 54509aa Compare June 10, 2026 16:52

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@store_handler/eloq_data_store_service/eloq_store_config.cpp`:
- Around line 131-133: The flag eloq_store_auto_reopen_pending_time_sec is
declared with DEFINE_int32 allowing negative values that later get cast to
uint64_t and wrap; change its declaration to use DEFINE_uint32 so the flag is an
unsigned 32-bit value and negative inputs are rejected—update the
DEFINE_int32(...) for eloq_store_auto_reopen_pending_time_sec to
DEFINE_uint32(...) in eloq_store_config.cpp to match the other duration/count
flags and prevent incorrect microsecond computations where this value is cast
and multiplied.

In `@tx_service/src/cc/cc_req_misc.cpp`:
- Around line 911-924: The code computes should_reopen by calling
cce_->HasBufferedCommandList() before verifying the CC entry is valid; move the
evaluation and any use of cce_ (including should_reopen and initialization of
reopen_cce/reopen_key/reopen_table_name/reopen_table_schema/reopen_cc_ng_id/reopen_cc_ng_term/reopen_partition_id)
inside the validity guard that checks lock_->GetCcEntry() != nullptr (the block
that ensures the CC entry is valid) so cce_ is never dereferenced when the entry
may be invalidated; keep the reopen_* variables but only populate them (and call
tx_key_.Clone()) within that guarded block and default/unset them outside it.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7c3fb417-af00-42b6-9245-3c2c7c36bd3e

📥 Commits

Reviewing files that changed from the base of the PR and between 1615629 and 986bc1e.

📒 Files selected for processing (4)
  • store_handler/eloq_data_store_service/eloq_store_config.cpp
  • store_handler/eloq_data_store_service/eloqstore
  • tx_service/include/cc/object_cc_map.h
  • tx_service/src/cc/cc_req_misc.cpp
✅ Files skipped from review due to trivial changes (1)
  • store_handler/eloq_data_store_service/eloqstore

Comment thread store_handler/eloq_data_store_service/eloq_store_config.cpp Outdated
Comment thread tx_service/src/cc/cc_req_misc.cpp
@thweetkomputer thweetkomputer force-pushed the feat-committs-gap-trigger-reopen-zc branch from 2ee34ac to 332fb58 Compare June 10, 2026 17:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant