Skip to content

Add dynamic RX flow lifecycle#123

Draft
cliffburdick wants to merge 1 commit into
mainfrom
cburdick/dynamic-rx-flow-lifecycle
Draft

Add dynamic RX flow lifecycle#123
cliffburdick wants to merge 1 commit into
mainfrom
cburdick/dynamic-rx-flow-lifecycle

Conversation

@cliffburdick

Copy link
Copy Markdown
Collaborator

No description provided.

@cliffburdick cliffburdick force-pushed the cburdick/dynamic-rx-flow-lifecycle branch 4 times, most recently from a3820de to 495037f Compare June 8, 2026 19:21
Signed-off-by: Cliff Burdick <cburdick@nvidia.com>
@cliffburdick cliffburdick force-pushed the cburdick/dynamic-rx-flow-lifecycle branch from 495037f to 48c3070 Compare June 8, 2026 22:42

@dleshchev dleshchev left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the dynamic RX flow lifecycle. Solid, well-structured PR (impl + bindings + docs + runnable example) with nice cleanup (shared ensure_rx_flow_jump_rule, default-initialized flow structs, optional rx.flows). Inline notes below.

Blockers: (1) static flow id: 0 is now rejected, breaking 14 shipped configs at daqiri_init(); (2) rte_flow_configure is applied to every port unconditionally — recommend opt-in + a static+dynamic HW test; (3) template flows are torn down with the wrong API on shutdown.

Medium: don't forget HW flow handles on failed/unconfirmed destroys; recycle FlowIds.

Minor: flow-isolation docs disagree (drop vs. kernel); the docs poll_flow_op example busy-spins with no sleep (the example app sleeps 10 ms — mirror that); and the commit title should follow #<issue> - <title> per CONTRIBUTING.

Design is sound — nothing structural.

static_flow_ids_.clear();
for (const auto& intf : cfg_.ifs_) {
for (const auto& flow : intf.rx_.flows_) {
if (flow.id_ == 0) {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Breaking regression. Rejecting static flow id: 0 aborts daqiri_init() for 14 shipped configs that use it — incl. the canonical daqiri_bench_raw_tx_rx.yaml:50, every reorder config (daqiri_bench_raw_tx_rx_reorder_seq_1024.yaml:61, …), HDS, 4q, spark, gds_write, pcap_writer, and the sw_loopback_reorder smoke test. reserve_static_flow_ids() runs early in initialize(), so these fail before EAL/queue setup. Either keep 0 valid for static flows (reserve 0 only in the dynamic allocator; treat static 0 as no-mark) or migrate every static flows[].id and the matching reorder_configs[].flow_ids and the walkthrough docs in this same PR. As written, existing users break on upgrade.


rte_eth_macaddr_get(intf.port_id_, &conf_ports_eth_addr[intf.port_id_]);

configure_flow_api_for_port(intf.port_id_, rx.dynamic_flow_capacity_);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

configure_flow_api_for_port() runs for every port unconditionally (dynamic_flow_capacity_ defaults to 1024, no off switch), so every existing DPDK config now opts the port into the async template engine via rte_flow_configure() — even static-only configs that never add a dynamic flow. We run mlx5 in HWS (dv_flow_en=2), yet the jump rule, static flows (add_flow), and drop_all_traffic still use synchronous rte_flow_create. Mixing the configured template engine with non-template rte_flow_create on the same mlx5 port is version/PMD-sensitive (group-namespace + post-configure create constraints), and the example is dynamic-only so static+dynamic mixing is untested. Suggest gating behind an explicit opt-in (e.g. dynamic_flow_capacity: 0 ⇒ disabled, default off) so static-only configs are unaffected, and validating a mixed config on CX7.

while (!ready_flow_ops_.empty()) { ready_flow_ops_.pop(); }

for (auto& [flow_id, entry] : dynamic_flows_) {
if (entry.flow != nullptr) { rte_flow_destroy(entry.port, entry.flow, &error); }

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Destroys every dynamic flow with synchronous rte_flow_destroy, ignoring entry.backend. TEMPLATE-backed flows were created via rte_flow_async_create and must be torn down with rte_flow_async_destroy + rte_flow_push/rte_flow_pull; rte_flow_destroy on a template-table handle isn't supported on mlx5 and will likely fail, which then makes the rte_flow_template_table_destroy below fail (table still owns flows). Branch on entry.backend and drain async destroys before destroying the table. Shutdown-only; bites when ACTIVE template flows exist at exit.

result.status_ = Status::INTERNAL_ERROR;
}

dynamic_flows_.erase(entry.flow_id);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Entry is erased even when rte_flow_destroy failed above. The caller gets a failed DELETE completion, but the rule may still be installed in HW and is now untracked — it can't be retried or cleaned up. Keep the entry (ACTIVE) on failure. Same pattern in poll_dpdk_flow_completions_locked (a failed RTE_FLOW_OP_* destroy still erases) and in the template add paths (push failure abandons an already-enqueued rte_flow_async_create handle). Suggest a consistent "only forget on confirmed success" rule.

return next_flow_op_id_++;
}

FlowId DpdkMgr::allocate_dynamic_flow_id() {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

allocate_dynamic_flow_id() only increments and never reclaims ids freed by delete. The id is written as the HW MARK (mark->id = flow_id, read back via mbuf->hash.fdir.hi), and mlx5 caps the usable FDIR mark well below uint32_t (~24 bits, minus reserved). For an add/delete-churn feature, a long-running control plane climbs past the mark ceiling (and table nb_flows) long before the counter wraps. Recommend recycling ids on delete completion and validating against the HW mark limit at allocation.

if (!initialized_) { return Status::NOT_READY; }
if (static_flow_ids_.find(flow_id) != static_flow_ids_.end()) { return Status::INVALID_PARAMETER; }
const auto flow_it = dynamic_flows_.find(flow_id);
if (flow_it == dynamic_flows_.end() || flow_it->second.state != DynamicFlowState::ACTIVE) {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete rejects anything not ACTIVE, but a TEMPLATE flow stays ADDING until its add completion is drained by poll_flow_op. So add-then-immediate-delete (before polling the add) returns INVALID_PARAMETER. The example happens to wait for the add completion first — please document this ordering requirement so callers don't hit a surprising rejection.


static void build_ipv4_udp_template_pattern(const Ipv4UdpFlowTemplateKey& key,
struct rte_flow_item pattern[MAX_PATTERN_NUM]) {
static struct rte_flow_item_ipv4 ip_mask;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static mask structs are a needless reentrancy/thread-safety smell — they only work because rte_flow_pattern_template_create deep-copies immediately. Plain locals are clearer and safe.

@@ -201,11 +205,23 @@ RDMA transport settings:
### Flow Isolation

`rx.flow_isolation:` — When `true`, only packets matching an explicit flow rule are delivered.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This says unmatched packets are dropped under flow_isolation: true, but docs/tutorials/configuration-walkthrough.md item 19 says they're delegated back to the Linux kernel (the rte_flow_isolate semantics). This PR edits this passage and adds guidance that leans on the drop behavior, so the two docs should be reconciled to whatever the port actually does on this HW. (No code change in this PR alters the isolation path — rte_flow_isolate(port,1) is unchanged — so this is a doc-consistency fix, not a behavior change.)

Comment thread include/daqiri/types.h

enum class FlowMatchType {
NORMAL,
IPV4_UDP,

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FlowMatchType::NORMAL → IPV4_UDP (and pybind value "NORMAL"→"IPV4_UDP") is a breaking API rename for any external caller referencing the old name. No in-repo refs remain, but please call it out in the PR description / release notes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants