Add dynamic RX flow lifecycle#123
Conversation
a3820de to
495037f
Compare
Signed-off-by: Cliff Burdick <cburdick@nvidia.com>
495037f to
48c3070
Compare
dleshchev
left a comment
There was a problem hiding this comment.
Reviewed the dynamic RX flow lifecycle. Solid, well-structured PR (impl + bindings + docs + runnable example) with nice cleanup (shared ensure_rx_flow_jump_rule, default-initialized flow structs, optional rx.flows). Inline notes below.
Blockers: (1) static flow id: 0 is now rejected, breaking 14 shipped configs at daqiri_init(); (2) rte_flow_configure is applied to every port unconditionally — recommend opt-in + a static+dynamic HW test; (3) template flows are torn down with the wrong API on shutdown.
Medium: don't forget HW flow handles on failed/unconfirmed destroys; recycle FlowIds.
Minor: flow-isolation docs disagree (drop vs. kernel); the docs poll_flow_op example busy-spins with no sleep (the example app sleeps 10 ms — mirror that); and the commit title should follow #<issue> - <title> per CONTRIBUTING.
Design is sound — nothing structural.
| static_flow_ids_.clear(); | ||
| for (const auto& intf : cfg_.ifs_) { | ||
| for (const auto& flow : intf.rx_.flows_) { | ||
| if (flow.id_ == 0) { |
There was a problem hiding this comment.
Breaking regression. Rejecting static flow id: 0 aborts daqiri_init() for 14 shipped configs that use it — incl. the canonical daqiri_bench_raw_tx_rx.yaml:50, every reorder config (daqiri_bench_raw_tx_rx_reorder_seq_1024.yaml:61, …), HDS, 4q, spark, gds_write, pcap_writer, and the sw_loopback_reorder smoke test. reserve_static_flow_ids() runs early in initialize(), so these fail before EAL/queue setup. Either keep 0 valid for static flows (reserve 0 only in the dynamic allocator; treat static 0 as no-mark) or migrate every static flows[].id and the matching reorder_configs[].flow_ids and the walkthrough docs in this same PR. As written, existing users break on upgrade.
|
|
||
| rte_eth_macaddr_get(intf.port_id_, &conf_ports_eth_addr[intf.port_id_]); | ||
|
|
||
| configure_flow_api_for_port(intf.port_id_, rx.dynamic_flow_capacity_); |
There was a problem hiding this comment.
configure_flow_api_for_port() runs for every port unconditionally (dynamic_flow_capacity_ defaults to 1024, no off switch), so every existing DPDK config now opts the port into the async template engine via rte_flow_configure() — even static-only configs that never add a dynamic flow. We run mlx5 in HWS (dv_flow_en=2), yet the jump rule, static flows (add_flow), and drop_all_traffic still use synchronous rte_flow_create. Mixing the configured template engine with non-template rte_flow_create on the same mlx5 port is version/PMD-sensitive (group-namespace + post-configure create constraints), and the example is dynamic-only so static+dynamic mixing is untested. Suggest gating behind an explicit opt-in (e.g. dynamic_flow_capacity: 0 ⇒ disabled, default off) so static-only configs are unaffected, and validating a mixed config on CX7.
| while (!ready_flow_ops_.empty()) { ready_flow_ops_.pop(); } | ||
|
|
||
| for (auto& [flow_id, entry] : dynamic_flows_) { | ||
| if (entry.flow != nullptr) { rte_flow_destroy(entry.port, entry.flow, &error); } |
There was a problem hiding this comment.
Destroys every dynamic flow with synchronous rte_flow_destroy, ignoring entry.backend. TEMPLATE-backed flows were created via rte_flow_async_create and must be torn down with rte_flow_async_destroy + rte_flow_push/rte_flow_pull; rte_flow_destroy on a template-table handle isn't supported on mlx5 and will likely fail, which then makes the rte_flow_template_table_destroy below fail (table still owns flows). Branch on entry.backend and drain async destroys before destroying the table. Shutdown-only; bites when ACTIVE template flows exist at exit.
| result.status_ = Status::INTERNAL_ERROR; | ||
| } | ||
|
|
||
| dynamic_flows_.erase(entry.flow_id); |
There was a problem hiding this comment.
Entry is erased even when rte_flow_destroy failed above. The caller gets a failed DELETE completion, but the rule may still be installed in HW and is now untracked — it can't be retried or cleaned up. Keep the entry (ACTIVE) on failure. Same pattern in poll_dpdk_flow_completions_locked (a failed RTE_FLOW_OP_* destroy still erases) and in the template add paths (push failure abandons an already-enqueued rte_flow_async_create handle). Suggest a consistent "only forget on confirmed success" rule.
| return next_flow_op_id_++; | ||
| } | ||
|
|
||
| FlowId DpdkMgr::allocate_dynamic_flow_id() { |
There was a problem hiding this comment.
allocate_dynamic_flow_id() only increments and never reclaims ids freed by delete. The id is written as the HW MARK (mark->id = flow_id, read back via mbuf->hash.fdir.hi), and mlx5 caps the usable FDIR mark well below uint32_t (~24 bits, minus reserved). For an add/delete-churn feature, a long-running control plane climbs past the mark ceiling (and table nb_flows) long before the counter wraps. Recommend recycling ids on delete completion and validating against the HW mark limit at allocation.
| if (!initialized_) { return Status::NOT_READY; } | ||
| if (static_flow_ids_.find(flow_id) != static_flow_ids_.end()) { return Status::INVALID_PARAMETER; } | ||
| const auto flow_it = dynamic_flows_.find(flow_id); | ||
| if (flow_it == dynamic_flows_.end() || flow_it->second.state != DynamicFlowState::ACTIVE) { |
There was a problem hiding this comment.
Delete rejects anything not ACTIVE, but a TEMPLATE flow stays ADDING until its add completion is drained by poll_flow_op. So add-then-immediate-delete (before polling the add) returns INVALID_PARAMETER. The example happens to wait for the add completion first — please document this ordering requirement so callers don't hit a surprising rejection.
|
|
||
| static void build_ipv4_udp_template_pattern(const Ipv4UdpFlowTemplateKey& key, | ||
| struct rte_flow_item pattern[MAX_PATTERN_NUM]) { | ||
| static struct rte_flow_item_ipv4 ip_mask; |
There was a problem hiding this comment.
static mask structs are a needless reentrancy/thread-safety smell — they only work because rte_flow_pattern_template_create deep-copies immediately. Plain locals are clearer and safe.
| @@ -201,11 +205,23 @@ RDMA transport settings: | |||
| ### Flow Isolation | |||
|
|
|||
| `rx.flow_isolation:` — When `true`, only packets matching an explicit flow rule are delivered. | |||
There was a problem hiding this comment.
This says unmatched packets are dropped under flow_isolation: true, but docs/tutorials/configuration-walkthrough.md item 19 says they're delegated back to the Linux kernel (the rte_flow_isolate semantics). This PR edits this passage and adds guidance that leans on the drop behavior, so the two docs should be reconciled to whatever the port actually does on this HW. (No code change in this PR alters the isolation path — rte_flow_isolate(port,1) is unchanged — so this is a doc-consistency fix, not a behavior change.)
|
|
||
| enum class FlowMatchType { | ||
| NORMAL, | ||
| IPV4_UDP, |
There was a problem hiding this comment.
FlowMatchType::NORMAL → IPV4_UDP (and pybind value "NORMAL"→"IPV4_UDP") is a breaking API rename for any external caller referencing the old name. No in-repo refs remain, but please call it out in the PR description / release notes.
No description provided.