fix(hardware): MemMap/AxiMemMap/AxiStreamDma audit hardening#1200
Conversation
There was a problem hiding this comment.
Pull request overview
This PR hardens Rogue’s Linux hardware bridge layer (MemMap, AxiMemMap, AxiStreamDma) by addressing thread ownership, fd-set bounds safety, and shared-state concurrency, and adds C++ doctest regressions that detect these bug classes via source audits.
Changes:
- Replaces raw
new std::threadownership withstd::unique_ptr<std::thread>inMemMap,AxiMemMap, andAxiStreamDma. - Adds
FD_SETSIZEvalidation (including an early ctor reject forAxiStreamDma) and updatesrunThread()to log-and-exit on invalidfd_. - Protects
AxiStreamDma’s static shared buffer map/state with a new mutex, and adds doctest “audit repro” tests + CMake wiring.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/cpp/protocols/CMakeLists.txt | Adds the new hardware C++ test subdirectory to the protocols test suite. |
| tests/cpp/protocols/hardware/CMakeLists.txt | Registers the new hardware audit regression tests and sets ROGUE_SRC_DIR for source scanning. |
| tests/cpp/protocols/hardware/test_memmap_raw_thread_audit_repro.cpp | Doctest audit ensuring MemMap.cpp avoids raw new std::thread. |
| tests/cpp/protocols/hardware/test_axi_memmap_raw_thread_audit_repro.cpp | Doctest audit ensuring AxiMemMap.cpp avoids raw new std::thread. |
| tests/cpp/protocols/hardware/test_axi_stream_dma_raw_thread_audit_repro.cpp | Doctest audit ensuring AxiStreamDma.cpp avoids raw new std::thread. |
| tests/cpp/protocols/hardware/test_axi_stream_dma_fd_setsize_audit_repro.cpp | Doctest audit ensuring FD_SET(fd_, ...) sites are preceded by FD_SETSIZE guards. |
| tests/cpp/protocols/hardware/test_axi_stream_dma_shared_static_race_audit_repro.cpp | Doctest audit ensuring shared static map accesses are mutex-protected in openShared/closeShared. |
| tests/cpp/protocols/hardware/test_axi_stream_dma_disabled_block_audit_repro.cpp | Doctest audit ensuring the unexplained #if 0 disabled block doesn’t return. |
| src/rogue/hardware/MemMap.cpp | Migrates worker thread creation/deletion to std::unique_ptr + reset on stop. |
| include/rogue/hardware/MemMap.h | Changes thread_ member to std::unique_ptr<std::thread>. |
| src/rogue/hardware/axi/AxiMemMap.cpp | Migrates worker thread creation/deletion to std::unique_ptr + reset on stop. |
| include/rogue/hardware/axi/AxiMemMap.h | Changes thread_ member to std::unique_ptr<std::thread>. |
| src/rogue/hardware/axi/AxiStreamDma.cpp | Adds shared mutex + locking, FD_SETSIZE checks, removes dead #if 0 block, converts thread ownership to unique_ptr, and changes runThread() bad-fd behavior to log-and-exit. |
| include/rogue/hardware/axi/AxiStreamDma.h | Changes thread_ member to std::unique_ptr<std::thread>. |
Comments suppressed due to low confidence (1)
src/rogue/hardware/axi/AxiStreamDma.cpp:287
stop()only performsjoin()/cleanup whenthreadEn_is true. After the newrunThread()path logs an out-of-rangefd_and setsthreadEn_ = false,stop()will skipjoin(), skip closingfd_/desc_, and thestd::threadwill remain joinable (destroyingthread_later can callstd::terminate). Changestop()to join/reset based onthread_ && thread_->joinable()(and always run the fd_/desc_ cleanup as appropriate), not onthreadEn_.
void rha::AxiStreamDma::stop() {
if (threadEn_) {
rogue::GilRelease noGil;
// Stop read thread
threadEn_ = false;
thread_->join();
thread_.reset();
closeShared(desc_);
::close(fd_);
fd_ = -1;
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- AxiStreamDma.cpp: add explicit #include <mutex> alongside other standard headers; sharedBuffersMtx is std::mutex and should not rely on transitive includes - test_axi_stream_dma_raw_thread_audit_repro.cpp: complete the truncated header sentence; reference hardware::MemMap and hardware::axi::AxiMemMap as the sibling raw-ptr cases and note unique_ptr<std::thread> as the correct fix - test_axi_memmap_raw_thread_audit_repro.cpp: complete the truncated header sentence; reference hardware::MemMap and the unique_ptr<std::thread> fix - test_axi_stream_dma_fd_setsize_audit_repro.cpp: rewrite the CHECK failure message; the previous wording prescribed 'throw' for every FD_SET site, but runThread() now logs and exits the worker cleanly (acceptReq/acceptFrame still throw)
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 14 out of 14 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (2)
src/rogue/hardware/axi/AxiStreamDma.cpp:90
openShared()only incrementsopenCountin theret->fd != -1fast-path. If the descriptor exists insharedBuffers_butfd == -1(e.g., reopened after last close) or ifzCopyEnis false (early-return path),openCountis not incremented for the new user. That makesopenCountdrift/underflow once multiple instances are constructed (especially withzeroCopyDisable()), which can break thecloseShared()lifetime logic. IncrementopenCountfor every successfulopenShared()call that returns a descriptor (including the!zCopyEnpath), and avoid special-casing it onfdbeing open.
rha::AxiStreamDmaSharedPtr rha::AxiStreamDma::openShared(std::string path, rogue::LoggingPtr log) {
rogue::GilRelease noGil;
std::lock_guard<std::mutex> lock(sharedBuffersMtx);
std::map<std::string, rha::AxiStreamDmaSharedPtr>::iterator it;
rha::AxiStreamDmaSharedPtr ret;
// Entry already exists
if ((it = sharedBuffers_.find(path)) != sharedBuffers_.end()) {
ret = it->second;
log->debug("Reusing existing shared file descriptor for %s", path.c_str());
// Create new record
} else {
ret = std::make_shared<rha::AxiStreamDmaShared>(path);
log->debug("Opening new shared file descriptor for %s", path.c_str());
}
// Check if already open
if (ret->fd != -1) {
ret->openCount++;
log->debug("Shared file descriptor already opened for %s", path.c_str());
return ret;
}
// Check if zero copy is disabled, if so don't open or map buffers
if (!ret->zCopyEn) {
log->debug("Zero copy is disabled. Not Mapping Buffers for %s", path.c_str());
return ret;
}
src/rogue/hardware/axi/AxiStreamDma.cpp:153
closeShared()unconditionally calls::close(desc->fd)whenopenCountreaches 0. In thezeroCopyDisable()flow,desc->fdis never opened (stays-1), so this will callclose(-1)(EBADF). Guard the close withif (desc->fd != -1)(and similarly only unmap when a valid mapping exists).
void rha::AxiStreamDma::closeShared(rha::AxiStreamDmaSharedPtr desc) {
rogue::GilRelease noGil;
std::lock_guard<std::mutex> lock(sharedBuffersMtx);
desc->openCount--;
if (desc->openCount == 0) {
if (desc->rawBuff != NULL) {
dmaUnMapDma(desc->fd, desc->rawBuff);
}
::close(desc->fd);
desc->fd = -1;
desc->bCount = 0;
desc->bSize = 0;
desc->rawBuff = NULL;
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Address PR #1200 Copilot review: the helper was defined but never referenced by any TEST_CASE in this translation unit, leaving dead code that can trigger -Wunused-function warnings under stricter builds.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 14 out of 14 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Three Copilot comments on PR #1200 flagged residual bugs in the AxiStreamDma audit fixes: * AxiStreamDmaShared ctor initialized openCount = 1, double-counting entries seeded by zeroCopyDisable() before any AxiStreamDma user existed. * openShared() only incremented openCount on the fd != -1 branch, so AxiStreamDma instances on a zeroCopyDisable()-d path bypassed the increment while still triggering closeShared() decrements -> openCount drifted negative. * closeShared() unconditionally called ::close(desc->fd); in the zeroCopyDisable() flow desc->fd is never opened, so ::close(-1) was issued every time openCount reached 0 (returns EBADF). * stop() gated all cleanup behind 'if (threadEn_)'. Now that runThread() can exit on its own by setting threadEn_ = false (in-loop fd_ guard), stop() would skip thread_->join(), and ~unique_ptr<std::thread> on a still-joinable thread would call std::terminate(). Fixes: * AxiStreamDmaShared ctor initializes openCount = 0. * openShared() increments openCount on every successful return path (existing-open, zero-copy-disabled, and freshly opened branches), with the new-entry increment placed after open + dmaMapDma succeed so a pre-insertion throw does not leak openCount. * closeShared() guards ::close(desc->fd) with 'if (desc->fd != -1)'. * stop() drives cleanup from per-resource state: thread_ joinability for the join, desc_ presence for closeShared (with reset to prevent double-close), fd_ >= 0 for the ::close. threadEn_ = false is set unconditionally so the worker still receives the stop signal, and the function remains idempotent across user-stop -> dtor sequences and worker self-exit -> stop sequences. Adds two source-audit doctest regressions matching the existing repro pattern: * test_axi_stream_dma_open_count_audit_repro.cpp - asserts the ctor initializes openCount = 0, openShared has >= 3 openCount++ sites (one per return branch), and closeShared guards every ::close(desc->fd) with an 'if (desc->fd != -1)' check (comment-stripped scan to avoid false positives from explanatory comments). * test_axi_stream_dma_stop_joinable_audit_repro.cpp - asserts stop() does not gate cleanup behind 'if (threadEn_)' and references joinable() so it joins on both the normal-stop and worker self-exit paths. ctest --test-dir build - 19/19 pass (including the two new audits). scripts/run_linters.sh - clean.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 16 out of 16 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
closeShared() leaves entries in sharedBuffers_ after openCount drops to 0; only fd/bCount/bSize/rawBuff are reset. An entry can therefore be re-entered through the "fd == -1, zCopyEn == true" fall-through in openShared(). If ::open succeeds but a later step (driver version checks, dmaCheckVersion, dmaMapDma) fails, the original code called ::close(ret->fd) and threw without resetting ret->fd back to -1, leaving the persistent shared-buffer record holding a non-negative but closed descriptor. The next openShared() caller would hit the "ret->fd != -1" fast-path, increment openCount, and reuse the stale fd - every operation against it would return EBADF. Reset ret->fd to -1 on each error path before throwing, and scrub bCount/bSize/rawBuff in the dmaMapDma failure path so the record is left in a clean state for the next caller. Add a structural regression test that walks every ::close(ret->fd) site in openShared() and requires a 'ret->fd = -1' between the close and its throw. Addresses PR #1200 Copilot review comment 3189767492.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 17 out of 17 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
src/rogue/hardware/axi/AxiStreamDma.cpp:125
- Typo in this user-facing error message: there is a doubled comma in “API),,”. Remove the extra comma to avoid shipping a malformed message.
Note that aes-stream-driver (v5.15.2 or earlier) and rogue (v5.11.1 or earlier) are compatible with the 32-bit address API.
To use later versions (64-bit address API),, you will need to upgrade both rogue and aes-stream-driver at the same time to:
\t\taes-stream-driver = v5.16.0 (or later)
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…ed() Address PR #1200 Copilot review comment 3189891044: the comment near the openShared() error-path block said "Reset fd_ to -1", but the code resets the shared record field ret->fd, not the AxiStreamDma::fd_ per-instance member. Reword to call out ret->fd explicitly and add a one-line note distinguishing it from AxiStreamDma::fd_. Comment-only change; no behavior impact.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 17 out of 17 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 18 out of 18 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Hardens the Linux hardware bridge layer (MemMap, AxiMemMap,
AxiStreamDma) against thread-lifetime, file-descriptor, and
shared-state bugs surfaced by the bug audit. Public API signatures
are unchanged; fixes live in implementation and internal members.
Each bug class is locked down by a source-audit doctest regression.
Thread ownership (RAII)
- Replace raw new std::thread + delete with std::unique_ptr<std::thread>
in MemMap, AxiMemMap, AxiStreamDma. Eliminates leaks on
construction-failure paths and ensures cleanup ordering.
FD_SETSIZE bounds checks (AxiStreamDma)
- Reject fd_ >= FD_SETSIZE synchronously in the ctor (close + throw
GeneralError) before the worker thread starts.
- In-loop FD_SETSIZE guards in acceptReq()/acceptFrame() include the
numeric limit and identify likely root causes (post-stop instance
vs process fd-table exhaustion).
- runThread() logs and exits the worker cleanly on out-of-range fd
instead of throwing across the thread boundary (which would call
std::terminate()).
Shared-buffer state race (AxiStreamDma)
- Add static std::mutex sharedBuffersMtx covering openShared,
closeShared, zeroCopyDisable, and the ctor-time reads/writes of
desc_->bCount / desc_->bSize. Add explicit #include <mutex>.
Drop dead #if 0 block.
openShared() error-path stale-fd leak (AxiStreamDma)
- closeShared() leaves entries in sharedBuffers_ after openCount drops
to 0 (only fd/bCount/bSize/rawBuff are reset). On error paths after
::open succeeded but a later step (driver version, dmaCheckVersion,
dmaMapDma) failed, the original code closed the fd but left ret->fd
non-negative; the next caller would re-enter the persistent record
and reuse a stale descriptor, returning EBADF on every op.
- Reset ret->fd = -1 (and scrub bCount/bSize/rawBuff in the dmaMapDma
failure path) on every error path before throwing. Comments
disambiguate ret->fd (shared record field) from AxiStreamDma::fd_
(per-instance member).
openCount accounting (AxiStreamDma)
- AxiStreamDmaShared ctor initializes openCount = 0 (was 1, double-
counting entries seeded by zeroCopyDisable() before any AxiStreamDma
user existed).
- openShared() increments openCount on every successful return path
(existing-open, zero-copy-disabled fall-through, and freshly opened
branch). The new-entry increment is placed after open + dmaMapDma
succeed so a pre-insertion throw cannot leak openCount.
- closeShared() guards ::close(desc->fd) with if (desc->fd != -1) -
avoids close(-1) returning EBADF every time the count reaches 0 in
the zeroCopyDisable() flow.
stop() idempotence + worker self-exit (AxiStreamDma)
- Now that runThread() can self-exit via the in-loop fd guard
(sets threadEn_ = false and returns), the previous if (threadEn_)-
gated cleanup would skip thread_->join() and ~unique_ptr<std::thread>
on a still-joinable thread would call std::terminate().
- Drive cleanup from per-resource state: thread_->joinable() for the
join, desc_ presence for closeShared (with desc_.reset() to prevent
double-close), fd_ >= 0 for the ::close. threadEn_ = false is set
unconditionally so the worker still receives the stop signal.
Function is idempotent across user-stop -> dtor and worker self-exit
-> stop sequences.
Use-after-stop guards (AxiStreamDma::acceptReq / acceptFrame)
- stop() resets desc_ to nullptr and closes fd_; previously
acceptReq() dereferenced desc_->bSize at function entry and
segfaulted on a post-stop call, and acceptFrame() only checked the
fd inside the inner write loop after frame->lock() and buffer
iteration had already run.
- Add fail-fast !desc_ || fd_ < 0 entry guards in both methods that
throw GeneralError("...stopped or did not finish construction").
Source-audit regression tests (tests/cpp/protocols/hardware/)
Each test scans the source for the audited invariant
(comment-stripped to avoid false positives):
- test_memmap_raw_thread_audit_repro.cpp
- test_axi_memmap_raw_thread_audit_repro.cpp
- test_axi_stream_dma_raw_thread_audit_repro.cpp
- test_axi_stream_dma_fd_setsize_audit_repro.cpp
- test_axi_stream_dma_disabled_block_audit_repro.cpp
- test_axi_stream_dma_shared_static_race_audit_repro.cpp
- test_axi_stream_dma_open_shared_stale_fd_audit_repro.cpp
- test_axi_stream_dma_open_count_audit_repro.cpp
- test_axi_stream_dma_stop_joinable_audit_repro.cpp
- test_axi_stream_dma_use_after_stop_audit_repro.cpp
0c20b2f to
d444ac9
Compare
|
These changes seem reasonable. |
Summary
Hardens the Linux hardware bridge layer (
MemMap,AxiMemMap,AxiStreamDma) against thread-lifetime, file-descriptor, and shared-state bugs surfaced by the bug audit. All public API signatures are unchanged; fixes live in implementation + private/internal members. Each bug class is locked down by a source-audit doctest regression so the audited invariant cannot silently regress.Bugs fixed
Thread ownership (RAII)
std::thread* thread_ = new std::thread(...)+deletewithstd::unique_ptr<std::thread>inMemMap,AxiMemMap,AxiStreamDma. Eliminates leaks on construction-failure paths and ensures cleanup ordering.FD_SETSIZE bounds checks (
AxiStreamDma)fd_ >= FD_SETSIZEsynchronously in the ctor (close + throwGeneralError), before the worker thread starts.FD_SETSIZEguards inacceptReq()/acceptFrame()now include the numeric limit and identify likely root causes (post-stop instance vs process fd-table exhaustion).runThread()logs and exits the worker cleanly on out-of-range fd instead of throwing across the thread boundary (which would callstd::terminate()).Shared-buffer state race (
AxiStreamDma)static std::mutex sharedBuffersMtxcoveringopenShared/closeShared/zeroCopyDisableand the ctor-time reads/writes ofdesc_->bCount/desc_->bSize. Also adds explicit#include <mutex>.#if 0block.openShared() error-path stale-fd leak (
AxiStreamDma)closeShared()leaves entries insharedBuffers_afteropenCountdrops to 0 (only fd/bCount/bSize/rawBuff are reset). On error paths after::opensucceeded but a later step (driver version,dmaCheckVersion,dmaMapDma) failed, the original code closed the fd but leftret->fdnon-negative — the next caller would re-enter the persistent record and reuse a stale descriptor, returning EBADF on every op.ret->fd = -1(and scrubbCount/bSize/rawBuffin thedmaMapDmafailure path) on every error path before throwing. Comments disambiguateret->fd(shared record field) fromAxiStreamDma::fd_(per-instance member).openCount accounting (
AxiStreamDma)AxiStreamDmaSharedctor now initializesopenCount = 0(was 1, double-counting entries seeded byzeroCopyDisable()before anyAxiStreamDmauser existed).openShared()incrementsopenCounton every successful return path (existing-open, zero-copy-disabled fall-through, and freshly opened branch). The new-entry increment is placed afteropen+dmaMapDmasucceed, so a pre-insertion throw cannot leakopenCount.closeShared()guards::close(desc->fd)withif (desc->fd != -1)— avoidsclose(-1)returning EBADF every time the count reaches 0 in thezeroCopyDisable()flow.stop() idempotence + worker self-exit (
AxiStreamDma)runThread()can self-exit via the in-loop fd guard (setsthreadEn_ = falseand returns), the previousif (threadEn_)-gated cleanup would skipthread_->join()and~unique_ptr<std::thread>on a still-joinable thread would callstd::terminate().thread_->joinable()for the join,desc_presence forcloseShared(withdesc_.reset()to prevent double-close),fd_ >= 0for the::close.threadEn_ = falseis set unconditionally so the worker still receives the stop signal. Function is idempotent across user-stop → dtor and worker self-exit → stop sequences.Use-after-stop guards (
AxiStreamDma::acceptReq/acceptFrame)stop()resetsdesc_tonullptrand closesfd_; previouslyacceptReq()dereferenceddesc_->bSizeat function entry and segfaulted on a post-stop call, andacceptFrame()only checked the fd inside the inner write loop afterframe->lock()and buffer iteration had already run.!desc_ || fd_ < 0entry guards in both methods that throwGeneralError("...stopped or did not finish construction").Source-audit regression tests added (
tests/cpp/protocols/hardware/)Each test scans the source for the audited invariant (comment-stripped to avoid false positives from explanatory text):
test_memmap_raw_thread_audit_repro.cpptest_axi_memmap_raw_thread_audit_repro.cpptest_axi_stream_dma_raw_thread_audit_repro.cpptest_axi_stream_dma_fd_setsize_audit_repro.cpptest_axi_stream_dma_disabled_block_audit_repro.cpptest_axi_stream_dma_shared_static_race_audit_repro.cpptest_axi_stream_dma_open_shared_stale_fd_audit_repro.cpptest_axi_stream_dma_open_count_audit_repro.cpptest_axi_stream_dma_stop_joinable_audit_repro.cpptest_axi_stream_dma_use_after_stop_audit_repro.cppDocumentation
No
.rstupdates required: the affected API pages (docs/src/api/cpp/hardware/...,docs/src/api/python/rogue/hardware/...) all usedoxygenclass/doxygentypedefautogeneration. All header changes are inside//! \cond INTERNAL ... \endcondblocks (excluded by Doxygen) and no public API signatures changed.Verification
./scripts/run_linters.sh— clean.cmake -DROGUE_BUILD_TESTS=ON -B build && cmake --build build && ctest --test-dir build— 19/19 pass (including all 10 new audit repros).pytest tests/.