Skip to content

Conversation

@patrickelectric
Copy link
Member

@patrickelectric patrickelectric commented Dec 8, 2025

Summary by Sourcery

Add an automatic integrity check and recovery step for MCAP recordings before thumbnail extraction.

Enhancements:

  • Introduce an asynchronous helper to run mcap doctor on MCAP files and invoke mcap recover when corruption is detected.
  • Integrate the MCAP health check into the recording extraction flow, skipping recovery gracefully when the mcap CLI or target file is unavailable.

@sourcery-ai
Copy link

sourcery-ai bot commented Dec 8, 2025

Reviewer's Guide

Adds an async MCAP integrity check-and-repair step using the external mcap tool before running video extraction in the recorder_extractor service.

Sequence diagram for MCAP check and recover before video extraction

sequenceDiagram
    participant Extractor as extract_mcap_recordings
    participant Checker as check_and_recover_mcap
    participant OS as asyncio_subprocess
    participant McapBinary as mcap_CLI

    Extractor->>Checker: check_and_recover_mcap(mcap_path)
    Checker->>Checker: shutil.which(mcap)
    alt mcap binary missing
        Checker-->>Extractor: return
    else mcap binary available
        Checker->>Checker: validate mcap_path exists
        alt mcap_path missing
            Checker-->>Extractor: return
        else mcap_path exists
            Checker->>OS: create_subprocess_exec(mcap doctor mcap_path)
            OS-->>Checker: returncode, stdout, stderr
            alt doctor success
                Checker-->>Extractor: return
            else doctor failure
                Checker->>OS: create_subprocess_exec(mcap recover mcap_path -o mcap_path)
                OS-->>Checker: recover_returncode, recover_stdout, recover_stderr
                alt recover success
                    Checker-->>Extractor: return
                else recover failure
                    Checker-->>Extractor: return
                end
            end
        end
    end
    Extractor->>OS: create_subprocess_exec(mcap-foxglove-video-extract mcap_path)
    OS-->>Extractor: extraction result
Loading

Class diagram for updated recorder_extractor main module

classDiagram
    class RecorderExtractorMainModule {
        +async build_thumbnail_bytes(path: Path) bytes
        +async extract_mcap_recordings() None
        +async check_and_recover_mcap(mcap_path: Path) None
    }

    class AsyncioSubprocess {
        +create_subprocess_exec(*cmd) Process
    }

    class Process {
        +async communicate() tuple
        +returncode int
    }

    RecorderExtractorMainModule --> AsyncioSubprocess : uses
    AsyncioSubprocess --> Process : creates
Loading

File-Level Changes

Change Details Files
Introduce an async helper that runs mcap doctor and conditionally mcap recover on MCAP files.
  • Add check_and_recover_mcap coroutine that verifies mcap binary existence and MCAP file presence.
  • Execute mcap doctor via asyncio.create_subprocess_exec and log success or detailed failure output.
  • On doctor failure, invoke mcap recover to repair the file in-place and log success or errors, handling non-zero exit codes gracefully.
core/services/recorder_extractor/main.py
Integrate MCAP repair step into the extraction workflow.
  • Before calling mcap-foxglove-video-extract, await the new check_and_recover_mcap helper for each MCAP file.
  • Ensure the recover step is only run for files that are not currently in use, preserving existing in-use checks.
core/services/recorder_extractor/main.py

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes and found some issues that need to be addressed.

  • When running mcap recover, consider writing to a temporary file and then atomically moving it over the original path to avoid corrupting the only copy if the recover process produces a bad or partial file.
  • You can simplify the subprocess handling by using text=True (or encoding="utf-8", errors="ignore") in create_subprocess_exec instead of manually decoding stdout/stderr bytes in both the doctor and recover calls.
  • The check_and_recover_mcap function currently logs full stdout/stderr for both doctor and recover; consider truncating or summarizing very large outputs to keep logs readable and to avoid log bloat on large or noisy files.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- When running `mcap recover`, consider writing to a temporary file and then atomically moving it over the original path to avoid corrupting the only copy if the recover process produces a bad or partial file.
- You can simplify the subprocess handling by using `text=True` (or `encoding="utf-8"`, `errors="ignore"`) in `create_subprocess_exec` instead of manually decoding `stdout`/`stderr` bytes in both the doctor and recover calls.
- The `check_and_recover_mcap` function currently logs full stdout/stderr for both doctor and recover; consider truncating or summarizing very large outputs to keep logs readable and to avoid log bloat on large or noisy files.

## Individual Comments

### Comment 1
<location> `core/services/recorder_extractor/main.py:111-120` </location>
<code_context>
+    and if it fails, run mcap recover to fix the file.
+    """
+    # Check if mcap binary exists
+    mcap_binary = shutil.which("mcap")
+    if not mcap_binary:
+        logger.warning("mcap binary not found, skipping doctor/recover check")
+        return
+
+    # Ensure path is absolute and exists
+    if not mcap_path.exists():
+        logger.warning(f"MCAP file not found: {mcap_path}")
+        return
+
+    logger.info(f"Running mcap doctor on {mcap_path}")
+    # Run mcap doctor
+    doctor_cmd = ["mcap", "doctor", str(mcap_path)]
+    doctor_proc = await asyncio.create_subprocess_exec(
+        *doctor_cmd,
</code_context>

<issue_to_address>
**suggestion (bug_risk):** Use the resolved `mcap_binary` path for commands instead of hardcoding "mcap".

You already resolve the `mcap` binary with `shutil.which`, but `doctor_cmd` and `recover_cmd` still call the hardcoded `"mcap"`. Please use `mcap_binary` instead (e.g. `doctor_cmd = [mcap_binary, "doctor", str(mcap_path)]`) so the invoked binary always matches what was found on PATH.

Suggested implementation:

```python
    logger.info(f"Running mcap doctor on {mcap_path}")
    # Run mcap doctor
    doctor_cmd = [mcap_binary, "doctor", str(mcap_path)]

```

Search in `check_and_recover_mcap` for where `recover_cmd` is defined (likely later in the function) and update it similarly:

- Change `recover_cmd = ["mcap", "recover", ...]` to `recover_cmd = [mcap_binary, "recover", ...]`.

This ensures both doctor and recover invocations use the same resolved `mcap_binary` path from `shutil.which("mcap")`.
</issue_to_address>

### Comment 2
<location> `core/services/recorder_extractor/main.py:141` </location>
<code_context>
+    logger.error(f"mcap doctor failed for {mcap_path} (code={doctor_proc.returncode}): {stderr}")
+    logger.info(f"Running mcap recover to fix {mcap_path}")
+    # Run mcap recover to replace the file
+    recover_cmd = ["mcap", "recover", str(mcap_path), "-o", str(mcap_path)]
+    recover_proc = await asyncio.create_subprocess_exec(
+        *recover_cmd,
</code_context>

<issue_to_address>
**issue (bug_risk):** Writing recover output directly over the input file risks data loss on partial failures.

Because input and `-o` output paths are identical, a partial write or mid-run failure from `mcap recover` could corrupt the file with no untouched copy left. Consider writing to a temp file in the same directory and only replacing the original on success (e.g., via `Path.replace`) so the original remains intact if recovery fails.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Signed-off-by: Patrick José Pereira <patrickelectric@gmail.com>
@joaomariolago joaomariolago merged commit 69cd2fe into bluerobotics:master Dec 9, 2025
7 checks passed
@patrickelectric patrickelectric added the docs-needed Change needs to be documented label Dec 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

docs-needed Change needs to be documented

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants