core: services: recoder_extractor: Add mcap recover #3667

patrickelectric · 2025-12-08T20:11:34Z

Summary by Sourcery

Add an automatic integrity check and recovery step for MCAP recordings before thumbnail extraction.

Enhancements:

Introduce an asynchronous helper to run mcap doctor on MCAP files and invoke mcap recover when corruption is detected.
Integrate the MCAP health check into the recording extraction flow, skipping recovery gracefully when the mcap CLI or target file is unavailable.

sourcery-ai · 2025-12-08T20:11:40Z

Reviewer's Guide

Adds an async MCAP integrity check-and-repair step using the external mcap tool before running video extraction in the recorder_extractor service.

Sequence diagram for MCAP check and recover before video extraction

sequenceDiagram
    participant Extractor as extract_mcap_recordings
    participant Checker as check_and_recover_mcap
    participant OS as asyncio_subprocess
    participant McapBinary as mcap_CLI

    Extractor->>Checker: check_and_recover_mcap(mcap_path)
    Checker->>Checker: shutil.which(mcap)
    alt mcap binary missing
        Checker-->>Extractor: return
    else mcap binary available
        Checker->>Checker: validate mcap_path exists
        alt mcap_path missing
            Checker-->>Extractor: return
        else mcap_path exists
            Checker->>OS: create_subprocess_exec(mcap doctor mcap_path)
            OS-->>Checker: returncode, stdout, stderr
            alt doctor success
                Checker-->>Extractor: return
            else doctor failure
                Checker->>OS: create_subprocess_exec(mcap recover mcap_path -o mcap_path)
                OS-->>Checker: recover_returncode, recover_stdout, recover_stderr
                alt recover success
                    Checker-->>Extractor: return
                else recover failure
                    Checker-->>Extractor: return
                end
            end
        end
    end
    Extractor->>OS: create_subprocess_exec(mcap-foxglove-video-extract mcap_path)
    OS-->>Extractor: extraction result

Class diagram for updated recorder_extractor main module

classDiagram
    class RecorderExtractorMainModule {
        +async build_thumbnail_bytes(path: Path) bytes
        +async extract_mcap_recordings() None
        +async check_and_recover_mcap(mcap_path: Path) None
    }

    class AsyncioSubprocess {
        +create_subprocess_exec(*cmd) Process
    }

    class Process {
        +async communicate() tuple
        +returncode int
    }

    RecorderExtractorMainModule --> AsyncioSubprocess : uses
    AsyncioSubprocess --> Process : creates

File-Level Changes

Change	Details	Files
Introduce an async helper that runs `mcap doctor` and conditionally `mcap recover` on MCAP files.	Add `check_and_recover_mcap` coroutine that verifies `mcap` binary existence and MCAP file presence. Execute `mcap doctor` via `asyncio.create_subprocess_exec` and log success or detailed failure output. On doctor failure, invoke `mcap recover` to repair the file in-place and log success or errors, handling non-zero exit codes gracefully.	`core/services/recorder_extractor/main.py`
Integrate MCAP repair step into the extraction workflow.	Before calling `mcap-foxglove-video-extract`, await the new `check_and_recover_mcap` helper for each MCAP file. Ensure the recover step is only run for files that are not currently in use, preserving existing in-use checks.	`core/services/recorder_extractor/main.py`

Tips and commands

Interacting with Sourcery

Trigger a new review: Comment @sourcery-ai review on the pull request.
Continue discussions: Reply directly to Sourcery's review comments.
Generate a GitHub issue from a review comment: Ask Sourcery to create an
issue from a review comment by replying to it. You can also reply to a
review comment with @sourcery-ai issue to create an issue from it.
Generate a pull request title: Write @sourcery-ai anywhere in the pull
request title to generate a title at any time. You can also comment
@sourcery-ai title on the pull request to (re-)generate the title at any time.
Generate a pull request summary: Write @sourcery-ai summary anywhere in
the pull request body to generate a PR summary at any time exactly where you
want it. You can also comment @sourcery-ai summary on the pull request to
(re-)generate the summary at any time.
Generate reviewer's guide: Comment @sourcery-ai guide on the pull
request to (re-)generate the reviewer's guide at any time.
Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
pull request to resolve all Sourcery comments. Useful if you've already
addressed all the comments and don't want to see them anymore.
Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
request to dismiss all existing Sourcery reviews. Especially useful if you
want to start fresh with a new review - don't forget to comment
@sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

Enable or disable review features such as the Sourcery-generated pull request
summary, the reviewer's guide, and others.
Change the review language.
Add, remove or edit custom review instructions.
Adjust other review settings.

Getting Help

Contact our support team for questions or feedback.
Visit our documentation for detailed guides and information.
Keep in touch with the Sourcery team by following us on X/Twitter, LinkedIn or GitHub.

sourcery-ai

Hey there - I've reviewed your changes and found some issues that need to be addressed.

When running mcap recover, consider writing to a temporary file and then atomically moving it over the original path to avoid corrupting the only copy if the recover process produces a bad or partial file.
You can simplify the subprocess handling by using text=True (or encoding="utf-8", errors="ignore") in create_subprocess_exec instead of manually decoding stdout/stderr bytes in both the doctor and recover calls.
The check_and_recover_mcap function currently logs full stdout/stderr for both doctor and recover; consider truncating or summarizing very large outputs to keep logs readable and to avoid log bloat on large or noisy files.

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- When running `mcap recover`, consider writing to a temporary file and then atomically moving it over the original path to avoid corrupting the only copy if the recover process produces a bad or partial file.
- You can simplify the subprocess handling by using `text=True` (or `encoding="utf-8"`, `errors="ignore"`) in `create_subprocess_exec` instead of manually decoding `stdout`/`stderr` bytes in both the doctor and recover calls.
- The `check_and_recover_mcap` function currently logs full stdout/stderr for both doctor and recover; consider truncating or summarizing very large outputs to keep logs readable and to avoid log bloat on large or noisy files.

## Individual Comments

### Comment 1
<location> `core/services/recorder_extractor/main.py:111-120` </location>
<code_context>
+    and if it fails, run mcap recover to fix the file.
+    """
+    # Check if mcap binary exists
+    mcap_binary = shutil.which("mcap")
+    if not mcap_binary:
+        logger.warning("mcap binary not found, skipping doctor/recover check")
+        return
+
+    # Ensure path is absolute and exists
+    if not mcap_path.exists():
+        logger.warning(f"MCAP file not found: {mcap_path}")
+        return
+
+    logger.info(f"Running mcap doctor on {mcap_path}")
+    # Run mcap doctor
+    doctor_cmd = ["mcap", "doctor", str(mcap_path)]
+    doctor_proc = await asyncio.create_subprocess_exec(
+        *doctor_cmd,
</code_context>

<issue_to_address>
**suggestion (bug_risk):** Use the resolved `mcap_binary` path for commands instead of hardcoding "mcap".

You already resolve the `mcap` binary with `shutil.which`, but `doctor_cmd` and `recover_cmd` still call the hardcoded `"mcap"`. Please use `mcap_binary` instead (e.g. `doctor_cmd = [mcap_binary, "doctor", str(mcap_path)]`) so the invoked binary always matches what was found on PATH.

Suggested implementation:

```python
    logger.info(f"Running mcap doctor on {mcap_path}")
    # Run mcap doctor
    doctor_cmd = [mcap_binary, "doctor", str(mcap_path)]

```

Search in `check_and_recover_mcap` for where `recover_cmd` is defined (likely later in the function) and update it similarly:

- Change `recover_cmd = ["mcap", "recover", ...]` to `recover_cmd = [mcap_binary, "recover", ...]`.

This ensures both doctor and recover invocations use the same resolved `mcap_binary` path from `shutil.which("mcap")`.
</issue_to_address>

### Comment 2
<location> `core/services/recorder_extractor/main.py:141` </location>
<code_context>
+    logger.error(f"mcap doctor failed for {mcap_path} (code={doctor_proc.returncode}): {stderr}")
+    logger.info(f"Running mcap recover to fix {mcap_path}")
+    # Run mcap recover to replace the file
+    recover_cmd = ["mcap", "recover", str(mcap_path), "-o", str(mcap_path)]
+    recover_proc = await asyncio.create_subprocess_exec(
+        *recover_cmd,
</code_context>

<issue_to_address>
**issue (bug_risk):** Writing recover output directly over the input file risks data loss on partial failures.

Because input and `-o` output paths are identical, a partial write or mid-run failure from `mcap recover` could corrupt the file with no untouched copy left. Consider writing to a temp file in the same directory and only replacing the original on success (e.g., via `Path.replace`) so the original remains intact if recovery fails.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

core/services/recorder_extractor/main.py

Signed-off-by: Patrick José Pereira <patrickelectric@gmail.com>

patrickelectric requested review from Williangalvani and joaomariolago December 8, 2025 20:11

sourcery-ai bot reviewed Dec 8, 2025

View reviewed changes

core/services/recorder_extractor/main.py Show resolved Hide resolved

core/services/recorder_extractor/main.py Outdated Show resolved Hide resolved

patrickelectric force-pushed the add-mcap-recover branch from 934ea4e to 31ff8d8 Compare December 9, 2025 13:45

core: services: recoder_extractor: Add mcap recover

0fde2db

Signed-off-by: Patrick José Pereira <patrickelectric@gmail.com>

patrickelectric force-pushed the add-mcap-recover branch from 31ff8d8 to 0fde2db Compare December 9, 2025 13:59

joaomariolago approved these changes Dec 9, 2025

View reviewed changes

joaomariolago merged commit 69cd2fe into bluerobotics:master Dec 9, 2025
7 checks passed

patrickelectric added the docs-needed Change needs to be documented label Dec 9, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

core: services: recoder_extractor: Add mcap recover #3667

core: services: recoder_extractor: Add mcap recover #3667

Uh oh!

patrickelectric commented Dec 8, 2025 •

edited by sourcery-ai bot

Loading

Uh oh!

sourcery-ai bot commented Dec 8, 2025 •

edited

Loading

Interacting with Sourcery

Customizing Your Experience

Getting Help

Uh oh!

sourcery-ai bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

core: services: recoder_extractor: Add mcap recover #3667

core: services: recoder_extractor: Add mcap recover #3667

Uh oh!

Conversation

patrickelectric commented Dec 8, 2025 • edited by sourcery-ai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by Sourcery

Uh oh!

sourcery-ai bot commented Dec 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviewer's Guide

Sequence diagram for MCAP check and recover before video extraction

Class diagram for updated recorder_extractor main module

File-Level Changes

Interacting with Sourcery

Customizing Your Experience

Getting Help

Uh oh!

sourcery-ai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

patrickelectric commented Dec 8, 2025 •

edited by sourcery-ai bot

Loading

sourcery-ai bot commented Dec 8, 2025 •

edited

Loading