Skip to content

Conversation

@yottahmd
Copy link
Collaborator

@yottahmd yottahmd commented Dec 16, 2025

Summary by CodeRabbit

Release Notes

  • New Features
    • Added cleanup command to remove old DAG run history with configurable retention periods
    • Supports --dry-run flag to preview which runs would be removed without deleting
    • Includes --yes flag to skip confirmation prompts
    • Automatically preserves active/running DAG runs during cleanup operations

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link

coderabbitai bot commented Dec 16, 2025

Walkthrough

Introduces a new Cleanup subcommand that removes old DAG run history. The feature adds CLI command implementation, three new flags, updates the RemoveOldDAGRuns API to accept options and return removed run IDs, and provides dry-run capability via options.

Changes

Cohort / File(s) Summary
CLI Command Integration
cmd/main.go, internal/cmd/cleanup.go, internal/cmd/cleanup_test.go, internal/cmd/flags.go
Registers Cleanup subcommand and implements cleanup logic with retention-days, dry-run, and yes flags. Supports input validation, dry-run preview, user confirmation flow, and feedback messages. Tests cover retention policies, dry-run, active DAG preservation, and error cases.
Core API & Options
internal/core/execution/dagrun.go, internal/core/execution/dagrun_test.go
Updates DAGRunStore.RemoveOldDAGRuns signature to accept variadic RemoveOldDAGRunsOption and return ([]string, error). Introduces RemoveOldDAGRunsOptions struct, RemoveOldDAGRunsOption functional option type, and WithDryRun() helper. Tests updated to handle new return values and options.
File Persistence Implementation
internal/persistence/filedagrun/dataroot.go, internal/persistence/filedagrun/dataroot_test.go, internal/persistence/filedagrun/store.go, internal/persistence/filedagrun/store_test.go
RemoveOld now accepts dryRun boolean and returns ([]string, error). Store.RemoveOldDAGRuns processes options and delegates to DataRoot with dry-run flag. Accumulates and returns removed run IDs; conditional deletion based on dry-run mode. Tests verify returned ID counts.
Mock Updates
internal/common/telemetry/collector_test.go, internal/runtime/agent/dbclient_test.go, internal/service/scheduler/zombie_detector_test.go
Updates mockDAGRunStore.RemoveOldDAGRuns to accept variadic options and return ([]string, error), handling nil first return appropriately.
Agent Integration
internal/runtime/agent/agent.go
Discards removed run IDs return value; continues error-only handling in setupDAGRunAttempt.

Sequence Diagram

sequenceDiagram
    participant User
    participant CLI as Cleanup Command
    participant Store as DAGRunStore
    participant DataRoot as DataRoot
    participant FS as File System

    User->>CLI: cleanup --retention-days 30 my-dag
    activate CLI
    CLI->>CLI: Validate inputs
    CLI->>CLI: Build RemoveOldDAGRunsOptions
    
    alt Dry Run
        CLI->>Store: RemoveOldDAGRuns(ctx, dag, 30, WithDryRun())
        activate Store
        Store->>Store: Parse options
        Store->>DataRoot: RemoveOld(ctx, 30, dryRun=true)
        activate DataRoot
        DataRoot->>FS: List old runs (no deletion)
        FS-->>DataRoot: Run IDs
        DataRoot-->>Store: return removedIDs, nil
        deactivate DataRoot
        Store-->>CLI: return removedIDs, nil
        deactivate Store
        CLI->>User: Display preview of runs to be removed
    else Delete with Confirmation
        CLI->>User: Prompt confirmation (unless --yes)
        User-->>CLI: Confirm
        CLI->>Store: RemoveOldDAGRuns(ctx, dag, 30)
        activate Store
        Store->>DataRoot: RemoveOld(ctx, 30, dryRun=false)
        activate DataRoot
        DataRoot->>FS: Delete old run files
        FS-->>DataRoot: Removed run IDs
        DataRoot-->>Store: return removedIDs, nil
        deactivate DataRoot
        Store-->>CLI: return removedIDs, nil
        deactivate Store
        CLI->>User: Display deleted run count
    end
    deactivate CLI
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Attention areas:
    • Verify options pattern implementation consistency across DAGRunStore, Store, and DataRoot layers
    • Confirm dry-run logic correctly skips deletions while accumulating IDs in all code paths
    • Validate that all mock implementations properly handle the new variadic options parameter and nil/non-nil return value branches
    • Review cleanup command's input validation, confirmation flow, and quiet mode handling
    • Ensure removal of active runs is properly preserved throughout the deletion logic

Poem

🐰 Old runs, they pile up high—
A cleanup command says goodbye!
Dry-run mode to peek and see,
Which histories removed shall be.
Retention policies, options sweet,
Make DAG housekeeping complete! 🧹✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: adding a cleanup command to the CLI as a new top-level feature.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch clean-command

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (4)
internal/cmd/cleanup.go (1)

100-117: Consider adding TTY detection for non-interactive environments.

When stdin is not a TTY (e.g., in CI pipelines without the -y flag), ReadString('\n') may block indefinitely or fail on EOF. While the --yes flag provides a workaround, users might not realize they need it.

Consider detecting non-interactive stdin and either auto-declining or requiring -y:

+	// Check if stdin is a terminal for interactive confirmation
+	stdinInfo, _ := os.Stdin.Stat()
+	isInteractive := (stdinInfo.Mode() & os.ModeCharDevice) != 0
+
 	// Confirmation prompt (unless --yes or --quiet)
-	if !skipConfirm && !ctx.Quiet {
+	if !skipConfirm && !ctx.Quiet && isInteractive {
 		fmt.Printf("This will delete %s.\n", actionDesc)
+	} else if !skipConfirm && !ctx.Quiet && !isInteractive {
+		return fmt.Errorf("confirmation required: use --yes flag in non-interactive mode")
+	}
internal/persistence/filedagrun/dataroot_test.go (1)

279-281: Consider adding a test case for dry-run mode.

All tests pass dryRun: false. Adding a test that verifies dry-run behavior (returns IDs without actual deletion) would improve coverage of the new functionality.

Example test case:

t.Run("DryRunDoesNotDelete", func(t *testing.T) {
    root := setupTestDataRoot(t)
    ts := time.Date(2021, 1, 1, 0, 0, 0, 0, time.UTC)
    dagRun := root.CreateTestDAGRun(t, "dag-run-1", execution.NewUTC(ts))
    // ... create attempt with status ...
    
    // Dry run should return IDs but not delete
    removedIDs, err := root.RemoveOld(root.Context, 0, true)
    require.NoError(t, err)
    assert.Len(t, removedIDs, 1)
    assert.True(t, fileutil.FileExists(dagRun.baseDir), "dag-run should still exist after dry-run")
})
internal/cmd/cleanup_test.go (2)

102-138: Consider increasing the sleep duration for CI stability.

The 200ms sleep may be insufficient on slower CI runners. While this pattern is acceptable for integration tests, you might want to use a polling loop with timeout instead of a fixed sleep to avoid flaky tests.

-		// Wait for DAG to start running
-		time.Sleep(time.Millisecond * 200)
-		dag.AssertLatestStatus(t, core.Running)
+		// Wait for DAG to start running (poll with timeout)
+		require.Eventually(t, func() bool {
+			status, _ := dag.LatestStatus()
+			return status == core.Running
+		}, 5*time.Second, 50*time.Millisecond, "DAG should start running")

263-280: Remove unused parameter from helper function.

The third parameter _ string is unused and can be removed for cleaner code.

-func setOldModTime(t *testing.T, baseDir, dagName, _ string, modTime time.Time) {
+func setOldModTime(t *testing.T, baseDir, dagName string, modTime time.Time) {

Also update the call site at line 242:

-		setOldModTime(t, th.Config.Paths.DAGRunsDir, dagName, "", oldTime)
+		setOldModTime(t, th.Config.Paths.DAGRunsDir, dagName, oldTime)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5d6e50d and 088a583.

📒 Files selected for processing (14)
  • cmd/main.go (1 hunks)
  • internal/cmd/cleanup.go (1 hunks)
  • internal/cmd/cleanup_test.go (1 hunks)
  • internal/cmd/flags.go (1 hunks)
  • internal/common/telemetry/collector_test.go (1 hunks)
  • internal/core/execution/dagrun.go (2 hunks)
  • internal/core/execution/dagrun_test.go (3 hunks)
  • internal/persistence/filedagrun/dataroot.go (2 hunks)
  • internal/persistence/filedagrun/dataroot_test.go (3 hunks)
  • internal/persistence/filedagrun/store.go (1 hunks)
  • internal/persistence/filedagrun/store_test.go (1 hunks)
  • internal/runtime/agent/agent.go (1 hunks)
  • internal/runtime/agent/dbclient_test.go (1 hunks)
  • internal/service/scheduler/zombie_detector_test.go (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

**/*.go: Backend entrypoint in cmd/ orchestrates the scheduler and CLI; runtime, persistence, and service layers sit under internal/* (for example internal/runtime, internal/persistence)
Keep Go files gofmt/goimports clean; use tabs, PascalCase for exported symbols (SchedulerClient), lowerCamelCase for locals, and Err... names for package-level errors
Repository linting relies on golangci-lint; prefer idiomatic Go patterns, minimal global state, and structured logging helpers in internal/common

Files:

  • internal/cmd/flags.go
  • internal/runtime/agent/agent.go
  • internal/persistence/filedagrun/dataroot.go
  • internal/cmd/cleanup.go
  • internal/runtime/agent/dbclient_test.go
  • internal/common/telemetry/collector_test.go
  • internal/persistence/filedagrun/store_test.go
  • internal/service/scheduler/zombie_detector_test.go
  • internal/persistence/filedagrun/dataroot_test.go
  • internal/core/execution/dagrun_test.go
  • cmd/main.go
  • internal/cmd/cleanup_test.go
  • internal/persistence/filedagrun/store.go
  • internal/core/execution/dagrun.go
**/*_test.go

📄 CodeRabbit inference engine (AGENTS.md)

**/*_test.go: Co-locate Go tests as *_test.go; favour table-driven cases and cover failure paths
Use stretchr/testify/require and shared fixtures from internal/test instead of duplicating mocks

Files:

  • internal/runtime/agent/dbclient_test.go
  • internal/common/telemetry/collector_test.go
  • internal/persistence/filedagrun/store_test.go
  • internal/service/scheduler/zombie_detector_test.go
  • internal/persistence/filedagrun/dataroot_test.go
  • internal/core/execution/dagrun_test.go
  • internal/cmd/cleanup_test.go
🧠 Learnings (3)
📚 Learning: 2025-12-04T10:34:17.062Z
Learnt from: CR
Repo: dagu-org/dagu PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-04T10:34:17.062Z
Learning: Applies to **/*_test.go : Use `stretchr/testify/require` and shared fixtures from `internal/test` instead of duplicating mocks

Applied to files:

  • internal/common/telemetry/collector_test.go
📚 Learning: 2025-12-04T10:34:17.062Z
Learnt from: CR
Repo: dagu-org/dagu PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-04T10:34:17.062Z
Learning: `make test` (or `make test-coverage`) executes the Go suite via `gotestsum`; append `TEST_TARGET=./internal/...` to focus packages

Applied to files:

  • internal/cmd/cleanup_test.go
📚 Learning: 2025-12-04T10:34:17.062Z
Learnt from: CR
Repo: dagu-org/dagu PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-04T10:34:17.062Z
Learning: Applies to **/*_test.go : Co-locate Go tests as `*_test.go`; favour table-driven cases and cover failure paths

Applied to files:

  • internal/cmd/cleanup_test.go
🧬 Code graph analysis (8)
internal/persistence/filedagrun/dataroot.go (3)
internal/core/execution/context.go (1)
  • Context (16-27)
internal/common/logger/tag/tag.go (2)
  • Error (20-22)
  • Dir (77-79)
internal/common/logger/context.go (1)
  • Error (50-52)
internal/cmd/cleanup.go (3)
internal/common/logger/tag/tag.go (1)
  • Args (408-410)
internal/common/logger/context.go (1)
  • Errorf (75-77)
internal/core/execution/dagrun.go (3)
  • RemoveOldDAGRunsOption (113-113)
  • WithDryRun (116-120)
  • DAGRunStore (23-48)
internal/runtime/agent/dbclient_test.go (4)
internal/core/execution/context.go (1)
  • Context (16-27)
internal/core/execution/dagrun.go (1)
  • RemoveOldDAGRunsOption (113-113)
api/v2/api.gen.go (1)
  • Error (465-474)
api/v1/api.gen.go (1)
  • Error (343-352)
internal/common/telemetry/collector_test.go (1)
internal/core/execution/dagrun.go (1)
  • RemoveOldDAGRunsOption (113-113)
internal/service/scheduler/zombie_detector_test.go (1)
internal/core/execution/dagrun.go (1)
  • RemoveOldDAGRunsOption (113-113)
internal/core/execution/dagrun_test.go (1)
internal/core/execution/dagrun.go (1)
  • RemoveOldDAGRunsOption (113-113)
cmd/main.go (1)
internal/cmd/cleanup.go (1)
  • Cleanup (15-38)
internal/persistence/filedagrun/store.go (1)
internal/core/execution/dagrun.go (2)
  • RemoveOldDAGRunsOption (113-113)
  • RemoveOldDAGRunsOptions (107-110)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Test on ubuntu-latest
🔇 Additional comments (25)
cmd/main.go (1)

48-48: LGTM!

The Cleanup command registration follows the established pattern for other commands in this file.

internal/cmd/flags.go (1)

269-292: LGTM!

The new flag declarations follow the established patterns in this file. The flags are appropriately configured:

  • retentionDaysFlag as string with "0" default for later integer conversion
  • dryRunFlag and yesFlag correctly marked as boolean flags
  • yesFlag includes the conventional -y shorthand for skip-confirmation
internal/runtime/agent/agent.go (1)

976-980: LGTM!

The call correctly adapts to the updated RemoveOldDAGRuns API signature. Discarding the returned run IDs with _ is appropriate here since this is background retention cleanup during DAG run setup, where only error handling matters.

internal/runtime/agent/dbclient_test.go (1)

300-306: LGTM!

The mock correctly implements the updated RemoveOldDAGRuns interface with the new return type and variadic options parameter. The nil-check pattern for the first return value is consistent with other mock methods in this file.

internal/cmd/cleanup.go (3)

14-44: LGTM!

The Cleanup command definition is well-structured with clear usage documentation, comprehensive examples covering common use cases, and proper argument validation.


46-62: LGTM!

Input validation is thorough with clear error messages. The explicit check for negative values (line 60-62) provides a better user experience than a silent no-op.


119-131: LGTM!

The cleanup execution properly distinguishes between dry-run and actual deletion modes, with appropriate user feedback that respects the quiet flag.

internal/persistence/filedagrun/dataroot_test.go (2)

185-187: LGTM!

Test correctly updated to use new RemoveOld signature and verify the returned removed IDs count.


232-234: LGTM!

Test correctly verifies that only one run (the old one) is removed while the recent one is retained.

internal/persistence/filedagrun/dataroot.go (1)

279-333: LGTM!

The RemoveOld implementation correctly handles the new dry-run capability:

  • Collects eligible run IDs before potential deletion
  • Skips active runs in both modes
  • In dry-run mode, returns what would be deleted without modifying the filesystem
  • Empty directory cleanup is appropriately skipped during dry-run

The logic flow is clean and the behavior matches the documented contract.

internal/persistence/filedagrun/store_test.go (1)

126-128: LGTM! Test correctly updated for new API signature.

The test now properly captures the returned removedIDs and asserts that exactly 2 non-active runs are removed (Failed and Succeeded), while the Running run is preserved.

internal/common/telemetry/collector_test.go (1)

154-160: LGTM! Mock correctly updated for new interface signature.

The mock properly handles the new variadic options parameter and the ([]string, error) return type, with appropriate nil-checking consistent with other mock methods in this file.

internal/service/scheduler/zombie_detector_test.go (1)

399-405: LGTM! Mock implementation aligned with updated API.

The mock correctly handles the new variadic opts parameter and ([]string, error) return type, consistent with other test files.

internal/core/execution/dagrun_test.go (3)

70-76: LGTM! Mock correctly implements new interface signature.

The mock properly handles the variadic options parameter and the new ([]string, error) return type with appropriate nil-checking.


248-252: LGTM! Test correctly validates new return type.

The test properly verifies that RemoveOldDAGRuns returns the expected list of removed IDs.


385-408: LGTM! Good edge case coverage.

The tests cover important edge cases: negative retention days (no deletions), zero retention (delete all eligible), and positive retention days. Using []string(nil) for the negative case correctly tests the nil-return scenario.

internal/cmd/cleanup_test.go (6)

17-44: LGTM! Well-structured test for basic cleanup functionality.

The test properly verifies that cleanup with default retention (0) removes all completed history. Good use of the test helpers and assertion methods.


46-72: LGTM! Correctly tests retention-days preservation.

The test verifies that recent runs (less than 30 days old) are preserved when using --retention-days.


74-100: LGTM! Dry-run test correctly verifies no deletion occurs.

Good coverage of the --dry-run flag behavior.


140-170: LGTM! Good validation error testing.

Tests properly verify that invalid inputs (negative and non-numeric retention days) are rejected with appropriate error messages.


172-191: LGTM! Edge case tests are well covered.

Tests correctly verify argument validation and graceful handling of non-existent DAGs.


193-261: LGTM! Comprehensive direct store test.

This test provides good low-level coverage of the RemoveOldDAGRuns behavior, verifying that old runs are removed while recent ones are preserved. The test correctly validates both the returned removedIDs and the remaining store state.

internal/persistence/filedagrun/store.go (1)

440-456: LGTM! Clean implementation of functional options pattern.

The changes correctly implement the new API:

  • Functional options are properly accumulated into the options struct
  • The dry-run flag is correctly propagated to the underlying root.RemoveOld call
  • Return value semantics are consistent with the updated interface
internal/core/execution/dagrun.go (2)

36-41: Interface update looks good.

The updated RemoveOldDAGRuns signature correctly returns the list of removed dag-run IDs and accepts variadic options for extensibility. The documentation clearly explains the dry-run behavior.


106-120: Well-structured functional options implementation.

The functional options pattern is correctly implemented:

  • RemoveOldDAGRunsOptions struct is properly documented
  • RemoveOldDAGRunsOption type follows Go conventions
  • WithDryRun() provides a clean API for enabling dry-run mode

This pattern allows for future extensibility (e.g., adding filters, limits) without breaking the API.

@yottahmd yottahmd merged commit c726231 into main Dec 16, 2025
5 checks passed
@yottahmd yottahmd deleted the clean-command branch December 16, 2025 14:36
@codecov
Copy link

codecov bot commented Dec 16, 2025

Codecov Report

❌ Patch coverage is 66.29213% with 30 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.91%. Comparing base (5d6e50d) to head (088a583).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
internal/cmd/cleanup.go 70.83% 15 Missing and 6 partials ⚠️
internal/core/execution/dagrun.go 0.00% 3 Missing ⚠️
internal/persistence/filedagrun/store.go 50.00% 2 Missing and 1 partial ⚠️
internal/persistence/filedagrun/dataroot.go 66.66% 1 Missing and 1 partial ⚠️
internal/runtime/agent/agent.go 0.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1489      +/-   ##
==========================================
+ Coverage   59.90%   59.91%   +0.01%     
==========================================
  Files         195      196       +1     
  Lines       21918    22001      +83     
==========================================
+ Hits        13130    13182      +52     
- Misses       7382     7404      +22     
- Partials     1406     1415       +9     
Files with missing lines Coverage Δ
cmd/main.go 86.95% <100.00%> (+0.59%) ⬆️
internal/cmd/flags.go 100.00% <ø> (ø)
internal/runtime/agent/agent.go 49.90% <0.00%> (ø)
internal/persistence/filedagrun/dataroot.go 65.01% <66.66%> (-0.21%) ⬇️
internal/core/execution/dagrun.go 58.82% <0.00%> (-5.70%) ⬇️
internal/persistence/filedagrun/store.go 61.24% <50.00%> (-0.33%) ⬇️
internal/cmd/cleanup.go 70.83% <70.83%> (ø)

... and 1 file with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5d6e50d...088a583. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@ghansham
Copy link

We can include an action to purge dag runs

@yottahmd
Copy link
Collaborator Author

@ghansham Sorry, I'm not certain if I understand. What do you mean an 'action' ?

@ghansham
Copy link

ghansham commented Dec 16, 2025

I mean on the dag runs page and dagvrun details page, we can have a button to purge the run that invokes cleanup at the backend.

In the cleanup command, we can include optional keywords to remove latest run or a specific run by mentioning runid.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants