-
-
Notifications
You must be signed in to change notification settings - Fork 216
feat(cmd): add cleanup command #1489
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughIntroduces a new Cleanup subcommand that removes old DAG run history. The feature adds CLI command implementation, three new flags, updates the RemoveOldDAGRuns API to accept options and return removed run IDs, and provides dry-run capability via options. Changes
Sequence DiagramsequenceDiagram
participant User
participant CLI as Cleanup Command
participant Store as DAGRunStore
participant DataRoot as DataRoot
participant FS as File System
User->>CLI: cleanup --retention-days 30 my-dag
activate CLI
CLI->>CLI: Validate inputs
CLI->>CLI: Build RemoveOldDAGRunsOptions
alt Dry Run
CLI->>Store: RemoveOldDAGRuns(ctx, dag, 30, WithDryRun())
activate Store
Store->>Store: Parse options
Store->>DataRoot: RemoveOld(ctx, 30, dryRun=true)
activate DataRoot
DataRoot->>FS: List old runs (no deletion)
FS-->>DataRoot: Run IDs
DataRoot-->>Store: return removedIDs, nil
deactivate DataRoot
Store-->>CLI: return removedIDs, nil
deactivate Store
CLI->>User: Display preview of runs to be removed
else Delete with Confirmation
CLI->>User: Prompt confirmation (unless --yes)
User-->>CLI: Confirm
CLI->>Store: RemoveOldDAGRuns(ctx, dag, 30)
activate Store
Store->>DataRoot: RemoveOld(ctx, 30, dryRun=false)
activate DataRoot
DataRoot->>FS: Delete old run files
FS-->>DataRoot: Removed run IDs
DataRoot-->>Store: return removedIDs, nil
deactivate DataRoot
Store-->>CLI: return removedIDs, nil
deactivate Store
CLI->>User: Display deleted run count
end
deactivate CLI
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (4)
internal/cmd/cleanup.go (1)
100-117: Consider adding TTY detection for non-interactive environments.When stdin is not a TTY (e.g., in CI pipelines without the
-yflag),ReadString('\n')may block indefinitely or fail on EOF. While the--yesflag provides a workaround, users might not realize they need it.Consider detecting non-interactive stdin and either auto-declining or requiring
-y:+ // Check if stdin is a terminal for interactive confirmation + stdinInfo, _ := os.Stdin.Stat() + isInteractive := (stdinInfo.Mode() & os.ModeCharDevice) != 0 + // Confirmation prompt (unless --yes or --quiet) - if !skipConfirm && !ctx.Quiet { + if !skipConfirm && !ctx.Quiet && isInteractive { fmt.Printf("This will delete %s.\n", actionDesc) + } else if !skipConfirm && !ctx.Quiet && !isInteractive { + return fmt.Errorf("confirmation required: use --yes flag in non-interactive mode") + }internal/persistence/filedagrun/dataroot_test.go (1)
279-281: Consider adding a test case for dry-run mode.All tests pass
dryRun: false. Adding a test that verifies dry-run behavior (returns IDs without actual deletion) would improve coverage of the new functionality.Example test case:
t.Run("DryRunDoesNotDelete", func(t *testing.T) { root := setupTestDataRoot(t) ts := time.Date(2021, 1, 1, 0, 0, 0, 0, time.UTC) dagRun := root.CreateTestDAGRun(t, "dag-run-1", execution.NewUTC(ts)) // ... create attempt with status ... // Dry run should return IDs but not delete removedIDs, err := root.RemoveOld(root.Context, 0, true) require.NoError(t, err) assert.Len(t, removedIDs, 1) assert.True(t, fileutil.FileExists(dagRun.baseDir), "dag-run should still exist after dry-run") })internal/cmd/cleanup_test.go (2)
102-138: Consider increasing the sleep duration for CI stability.The 200ms sleep may be insufficient on slower CI runners. While this pattern is acceptable for integration tests, you might want to use a polling loop with timeout instead of a fixed sleep to avoid flaky tests.
- // Wait for DAG to start running - time.Sleep(time.Millisecond * 200) - dag.AssertLatestStatus(t, core.Running) + // Wait for DAG to start running (poll with timeout) + require.Eventually(t, func() bool { + status, _ := dag.LatestStatus() + return status == core.Running + }, 5*time.Second, 50*time.Millisecond, "DAG should start running")
263-280: Remove unused parameter from helper function.The third parameter
_ stringis unused and can be removed for cleaner code.-func setOldModTime(t *testing.T, baseDir, dagName, _ string, modTime time.Time) { +func setOldModTime(t *testing.T, baseDir, dagName string, modTime time.Time) {Also update the call site at line 242:
- setOldModTime(t, th.Config.Paths.DAGRunsDir, dagName, "", oldTime) + setOldModTime(t, th.Config.Paths.DAGRunsDir, dagName, oldTime)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (14)
cmd/main.go(1 hunks)internal/cmd/cleanup.go(1 hunks)internal/cmd/cleanup_test.go(1 hunks)internal/cmd/flags.go(1 hunks)internal/common/telemetry/collector_test.go(1 hunks)internal/core/execution/dagrun.go(2 hunks)internal/core/execution/dagrun_test.go(3 hunks)internal/persistence/filedagrun/dataroot.go(2 hunks)internal/persistence/filedagrun/dataroot_test.go(3 hunks)internal/persistence/filedagrun/store.go(1 hunks)internal/persistence/filedagrun/store_test.go(1 hunks)internal/runtime/agent/agent.go(1 hunks)internal/runtime/agent/dbclient_test.go(1 hunks)internal/service/scheduler/zombie_detector_test.go(1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.go
📄 CodeRabbit inference engine (AGENTS.md)
**/*.go: Backend entrypoint incmd/orchestrates the scheduler and CLI; runtime, persistence, and service layers sit underinternal/*(for exampleinternal/runtime,internal/persistence)
Keep Go filesgofmt/goimportsclean; use tabs, PascalCase for exported symbols (SchedulerClient), lowerCamelCase for locals, andErr...names for package-level errors
Repository linting relies ongolangci-lint; prefer idiomatic Go patterns, minimal global state, and structured logging helpers ininternal/common
Files:
internal/cmd/flags.gointernal/runtime/agent/agent.gointernal/persistence/filedagrun/dataroot.gointernal/cmd/cleanup.gointernal/runtime/agent/dbclient_test.gointernal/common/telemetry/collector_test.gointernal/persistence/filedagrun/store_test.gointernal/service/scheduler/zombie_detector_test.gointernal/persistence/filedagrun/dataroot_test.gointernal/core/execution/dagrun_test.gocmd/main.gointernal/cmd/cleanup_test.gointernal/persistence/filedagrun/store.gointernal/core/execution/dagrun.go
**/*_test.go
📄 CodeRabbit inference engine (AGENTS.md)
**/*_test.go: Co-locate Go tests as*_test.go; favour table-driven cases and cover failure paths
Usestretchr/testify/requireand shared fixtures frominternal/testinstead of duplicating mocks
Files:
internal/runtime/agent/dbclient_test.gointernal/common/telemetry/collector_test.gointernal/persistence/filedagrun/store_test.gointernal/service/scheduler/zombie_detector_test.gointernal/persistence/filedagrun/dataroot_test.gointernal/core/execution/dagrun_test.gointernal/cmd/cleanup_test.go
🧠 Learnings (3)
📚 Learning: 2025-12-04T10:34:17.062Z
Learnt from: CR
Repo: dagu-org/dagu PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-04T10:34:17.062Z
Learning: Applies to **/*_test.go : Use `stretchr/testify/require` and shared fixtures from `internal/test` instead of duplicating mocks
Applied to files:
internal/common/telemetry/collector_test.go
📚 Learning: 2025-12-04T10:34:17.062Z
Learnt from: CR
Repo: dagu-org/dagu PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-04T10:34:17.062Z
Learning: `make test` (or `make test-coverage`) executes the Go suite via `gotestsum`; append `TEST_TARGET=./internal/...` to focus packages
Applied to files:
internal/cmd/cleanup_test.go
📚 Learning: 2025-12-04T10:34:17.062Z
Learnt from: CR
Repo: dagu-org/dagu PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-04T10:34:17.062Z
Learning: Applies to **/*_test.go : Co-locate Go tests as `*_test.go`; favour table-driven cases and cover failure paths
Applied to files:
internal/cmd/cleanup_test.go
🧬 Code graph analysis (8)
internal/persistence/filedagrun/dataroot.go (3)
internal/core/execution/context.go (1)
Context(16-27)internal/common/logger/tag/tag.go (2)
Error(20-22)Dir(77-79)internal/common/logger/context.go (1)
Error(50-52)
internal/cmd/cleanup.go (3)
internal/common/logger/tag/tag.go (1)
Args(408-410)internal/common/logger/context.go (1)
Errorf(75-77)internal/core/execution/dagrun.go (3)
RemoveOldDAGRunsOption(113-113)WithDryRun(116-120)DAGRunStore(23-48)
internal/runtime/agent/dbclient_test.go (4)
internal/core/execution/context.go (1)
Context(16-27)internal/core/execution/dagrun.go (1)
RemoveOldDAGRunsOption(113-113)api/v2/api.gen.go (1)
Error(465-474)api/v1/api.gen.go (1)
Error(343-352)
internal/common/telemetry/collector_test.go (1)
internal/core/execution/dagrun.go (1)
RemoveOldDAGRunsOption(113-113)
internal/service/scheduler/zombie_detector_test.go (1)
internal/core/execution/dagrun.go (1)
RemoveOldDAGRunsOption(113-113)
internal/core/execution/dagrun_test.go (1)
internal/core/execution/dagrun.go (1)
RemoveOldDAGRunsOption(113-113)
cmd/main.go (1)
internal/cmd/cleanup.go (1)
Cleanup(15-38)
internal/persistence/filedagrun/store.go (1)
internal/core/execution/dagrun.go (2)
RemoveOldDAGRunsOption(113-113)RemoveOldDAGRunsOptions(107-110)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Test on ubuntu-latest
🔇 Additional comments (25)
cmd/main.go (1)
48-48: LGTM!The
Cleanupcommand registration follows the established pattern for other commands in this file.internal/cmd/flags.go (1)
269-292: LGTM!The new flag declarations follow the established patterns in this file. The flags are appropriately configured:
retentionDaysFlagas string with "0" default for later integer conversiondryRunFlagandyesFlagcorrectly marked as boolean flagsyesFlagincludes the conventional-yshorthand for skip-confirmationinternal/runtime/agent/agent.go (1)
976-980: LGTM!The call correctly adapts to the updated
RemoveOldDAGRunsAPI signature. Discarding the returned run IDs with_is appropriate here since this is background retention cleanup during DAG run setup, where only error handling matters.internal/runtime/agent/dbclient_test.go (1)
300-306: LGTM!The mock correctly implements the updated
RemoveOldDAGRunsinterface with the new return type and variadic options parameter. The nil-check pattern for the first return value is consistent with other mock methods in this file.internal/cmd/cleanup.go (3)
14-44: LGTM!The
Cleanupcommand definition is well-structured with clear usage documentation, comprehensive examples covering common use cases, and proper argument validation.
46-62: LGTM!Input validation is thorough with clear error messages. The explicit check for negative values (line 60-62) provides a better user experience than a silent no-op.
119-131: LGTM!The cleanup execution properly distinguishes between dry-run and actual deletion modes, with appropriate user feedback that respects the quiet flag.
internal/persistence/filedagrun/dataroot_test.go (2)
185-187: LGTM!Test correctly updated to use new
RemoveOldsignature and verify the returned removed IDs count.
232-234: LGTM!Test correctly verifies that only one run (the old one) is removed while the recent one is retained.
internal/persistence/filedagrun/dataroot.go (1)
279-333: LGTM!The
RemoveOldimplementation correctly handles the new dry-run capability:
- Collects eligible run IDs before potential deletion
- Skips active runs in both modes
- In dry-run mode, returns what would be deleted without modifying the filesystem
- Empty directory cleanup is appropriately skipped during dry-run
The logic flow is clean and the behavior matches the documented contract.
internal/persistence/filedagrun/store_test.go (1)
126-128: LGTM! Test correctly updated for new API signature.The test now properly captures the returned
removedIDsand asserts that exactly 2 non-active runs are removed (Failed and Succeeded), while the Running run is preserved.internal/common/telemetry/collector_test.go (1)
154-160: LGTM! Mock correctly updated for new interface signature.The mock properly handles the new variadic options parameter and the
([]string, error)return type, with appropriate nil-checking consistent with other mock methods in this file.internal/service/scheduler/zombie_detector_test.go (1)
399-405: LGTM! Mock implementation aligned with updated API.The mock correctly handles the new variadic
optsparameter and([]string, error)return type, consistent with other test files.internal/core/execution/dagrun_test.go (3)
70-76: LGTM! Mock correctly implements new interface signature.The mock properly handles the variadic options parameter and the new
([]string, error)return type with appropriate nil-checking.
248-252: LGTM! Test correctly validates new return type.The test properly verifies that
RemoveOldDAGRunsreturns the expected list of removed IDs.
385-408: LGTM! Good edge case coverage.The tests cover important edge cases: negative retention days (no deletions), zero retention (delete all eligible), and positive retention days. Using
[]string(nil)for the negative case correctly tests the nil-return scenario.internal/cmd/cleanup_test.go (6)
17-44: LGTM! Well-structured test for basic cleanup functionality.The test properly verifies that cleanup with default retention (0) removes all completed history. Good use of the test helpers and assertion methods.
46-72: LGTM! Correctly tests retention-days preservation.The test verifies that recent runs (less than 30 days old) are preserved when using
--retention-days.
74-100: LGTM! Dry-run test correctly verifies no deletion occurs.Good coverage of the
--dry-runflag behavior.
140-170: LGTM! Good validation error testing.Tests properly verify that invalid inputs (negative and non-numeric retention days) are rejected with appropriate error messages.
172-191: LGTM! Edge case tests are well covered.Tests correctly verify argument validation and graceful handling of non-existent DAGs.
193-261: LGTM! Comprehensive direct store test.This test provides good low-level coverage of the
RemoveOldDAGRunsbehavior, verifying that old runs are removed while recent ones are preserved. The test correctly validates both the returnedremovedIDsand the remaining store state.internal/persistence/filedagrun/store.go (1)
440-456: LGTM! Clean implementation of functional options pattern.The changes correctly implement the new API:
- Functional options are properly accumulated into the options struct
- The dry-run flag is correctly propagated to the underlying
root.RemoveOldcall- Return value semantics are consistent with the updated interface
internal/core/execution/dagrun.go (2)
36-41: Interface update looks good.The updated
RemoveOldDAGRunssignature correctly returns the list of removed dag-run IDs and accepts variadic options for extensibility. The documentation clearly explains the dry-run behavior.
106-120: Well-structured functional options implementation.The functional options pattern is correctly implemented:
RemoveOldDAGRunsOptionsstruct is properly documentedRemoveOldDAGRunsOptiontype follows Go conventionsWithDryRun()provides a clean API for enabling dry-run modeThis pattern allows for future extensibility (e.g., adding filters, limits) without breaking the API.
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #1489 +/- ##
==========================================
+ Coverage 59.90% 59.91% +0.01%
==========================================
Files 195 196 +1
Lines 21918 22001 +83
==========================================
+ Hits 13130 13182 +52
- Misses 7382 7404 +22
- Partials 1406 1415 +9
... and 1 file with indirect coverage changes Continue to review full report in Codecov by Sentry.
🚀 New features to boost your workflow:
|
|
We can include an action to purge dag runs |
|
@ghansham Sorry, I'm not certain if I understand. What do you mean an 'action' ? |
|
I mean on the dag runs page and dagvrun details page, we can have a button to purge the run that invokes cleanup at the backend. In the cleanup command, we can include optional keywords to remove latest run or a specific run by mentioning runid. |
Summary by CodeRabbit
Release Notes
cleanupcommand to remove old DAG run history with configurable retention periods--dry-runflag to preview which runs would be removed without deleting--yesflag to skip confirmation prompts✏️ Tip: You can customize this high-level summary in your review settings.