
@thefishhat (Contributor) commented Jun 19, 2025

What does this PR do?

This PR implements step-level retry for DAG runs. Users can re-execute any individual step together with its downstream steps, instead of only retrying the whole DAG or all of its failed/canceled nodes.
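As a rough illustration of which steps a step retry covers, the Go sketch below collects a step plus every step that transitively depends on it. The `downstreamOf` function and the map-based dependency model are hypothetical and are not taken from Dagu's codebase.

```go
package main

import (
	"fmt"
	"sort"
)

// downstreamOf returns the named step plus every step that transitively
// depends on it. deps maps each step to its upstream steps.
// Hypothetical data model for illustration; Dagu's own types differ.
func downstreamOf(deps map[string][]string, start string) []string {
	// Invert the dependency map so we can walk from a step to its dependents.
	children := make(map[string][]string)
	for step, ups := range deps {
		for _, up := range ups {
			children[up] = append(children[up], step)
		}
	}

	// Breadth-first walk over dependents, visiting each step once.
	seen := map[string]bool{start: true}
	queue := []string{start}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for _, child := range children[cur] {
			if !seen[child] {
				seen[child] = true
				queue = append(queue, child)
			}
		}
	}

	out := make([]string, 0, len(seen))
	for step := range seen {
		out = append(out, step)
	}
	sort.Strings(out) // deterministic order for display
	return out
}

func main() {
	// extract -> transform -> load, plus an independent "notify" step.
	deps := map[string][]string{
		"extract":   nil,
		"transform": {"extract"},
		"load":      {"transform"},
		"notify":    nil,
	}
	fmt.Println(downstreamOf(deps, "transform")) // [load transform]
}
```

In this example, retrying `transform` would re-run only `transform` and `load`; `extract` and `notify` keep their results from the previous run.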

Why is this needed?

Addresses issue #1015. Previously, users could only retry an entire DAG run or all of its failed/canceled steps; there was no way to re-execute a specific step and its downstream steps without re-running the whole DAG.

What changed?

  • CLI:
    • Added a --step flag to the dagu retry command to retry a single step together with its downstream steps.
  • Backend/Manager:
    • Added a RetryDAGStep method to the manager, which triggers a retry starting from a specific step.
    • Refactored the retry logic so that full and step-specific retries share code.
    • Updated the backend API (v1 & v2) to accept a step parameter on the retry endpoint and route each request to the full or step-specific retry path.
    • Implemented CreateStepRetryGraph in the scheduler, which resets only the specified step and its downstream steps for re-execution.
  • UI:
    • Added a “Retry from this step” (play icon) button to every step in the Status view (where the last DAG run is shown).
    • Clicking it opens a confirmation dialog before the retry is triggered.
    • The UI then calls the backend with the step name to perform the targeted retry.
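The shared retry path described above could look roughly like the following Go sketch. It assumes, hypothetically, that the two modes differ only in which nodes seed the reset (failed/canceled nodes for a full retry, one named step for a step retry) and that dependents of every seed are reset as well. `Node`, `Status`, and `prepareRetry` are illustrative names, not Dagu's actual types or its real `CreateStepRetryGraph`.

```go
package main

import "fmt"

// Status is a hypothetical step status; Dagu's real status types differ.
type Status string

const (
	NotStarted Status = "not_started"
	Succeeded  Status = "succeeded"
	Failed     Status = "failed"
	Canceled   Status = "canceled"
)

// Node is a hypothetical DAG node: a step name, its upstream deps, and
// the status recorded by the previous run.
type Node struct {
	Name   string
	Deps   []string
	Status Status
}

// prepareRetry resets nodes so a scheduler would re-run them. Both retry
// modes share one code path and differ only in the seed set:
//   step == "": seed with every failed or canceled node (full retry)
//   step != "": seed with that single step (step retry)
// Every node downstream of a seed is reset too; all other nodes keep
// their previous status, so their results are reused.
func prepareRetry(nodes []*Node, step string) error {
	byName := make(map[string]*Node, len(nodes))
	children := make(map[string][]string)
	for _, n := range nodes {
		byName[n.Name] = n
		for _, d := range n.Deps {
			children[d] = append(children[d], n.Name)
		}
	}

	var queue []string
	if step != "" {
		if _, ok := byName[step]; !ok {
			return fmt.Errorf("step %q not found", step)
		}
		queue = append(queue, step)
	} else {
		for _, n := range nodes {
			if n.Status == Failed || n.Status == Canceled {
				queue = append(queue, n.Name)
			}
		}
	}

	// Breadth-first expansion from the seeds to all dependents.
	reset := make(map[string]bool)
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		if reset[cur] {
			continue
		}
		reset[cur] = true
		queue = append(queue, children[cur]...)
	}
	for name := range reset {
		byName[name].Status = NotStarted
	}
	return nil
}

func main() {
	nodes := []*Node{
		{Name: "extract", Status: Succeeded},
		{Name: "transform", Deps: []string{"extract"}, Status: Succeeded},
		{Name: "load", Deps: []string{"transform"}, Status: Failed},
	}
	if err := prepareRetry(nodes, "transform"); err != nil {
		fmt.Println(err)
		return
	}
	for _, n := range nodes {
		fmt.Println(n.Name, n.Status)
	}
}
```

Keeping a single function with an optional step name mirrors the idea of routing both API variants through one code path; only the seed selection branches.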

How was this tested?

  • Manually retried arbitrary steps on previously run DAGs and confirmed that the retried steps' timestamps were updated.
  • Automated tests for the orchestration layers (Manager, Agent) and the Scheduler/Graph logic.


@thefishhat force-pushed the feat/add-dag-step-rerun branch 2 times, most recently from d3f78a6 to f259cc2, June 19, 2025 01:47
@thefishhat force-pushed the feat/add-dag-step-rerun branch from f259cc2 to 8db5423, June 19, 2025 01:51
@yottahmd (Collaborator) left a comment


Overall, looks fantastic! I just left nitpick comments.

@thefishhat force-pushed the feat/add-dag-step-rerun branch 3 times, most recently from aef89b7 to 1564e9e, June 19, 2025 12:18
@thefishhat force-pushed the feat/add-dag-step-rerun branch from 1564e9e to 7cc564c, June 19, 2025 12:19
@thefishhat (Contributor, Author)

> Overall, looks fantastic! I just left nitpick comments.

Thanks for catching it! 😁

@yottahmd yottahmd merged commit 4ee3c50 into dagu-org:main Jun 19, 2025
5 checks passed
codecov bot commented Jun 19, 2025

Codecov Report

Attention: Patch coverage is 78.12500% with 14 lines in your changes missing coverage. Please review.

Project coverage is 67.80%. Comparing base (379d679) to head (7cc564c).
Report is 5 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| internal/digraph/scheduler/graph.go | 75.00% | 6 Missing and 3 partials ⚠️ |
| internal/agent/agent.go | 76.92% | 2 Missing and 1 partial ⚠️ |
| internal/cmd/retry.go | 85.71% | 0 Missing and 1 partial ⚠️ |
| internal/cmd/start.go | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files


@@            Coverage Diff             @@
##             main    #1030      +/-   ##
==========================================
+ Coverage   67.74%   67.80%   +0.05%     
==========================================
  Files          93       93              
  Lines       13887    13946      +59     
==========================================
+ Hits         9408     9456      +48     
- Misses       3677     3685       +8     
- Partials      802      805       +3     
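These figures can be cross-checked by hand, assuming coverage is hits divided by tracked lines (partials count as uncovered) and that percentages are truncated, not rounded, to two decimals. Under that assumption the +0.05% delta is 0.0576 truncated. The patch total of 64 lines is inferred rather than reported: it is the only line count for which 14 uncovered lines give exactly 78.125%.

$$
\frac{9408}{13887} \approx 0.677468 \;\Rightarrow\; 67.74\%\ \text{(main)}, \qquad
\frac{9456}{13946} \approx 0.678044 \;\Rightarrow\; 67.80\%\ \text{(head)}
$$

$$
\frac{64 - 14}{64} = \frac{50}{64} = 0.78125 = 78.125\%\ \text{(patch)}
$$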
| Files with missing lines | Coverage Δ |
| --- | --- |
| internal/dagrun/manager.go | 39.42% <100.00%> (+1.25%) ⬆️ |
| internal/cmd/retry.go | 73.75% <85.71%> (+1.02%) ⬆️ |
| internal/cmd/start.go | 52.47% <0.00%> (ø) |
| internal/agent/agent.go | 54.78% <76.92%> (+0.55%) ⬆️ |
| internal/digraph/scheduler/graph.go | 74.19% <75.00%> (+0.16%) ⬆️ |

... and 2 files with indirect coverage changes



Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 379d679...7cc564c.

