A simple command to run workflows (DAGs) defined in YAML format
dagu is a single command that generates and executes a DAG (Directed Acyclic Graph) from a simple YAML definition. dagu also comes with a convenient web UI and REST API interface. It aims to be one of the easiest options for managing DAGs executed by cron.
Currently, my environment has many problems. Hundreds of complex cron jobs are registered on huge servers, and it is impossible to keep track of the dependencies between them. If one job fails, I don't know which jobs to re-run. I also have to SSH into the server to view the logs and manually run the shell scripts one by one.
So I needed a tool that could explicitly visualize and manage the dependencies of a pipeline.
How nice it would be to be able to visually see the job dependencies, execution status, and logs of each job in a web browser, and to be able to rerun or stop a series of jobs with just a mouse click!
I considered many potential tools such as Airflow, Rundeck, Luigi, DigDag, JobScheduler, etc.
But unfortunately, they were not suitable for my existing environment, because they required a DBMS (Database Management System) installation, had relatively high learning curves, and added operational overhead. We have only a small group of engineers in our office and use a less common DBMS.
Finally, I decided to build my own tool that requires no DBMS server, no daemon process, and no additional operational burden, and that is easy to use.
Download the latest binary from the Releases page and place it on your system.
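For example, on Linux or macOS the installation might look like this (a minimal sketch; use the release asset that matches your platform):

```sh
# make the downloaded binary executable and move it somewhere on your $PATH
chmod +x ./dagu
sudo mv ./dagu /usr/local/bin/

# verify the installation by printing the usage
dagu --help
```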
- `dagu start [--params=<params>] <DAG file>` - run a DAG
- `dagu status <DAG file>` - display the current status of the DAG
- `dagu retry --req=<request-id> <DAG file>` - retry the failed/canceled DAG
- `dagu stop <DAG file>` - cancel a DAG
- `dagu dry [--params=<params>] <DAG file>` - dry-run a DAG
- `dagu server` - start a web server for the web UI
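For example, a typical session might look like this (`my_dag.yaml` is an illustrative file name):

```sh
# run a DAG, passing two positional parameters ($1 and $2)
dagu start --params="foo bar" my_dag.yaml

# check the current status of that DAG
dagu status my_dag.yaml

# launch the web UI (defaults to 127.0.0.1:8080)
dagu server
```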
- Simple command interface (See Usage)
- Simple configuration YAML format (See Simple example)
- Web UI to visualize and manage DAGs and watch logs
- Parameterization
- Conditions
- Automatic retry
- Cancellation
- Retry
- Parallelism limits
- Environment variables
- Repeat
- Basic Authentication
- E-mail notifications
- REST API interface
- onExit / onSuccess / onFailure / onCancel handlers
- Automatic history cleaning
- ETL Pipeline
- Batches
- Machine Learning
- Data Processing
- Automation
- DAGs: Overview of all DAGs in your environment.
- Detail: Current status of the DAG.
- Timeline: Timeline of each step in the pipeline.
- History: Execution history of the pipeline.
- `DAGU__DATA` - path to the directory for internal use by dagu (default: `~/.dagu/data`)
- `DAGU__LOGS` - path to the directory for logging (default: `~/.dagu/logs`)
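For example, to relocate both directories before launching dagu (the paths below are illustrative):

```sh
# point dagu's internal data and log directories at custom locations
export DAGU__DATA=/var/lib/dagu/data
export DAGU__LOGS=/var/log/dagu
dagu server
```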
Please create `~/.dagu/admin.yaml`.

```yaml
host: <hostname for web UI address>                    # default value is 127.0.0.1
port: <port number for web UI address>                 # default value is 8080
dags: <the location of DAG configuration files>        # default value is the current working directory
command: <absolute path to the dagu binary>            # [optional] required if the dagu command is not in $PATH
isBasicAuth: <true|false>                              # [optional] basic auth config
basicAuthUsername: <username for basic auth of web UI> # [optional] basic auth config
basicAuthPassword: <password for basic auth of web UI> # [optional] basic auth config
```
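For example, a filled-in `~/.dagu/admin.yaml` might look like this (all values are illustrative):

```yaml
host: 0.0.0.0            # listen on all interfaces instead of the default 127.0.0.1
port: 8080
dags: /home/user/dags    # hypothetical path; point this at your DAG definition files
isBasicAuth: true        # protect the web UI with basic auth
basicAuthUsername: admin # choose your own credentials
basicAuthPassword: changeme
```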
Please create `~/.dagu/config.yaml`. All settings can be overridden by individual DAG configurations. Creating a global configuration is a convenient way to organize common settings.
```yaml
logDir: <path-to-write-log>        # log directory to write standard output
histRetentionDays: 3               # history retention days
smtp:                              # [optional] mail server configuration to send notifications
  host: <smtp server host>
  port: <smtp server port>
errorMail:                         # [optional] mail configuration for error-level notifications
  from: <from address>
  to: <to address>
  prefix: <prefix of mail subject>
infoMail:                          # [optional] mail configuration for info-level notifications
  from: <from address>
  to: <to address>
  prefix: <prefix of mail subject>
```

A minimal DAG configuration:

```yaml
name: minimal configuration   # DAG name
steps:                        # steps inside the DAG
  - name: step 1              # step name (should be unique within the file)
    description: step 1       # [optional] description of the step
    command: python main_1.py # command and arguments
    dir: ${HOME}/dags/        # [optional] working directory
  - name: step 2
    description: step 2
    command: python main_2.py
    dir: ${HOME}/dags/
    depends:
      - step 1                # [optional] dependent steps
```

A DAG configuration with all available options:

```yaml
name: all configuration         # DAG name
description: run a DAG          # DAG description
env:                            # environment variables
  LOG_DIR: ${HOME}/logs
  PATH: /usr/local/bin:${PATH}
logDir: ${LOG_DIR}              # log directory to write standard output
histRetentionDays: 3            # execution history retention days (not for log files)
delaySec: 1                     # interval seconds between steps
maxActiveRuns: 1                # max number of steps to run in parallel
params: param1 param2           # parameters can be referred to by $1, $2, and so on
preconditions:                  # preconditions for whether the DAG is allowed to run
  - condition: "`printf 1`"     # command or variables to evaluate
    expected: "1"               # expected value for the DAG to run
mailOn:
  failure: true                 # send a mail when the DAG failed
  success: true                 # send a mail when the DAG finished
handlerOn:                      # handlers on Success, Failure, Cancel, and Exit
  success:                      # executed when the DAG succeeds
    command: "echo succeed"
  failure:                      # executed when the DAG fails
    command: "echo failed"
  cancel:                       # executed when the DAG is canceled
    command: "echo canceled"
  exit:                         # executed when the DAG exits
    command: "echo finished"
steps:
  - name: step 1                # step name
    description: step 1         # step description
    dir: ${HOME}/logs           # working directory
    command: python main.py $1  # command and parameters
    mailOn:
      failure: true             # send a mail when the step failed
      success: true             # send a mail when the step finished
    continueOn:
      failed: true              # continue to the next step even if this step failed
      skipped: true             # continue to the next step even if the preconditions were not met
    retryPolicy:                # retry policy for the step
      limit: 2                  # retry up to 2 times when the step failed
    preconditions:              # preconditions for whether the step is allowed to run
      - condition: "`printf 1`" # command or variables to evaluate
        expected: "1"           # expected value for the step to run
```

The global config file `~/.dagu/config.yaml` is useful for gathering common settings, such as the log directory.
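Since parameters can be overridden at run time (see the usage above), you could exercise this example with a dry run like so (the file name is illustrative):

```sh
# dry-run the DAG, overriding the default parameters;
# no step commands are actually executed
dagu dry --params="param1 param2" all_configuration.yaml
```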
To check all examples, visit this page.
- Sample 1

```yaml
name: example DAG
steps:
  - name: "1"
    command: echo hello world
  - name: "2"
    command: sleep 10
    depends:
      - "1"
  - name: "3"
    command: echo done!
    depends:
      - "2"
```

- Sample 2
```yaml
name: example DAG
env:
  LOG_DIR: ${HOME}/logs
logDir: ${LOG_DIR}
params: foo bar
steps:
  - name: "check precondition"
    command: echo start
    preconditions:
      - condition: "`echo $1`"
        expected: foo
  - name: "print foo"
    command: echo $1
    depends:
      - "check precondition"
  - name: "print bar"
    command: echo $2
    depends:
      - "print foo"
  - name: "failure and continue"
    command: "false"
    continueOn:
      failure: true
    depends:
      - "print bar"
  - name: "print done"
    command: echo done!
    depends:
      - "failure and continue"
handlerOn:
  exit:
    command: echo finished!
  success:
    command: echo success!
  failure:
    command: echo failed!
  cancel:
    command: echo canceled!
```

- Complex example
```yaml
name: complex DAG
steps:
  - name: "Initialize"
    command: "sleep 2"
  - name: "Copy TAB_1"
    description: "Extract data from TAB_1 to TAB_2"
    command: "sleep 2"
    depends:
      - "Initialize"
  - name: "Update TAB_2"
    description: "Update TAB_2"
    command: "sleep 2"
    depends:
      - Copy TAB_1
  - name: Validate TAB_2
    command: "sleep 2"
    depends:
      - "Update TAB_2"
  - name: "Load TAB_3"
    description: "Read data from files"
    command: "sleep 2"
    depends:
      - Initialize
  - name: "Update TAB_3"
    command: "sleep 2"
    depends:
      - "Load TAB_3"
  - name: Merge
    command: "sleep 2"
    depends:
      - Update TAB_3
      - Validate TAB_2
      - Validate File
  - name: "Check File"
    command: "sleep 2"
  - name: "Copy File"
    command: "sleep 2"
    depends:
      - Check File
  - name: "Validate File"
    command: "sleep 2"
    depends:
      - Copy File
  - name: Calc Result
    command: "sleep 2"
    depends:
      - Merge
  - name: "Report"
    command: "sleep 2"
    depends:
      - Calc Result
  - name: Reconcile
    command: "sleep 2"
    depends:
      - Calc Result
  - name: "Cleaning"
    command: "sleep 2"
    depends:
      - Reconcile
```

Feel free to contribute in any way you want. Share ideas, submit issues, create pull requests. You can start by improving this README.md or suggesting new features. Thank you!
This project is licensed under the GNU GPLv3 - see the LICENSE.md file for details.