A powerful Airflow plugin for clearing recently failed task instances with only a few clicks.
The Big Red Button plugin provides a convenient web interface for viewing and clearing recently failed task instances across your Airflow DAGs. Perfect for those moments when you need to quickly recover from cascading failures or retry multiple tasks at once.
Clear all recently failed and upstream-failed task instances across multiple DAGs with a single click. No more manually clicking through individual task instances.
Filter DAGs by tags to selectively clear failures for specific groups of workflows. Great for managing different environments or teams.
Choose from multiple time windows to clear failures:
- 1 hour - Recent failures only
- 12 hours - Half-day failures
- 1 day - Daily failures
- 7 days - Weekly failures
View failure counts grouped by DAG before clearing. Know exactly what you're about to clear with a confirmation page showing all affected tasks.
Safety first! Every clearing operation requires explicit confirmation, which prevents accidental mass clears.
All clearing operations are logged to Airflow's audit log with details about who cleared what and when.
- Apache Airflow 2.0+ (not currently tested for Airflow 3.0+)
- Python 3.8-3.11
- Copy the plugin to your Airflow plugins directory:
# Copy the entire big_red_button folder to your Airflow plugins directory
cp -r plugins/big_red_button $AIRFLOW_HOME/plugins/
- Restart your Airflow webserver:
# Restart the webserver to load the plugin
airflow webserver
- Access the plugin:
Navigate to your Airflow UI and look for:
- "Big Red Button" in the main menu (tag-filtered view)
- "Big Red Button Admin" in the main menu (admin view for all DAGs)
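If neither menu entry appears after the restart, one quick thing to rule out is a copy that landed in the wrong place. The sketch below is an illustration rather than part of the plugin: `plugin_installed` is a hypothetical helper, and it assumes the plugins folder is still `$AIRFLOW_HOME/plugins` (with `AIRFLOW_HOME` falling back to `~/airflow`) and has not been relocated in `airflow.cfg`.

```python
import os
from pathlib import Path
from typing import Optional


def plugin_installed(airflow_home: Optional[str] = None) -> bool:
    """Return True if big_red_button.py sits under <AIRFLOW_HOME>/plugins/big_red_button/."""
    # Assumption: the default plugins folder is in use (not overridden in airflow.cfg).
    home = airflow_home or os.environ.get("AIRFLOW_HOME", "~/airflow")
    plugin_file = (
        Path(home).expanduser() / "plugins" / "big_red_button" / "big_red_button.py"
    )
    return plugin_file.is_file()


if __name__ == "__main__":
    print("plugin found" if plugin_installed() else "plugin not found")
```

A result of "plugin not found" usually points at the `cp` destination; it says nothing about whether Airflow loaded the plugin successfully.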
- Navigate to "Big Red Button" in the Airflow UI
- Select tags to filter DAGs
- Choose a time window (1 hour, 12 hours, 1 day, or 7 days)
- Click "Clear Failed DAGs"
- Review the confirmation page showing all affected tasks
- Click "Confirm Clear" to execute the clearing operation
- Navigate to "Big Red Button Admin" in the Airflow UI
- Choose a time window (1 hour, 12 hours, 1 day, or 7 days)
- View failure counts for all DAGs
- Click "Clear All Failed DAGs" to proceed
- Review and confirm the clearing operation
From either view:
- Find the DAG with failures in the failure count table
- Click "Clear" next to the specific DAG
- Review the task-level details
- Confirm to clear only that DAG's failures
Perfect for teams managing multiple DAG groups. Filter by tags to see only the failures relevant to your team or environment.
Route: /bigredbuttonbaseview
Full administrative view showing failures across all DAGs without filtering. Ideal for platform administrators who need visibility into the entire Airflow instance.
Route: /bigredbuttonadminbaseview
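For bookmarks or runbooks it can help to build the full URLs from the two routes above. This is a small sketch, not plugin code: `view_url` and the `ROUTES` mapping are hypothetical names, and the base URL in the usage example is an assumption (use whatever address your webserver is actually served from).

```python
from urllib.parse import urljoin

# Routes as documented above; the dictionary keys are illustrative labels only.
ROUTES = {
    "tag_filtered": "/bigredbuttonbaseview",
    "admin": "/bigredbuttonadminbaseview",
}


def view_url(base_url: str, view: str) -> str:
    """Join the webserver base URL with one of the plugin routes."""
    # Normalise slashes so a base URL with a path prefix is preserved.
    return urljoin(base_url.rstrip("/") + "/", ROUTES[view].lstrip("/"))


if __name__ == "__main__":
    # Assumed base URL for a local webserver.
    print(view_url("http://localhost:8080", "tag_filtered"))
    print(view_url("http://localhost:8080", "admin"))
```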
The plugin uses the following default settings (defined in big_red_button.py):
# Time windows for clearing failures
clear_windows = {
    "1_hour": timedelta(hours=1),
    "12_hours": timedelta(hours=12),
    "1_day": timedelta(days=1),
    "7_days": timedelta(days=7),
}
# Number of tasks to clear per database query
PAGE_SIZE = 200
These can be modified by editing the source file if needed.
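To make the time windows concrete, here is a minimal sketch of how a window key could translate into a cutoff timestamp. The `clear_windows` mapping is copied from the settings above; `window_cutoff` is a hypothetical helper, and which timestamp column the plugin actually compares against the cutoff is not specified here.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Same defaults as in big_red_button.py.
clear_windows = {
    "1_hour": timedelta(hours=1),
    "12_hours": timedelta(hours=12),
    "1_day": timedelta(days=1),
    "7_days": timedelta(days=7),
}


def window_cutoff(window_key: str, now: Optional[datetime] = None) -> datetime:
    """Return the earliest timestamp that still falls inside the chosen window."""
    now = now or datetime.now(timezone.utc)
    return now - clear_windows[window_key]


if __name__ == "__main__":
    # Failures older than this would fall outside the "1_day" window.
    print(window_cutoff("1_day"))
```

Adding a new window under this model is just another dictionary entry, for example `"3_days": timedelta(days=3)`; whether the UI picks it up automatically depends on how the templates render the choices.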
- Two-step confirmation: Always shows what will be cleared before executing
- Audit logging: Every clearing operation is logged with user info
- Paginated clearing: Large batches are cleared in chunks to avoid database timeouts
- Read-only preview: Confirmation page doesn't execute any changes
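The paginated clearing idea can be illustrated with a small generator that slices a list of task instances into pages. This is a sketch of the general technique, not the plugin's code: `pages` is a hypothetical helper, and `PAGE_SIZE` simply mirrors the default shown in the configuration.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

PAGE_SIZE = 200  # mirrors the plugin default


def pages(items: Sequence[T], page_size: int = PAGE_SIZE) -> Iterator[List[T]]:
    """Yield consecutive slices of at most page_size items."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    for start in range(0, len(items), page_size):
        yield list(items[start:start + page_size])


if __name__ == "__main__":
    failed = [f"task_{i}" for i in range(450)]  # stand-in for failed task instances
    for batch in pages(failed):
        # Each batch would be cleared in its own database round trip.
        print(len(batch))
```

Keeping each batch small bounds how long any single database statement runs, which is the motivation for clearing in chunks.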
- Python 3.8-3.11 (Airflow 2.x releases before 2.9 do not support Python 3.12)
# Setup virtual environment and install dependencies
make setup
# Run tests
make test
# Format code
make format
# Run tests with verbose output
make test-verbose
# Run tests with coverage
make test-coverage
# Clean up
make clean
- Create a virtual environment:
python3 -m venv venv
- Activate the virtual environment:
source venv/bin/activate # On macOS/Linux
# or
venv\Scripts\activate # On Windows
- Install dependencies:
pip install -r requirements-dev.txt
With the virtual environment activated:
pytest tests/ -v
Or run all tests:
pytest
To run tests with coverage:
pytest tests/ --cov=plugins/big_red_button --cov-report=term-missing
bigredbutton/
├── plugins/
│   └── big_red_button/
│       ├── big_red_button.py        # Main plugin code
│       └── templates/               # Flask templates
│           ├── big_red_button.html
│           ├── big_red_button_admin.html
│           └── clear_failed.html
├── tests/
│   ├── __init__.py
│   ├── conftest.py                  # Pytest configuration
│   └── test_big_red_button.py       # Unit tests
├── requirements.txt                 # Runtime dependencies
├── requirements-dev.txt             # Development dependencies
├── Makefile                         # Development commands
└── README.md
The plugin integrates with Airflow's task clearing mechanism:
- Query: Finds all failed and upstream-failed task instances within the specified time window
- Filter: Optionally filters by DAG tags or specific DAG ID
- Group: Groups failures by DAG for easy visualization
- Clear: Uses Airflow's built-in clear_task_instances() function in batches
- Log: Records the operation to Airflow's audit log
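The query, filter, and group steps can be sketched with plain in-memory records. Everything here is a stand-in chosen for illustration: `TaskRecord`, `find_failures`, and `failure_counts` are hypothetical names, the real plugin queries Airflow's metadata database instead of a list, and the any-tag-matches rule is an assumption about how tag filtering behaves.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import FrozenSet, Iterable, List, Optional

# Airflow's state strings for the two kinds of failure the plugin targets.
FAILED_STATES = {"failed", "upstream_failed"}


@dataclass(frozen=True)
class TaskRecord:
    """Hypothetical stand-in for a task instance row."""
    dag_id: str
    task_id: str
    state: str
    timestamp: datetime
    dag_tags: FrozenSet[str] = frozenset()


def find_failures(
    records: Iterable[TaskRecord],
    cutoff: datetime,
    tags: Optional[FrozenSet[str]] = None,
    dag_id: Optional[str] = None,
) -> List[TaskRecord]:
    """Query + filter: failed records inside the window, narrowed by tags or DAG ID."""
    selected = []
    for rec in records:
        if rec.state not in FAILED_STATES or rec.timestamp < cutoff:
            continue
        # Assumption: a DAG matches when it carries at least one selected tag.
        if tags and not (rec.dag_tags & tags):
            continue
        if dag_id is not None and rec.dag_id != dag_id:
            continue
        selected.append(rec)
    return selected


def failure_counts(failures: Iterable[TaskRecord]) -> Counter:
    """Group: number of failed task instances per DAG."""
    return Counter(rec.dag_id for rec in failures)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    records = [
        TaskRecord("etl", "extract", "failed", now - timedelta(minutes=30), frozenset({"team-a"})),
        TaskRecord("etl", "load", "upstream_failed", now - timedelta(minutes=20), frozenset({"team-a"})),
        TaskRecord("reports", "build", "failed", now - timedelta(minutes=10), frozenset({"team-b"})),
    ]
    print(failure_counts(find_failures(records, now - timedelta(hours=1))))
```

The selected records are what a confirmation page would list; only after confirmation would they be passed on for clearing and audit logging.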
Originally developed for managing task failures at scale in production Airflow environments.
Need to clear those failed tasks? The Big Red Button is here to help!