Use in-memory data for reaper waiting job logic, consolidate with heartbeat#16482
Use in-memory data for reaper waiting job logic, consolidate with heartbeat#16482AlanCoding wants to merge 5 commits into
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughThis PR refactors orphaned job cleanup from synchronous inline code to background dispatcher tasks. It introduces new reaper tasks and a query helper, wires them through task-manager scheduling, updates lost-instance heartbeat handling, and adds comprehensive test coverage. ChangesOrphaned Job Reaping Refactoring
🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
linter that's all |
…round task flow Refactored fragmented reaping logic across 5 entry points into a single architecture: heartbeat now only handles instance state transitions (marking offline/deleting), then triggers the task manager to detect orphaned jobs; task manager uses already-loaded in-memory instance data and a fast .exists() check to detect orphaned jobs without expensive DB queries, queuing idempotent background tasks with on_duplicate='queue_one' if needed; and background tasks reap_orphaned_jobs() and reset_orphaned_waiting_jobs() run independently with no arguments to avoid worker saturation. Extracted shared get_orphaned_running_jobs_query() function to DRY up query logic, removed inline job reaping from heartbeat and task manager hot paths, and updated tests to verify the new pattern. Task manager cycle load is now minimal (O(n) memory iteration + fast existence check), eliminating the expensive periodic DB queries from PR ansible#16467. Assisted-By: Haiku
|
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@awx/main/dispatch/reaper.py`:
- Line 107: The condition is using the method object instead of calling it:
change the check from referencing job.is_container_group_task to calling
job.is_container_group_task() so the boolean result is evaluated; update the if
statement in reaper.py where the job variable (UnifiedJob) is tested to use
job.is_container_group_task() so container-group tasks are correctly skipped.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Enterprise
Run ID: 8a05ee96-852c-4983-903f-c1096e377a4b
📒 Files selected for processing (4)
awx/main/dispatch/reaper.pyawx/main/scheduler/task_manager.pyawx/main/tasks/system.pyawx/main/tests/functional/test_dispatch.py
|
|
||
| logger.info('Reaping orphaned running jobs') | ||
| for job in orphaned_running: | ||
| if not job.is_container_group_task: |
There was a problem hiding this comment.
is_container_group_task is a method and must be called with parentheses.
Per context snippet 3, is_container_group_task() is defined as a method on UnifiedJob. The current code accesses it as a property, which will always be truthy (the method object itself), causing container-group tasks to never be skipped as intended.
🐛 Proposed fix
- if not job.is_container_group_task:
+ if not job.is_container_group_task():📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if not job.is_container_group_task: | |
| if not job.is_container_group_task(): |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@awx/main/dispatch/reaper.py` at line 107, The condition is using the method
object instead of calling it: change the check from referencing
job.is_container_group_task to calling job.is_container_group_task() so the
boolean result is evaluated; update the if statement in reaper.py where the job
variable (UnifiedJob) is tested to use job.is_container_group_task() so
container-group tasks are correctly skipped.
SUMMARY
Putting up a concrete implementation of my rambling review comment from @fosterseth patch
#16467
The heartbeat change here is slightly architectural. When a peer instance is marked offline, the jobs from that will not right away be reaped, as another tick of the task manager, and another background task is necessary. However, when a node goes down we are not in a rush to reap jobs from it, and we can get in trouble for doing it too fast. Plus, all of these have immediate re-scheduling happening so if the system isn't bogged down we can expect it to happen as fast as the system resources can carry it.
ISSUE TYPE
COMPONENT NAME
Summary by CodeRabbit
New Features
Bug Fixes
Tests