Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
24 changes: 21 additions & 3 deletions awx/main/models/schedules.py
Original file line number Diff line number Diff line change
Expand Up @@ -174,7 +174,11 @@ def get_zoneinfo_links(cls):
def timezone(self):
utc = tzutc()
# All rules in a ruleset will have the same dtstart so we can just take the first rule
tzinfo = Schedule.rrulestr(self.rrule)._rrule[0]._dtstart.tzinfo
try:

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing root-cause catch permanently disables valid schedules

High Severity

The PR description states the fix "catches ValueError in _fast_forward_rrule and falls back to the original rrule," but _fast_forward_rrule has no such catch around rrule.replace() at line 100. Without it, the ValueError propagates to the new handlers in awx_periodic_scheduler, which set next_run=None via direct DB update. Since next_run=None never matches the before() or between() query filters, the schedule is permanently disabled — even after DST ends and the fast-forward would succeed again. Because fast-forwarding is purely a performance optimization, catching ValueError in _fast_forward_rrule and returning the original rrule would let valid schedules keep running instead of being irreversibly killed.

Additional Locations (2)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 719aeff. Configure here.

tzinfo = Schedule.rrulestr(self.rrule)._rrule[0]._dtstart.tzinfo
except ValueError as e:
logger.warning("Schedule id=%s has an invalid rrule, cannot determine timezone: %s", self.id, e)
return ''
if tzinfo is utc:
return 'UTC'
all_zones = Schedule.get_zoneinfo()
Expand All @@ -192,7 +196,12 @@ def timezone(self):
def until(self):
# The UNTIL= datestamp (if any) coerced from UTC to the local naive time
# of the DTSTART
for r in Schedule.rrulestr(self.rrule)._rrule:
try:
rrules = Schedule.rrulestr(self.rrule)._rrule
except ValueError as e:
logger.warning("Schedule id=%s has an invalid rrule, cannot determine until: %s", self.id, e)
return ''
for r in rrules:
if r._until:
local_until = r._until.astimezone(r._dtstart.tzinfo)
naive_until = local_until.replace(tzinfo=None)
Expand Down Expand Up @@ -317,7 +326,16 @@ def update_computed_fields_no_save(self):
for field_name in affects_fields:
starting_values[field_name] = getattr(self, field_name)

future_rs = Schedule.rrulestr(self.rrule)
try:
future_rs = Schedule.rrulestr(self.rrule)
except ValueError as e:
logger.error(
"Schedule id=%s has an invalid rrule that cannot be parsed: %s. Error: %s",
self.id,
self.rrule,
e,
)
raise
Comment on lines +329 to +338

@coderabbitai coderabbitai Bot Apr 10, 2026

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Persisted-state gap: next_run/dtend are cleared in memory but not saved before re-raise.

At Line 340, re-raising exits before any save, so scheduler catch-and-continue paths can leave stale DB values and repeatedly re-select the same broken schedule.

Proposed fix
 def update_computed_fields(self):
-    changed = self.update_computed_fields_no_save()
+    try:
+        changed = self.update_computed_fields_no_save()
+    except ValueError:
+        # Fields were nulled in update_computed_fields_no_save(); persist that state.
+        super(Schedule, self).save(update_fields=['next_run', 'dtend', 'modified'])
+        raise
     if not changed:
         return
     emit_channel_notification('schedules-changed', dict(id=self.id, group_name='schedules'))

As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@awx/main/models/schedules.py` around lines 329 - 340, The except block for
Schedule.rrulestr currently clears in-memory self.next_run and self.dtend but
re-raises before persisting; update the handler in the Schedule class to set
self.next_run = None and self.dtend = None and then persist those changes (e.g.,
call self.save() or self.save(update_fields=['next_run','dtend'])) before
re-raising the ValueError so the DB state reflects the cleared values and the
broken schedule won't be repeatedly reselected; ensure you still re-raise the
original exception after the save.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...seems valid, unfortunately.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AlanCoding, glad the concern checks out. Here's a clean fix plan:

  • Catch ValueError in update_computed_fields after update_computed_fields_no_save sets the fields to None, persist those cleared values, then re-raise so callers (scheduler) still see the error but the DB is consistent.

Failed to handle agent chat message. Please try again.

Comment thread
cursor[bot] marked this conversation as resolved.

if self.enabled:
next_run_actual = future_rs.after(now())
Expand Down
13 changes: 11 additions & 2 deletions awx/main/tasks/system.py
Original file line number Diff line number Diff line change
Expand Up @@ -864,7 +864,7 @@


@task(queue=get_task_queuename, timeout=3600 * 5, on_duplicate='discard')
def awx_periodic_scheduler():

Check failure on line 867 in awx/main/tasks/system.py

View check run for this annotation

SonarQubeCloud / SonarCloud Code Analysis

Refactor this function to reduce its Cognitive Complexity from 16 to the 15 allowed.

See more on https://sonarcloud.io/project/issues?id=ansible_awx&issues=AZ1zrY1VCDKcginbbeEm&open=AZ1zrY1VCDKcginbbeEm&pullRequest=16400
lock_session_timeout_milliseconds = settings.TASK_MANAGER_LOCK_TIMEOUT * 1000
with advisory_lock('awx_periodic_scheduler_lock', lock_session_timeout_milliseconds=lock_session_timeout_milliseconds, wait=False) as acquired:
if acquired is False:
Expand All @@ -881,7 +881,11 @@

old_schedules = Schedule.objects.enabled().before(last_run)
for schedule in old_schedules:
schedule.update_computed_fields()
try:
schedule.update_computed_fields()
except Exception:
logger.exception("Failed to update computed fields for schedule %s.", schedule)
Schedule.objects.filter(pk=schedule.pk).update(next_run=None, dtend=None)
schedules = Schedule.objects.enabled().between(last_run, run_now)

invalid_license = False
Expand All @@ -892,7 +896,12 @@

for schedule in schedules:
template = schedule.unified_job_template
schedule.update_computed_fields() # To update next_run timestamp.
try:
schedule.update_computed_fields() # To update next_run timestamp.
except Exception:
logger.exception("Failed to update computed fields for schedule %s.", schedule)
Schedule.objects.filter(pk=schedule.pk).update(next_run=None, dtend=None)
continue
if template.cache_timeout_blocked:
logger.warning("Cache timeout is in the future, bypassing schedule for template %s" % str(template.id))
continue
Expand Down
89 changes: 89 additions & 0 deletions awx/main/tests/functional/api/test_schedules.py
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
import datetime
import pytest
from unittest import mock

from django.utils.encoding import smart_str
from django.utils.timezone import now
Expand All @@ -10,6 +11,22 @@

RRULE_EXAMPLE = 'DTSTART:20151117T050000Z RRULE:FREQ=DAILY;INTERVAL=1;COUNT=1'

# This rrule triggers ValueError in _fast_forward_rrule when the dtstart is in
# the past and the fast-forwarded dtstart conflicts with the BYxxx constraints.
# The DTSTART hour (13 = 1 mod 4) aligns with all BYHOUR values (1,5,9,13,17,21
# are all 1 mod 4) during EST (UTC-5). But during EDT (UTC-4), the 1-hour shift
# causes the fast-forwarded local hour to be 0 mod 4 (e.g., 10), which has no
# overlap with BYHOUR, so dateutil raises "Invalid rrule byxxx generates an
# empty set."
RRULE_INVALID_BYXXX = 'DTSTART;TZID=America/New_York:20251211T130000 RRULE:FREQ=HOURLY;INTERVAL=4;WKST=MO;BYDAY=MO,TU,WE,TH,FR;BYHOUR=1,5,9,13,17,21;BYMINUTE=0'

# Same constraints but with a future dtstart, used to test the pathway where
# a schedule is created when dtstart is in the future (fast-forward skipped)
# and only breaks later when time passes and fast-forward kicks in during EDT.
RRULE_INVALID_BYXXX_FUTURE = (
'DTSTART;TZID=America/New_York:20351211T130000 RRULE:FREQ=HOURLY;INTERVAL=4;WKST=MO;BYDAY=MO,TU,WE,TH,FR;BYHOUR=1,5,9,13,17,21;BYMINUTE=0'
)


def get_rrule(tz=None):
parts = ['DTSTART']
Expand Down Expand Up @@ -593,3 +610,75 @@ def test_normal_user_can_create_inventory_update_schedule(options, post, invento
inventory_source.inventory.update_role.members.add(alice)
assert 'POST' in options(url, user=alice).data['actions'].keys()
post(url, params, alice, expect=201)


@pytest.mark.django_db
def test_patch_invalid_byxxx_rejected_during_edt(post, patch, admin_user, project, inventory):
"""PATCH with the problematic rrule during EDT is rejected by validation.
The fast-forward shifts the hour out of BYHOUR alignment, causing
ValueError which validate_rrule catches and returns as a 400.
"""
job_template = JobTemplate.objects.create(name='test-jt', project=project, playbook='helloworld.yml', inventory=inventory)
url = reverse('api:job_template_schedules_list', kwargs={'pk': job_template.id})
r = post(url, {'name': 'test-schedule', 'rrule': RRULE_EXAMPLE}, admin_user, expect=201)
detail_url = reverse('api:schedule_detail', kwargs={'pk': r.data['id']})

# During EDT (Jul 1, 2026), fast-forward lands on hour 10 (0 mod 4),
# no overlap with BYHOUR=1,5,9,13,17,21 → ValueError → 400
edt_now = datetime.datetime(2026, 7, 1, 14, 0, 0, tzinfo=datetime.timezone.utc)
with mock.patch('awx.main.models.schedules.now', return_value=edt_now):
r = patch(detail_url, {'rrule': RRULE_INVALID_BYXXX}, admin_user, expect=400)
assert 'rrule' in r.data


@pytest.mark.django_db
def test_get_survives_invalid_byxxx_already_in_db(get, admin_user, project, inventory):
"""GET on a schedule whose rrule was accepted during EST but now fails
fast-forward during EDT should not 500. The timezone and until properties
catch the ValueError and return fallback values.
"""
job_template = JobTemplate.objects.create(name='test-jt', project=project, playbook='helloworld.yml', inventory=inventory)

# Create a valid schedule, then inject the bad rrule at the DB level
# (simulates data accepted during EST that breaks during EDT)
schedule = Schedule.objects.create(
name='bad-byxxx-schedule',
rrule='DTSTART:20251211T130000Z RRULE:FREQ=DAILY;INTERVAL=1;COUNT=1',
unified_job_template=job_template,
)
Schedule.objects.filter(pk=schedule.pk).update(rrule=RRULE_INVALID_BYXXX)

# GET during EDT — timezone/until properties catch ValueError gracefully
edt_now = datetime.datetime(2026, 7, 1, 14, 0, 0, tzinfo=datetime.timezone.utc)
with mock.patch('awx.main.models.schedules.now', return_value=edt_now):
url = reverse('api:schedule_list')
r = get(url, admin_user, expect=200)
assert r.data['count'] >= 1

detail_url = reverse('api:schedule_detail', kwargs={'pk': schedule.pk})
r = get(detail_url, admin_user, expect=200)
assert r.data['id'] == schedule.pk


@pytest.mark.django_db
def test_future_dtstart_created_and_get_survives_edt(post, get, admin_user, project, inventory):
"""A schedule with conflicting BYxxx + BYHOUR constraints passes API
validation when dtstart is in the future (fast-forward is skipped entirely).
After dtstart passes, GET during EDT should not crash — the timezone and
until properties catch the ValueError from the DST hour shift.
"""
job_template = JobTemplate.objects.create(name='test-jt', project=project, playbook='helloworld.yml', inventory=inventory)
url = reverse('api:job_template_schedules_list', kwargs={'pk': job_template.id})

# Create with future dtstart — fast-forward is skipped, validation passes
r = post(url, {'name': 'future-schedule', 'rrule': RRULE_INVALID_BYXXX_FUTURE}, admin_user, expect=201)
schedule_id = r.data['id']

# GET during EDT when dtstart is now in the past — fast-forward kicks in
# and the DST shift breaks the BYHOUR alignment, but timezone/until
# properties catch it gracefully
edt_now = datetime.datetime(2036, 7, 1, 14, 0, 0, tzinfo=datetime.timezone.utc)
detail_url = reverse('api:schedule_detail', kwargs={'pk': schedule_id})
with mock.patch('awx.main.models.schedules.now', return_value=edt_now):
r = get(detail_url, admin_user, expect=200)
assert r.data['id'] == schedule_id
76 changes: 74 additions & 2 deletions awx/main/tests/functional/tasks/test_tasks_system.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,9 +8,15 @@

import pytest

from awx.main.tasks.system import CleanupImagesAndFiles, execution_node_health_check, inspect_established_receptor_connections, clear_setting_cache
from awx.main.tasks.system import (
CleanupImagesAndFiles,
awx_periodic_scheduler,
execution_node_health_check,
inspect_established_receptor_connections,
clear_setting_cache,
)
from awx.main.management.commands.dispatcherd import Command
from awx.main.models import Instance, Job, ReceptorAddress, InstanceLink
from awx.main.models import Instance, Job, JobTemplate, ReceptorAddress, InstanceLink, Schedule, TowerScheduleState


@pytest.mark.django_db
Expand Down Expand Up @@ -164,3 +170,69 @@ def test_configure_dispatcher_logging_updates_level(settings):

assert logging.getLogger('dispatcherd').level == logging.WARNING
settings.LOGGING = original_logging_settings


@pytest.mark.django_db
def test_periodic_scheduler_survives_invalid_schedule(inventory, project):
"""A schedule whose update_computed_fields raises should not prevent the
periodic scheduler from processing other healthy schedules.

Simulates the scenario where a corrupt rrule causes update_computed_fields
to raise ValueError (the exact error from the BYHOUR/DST bug). Verifies
the scheduler completes, creates a job for the healthy schedule, and skips
the broken one.
"""
from datetime import datetime, timedelta, timezone as dt_tz

jt = JobTemplate.objects.create(name='test-jt', project=project, playbook='helloworld.yml', inventory=inventory)

run_now = datetime(2026, 7, 1, 14, 0, 0, tzinfo=dt_tz.utc)
last_run = run_now - timedelta(seconds=30)

healthy_schedule = Schedule.objects.create(
name='healthy-schedule',
rrule='DTSTART:20260101T120000Z RRULE:FREQ=DAILY;INTERVAL=1',
unified_job_template=jt,
)
Schedule.objects.filter(pk=healthy_schedule.pk).update(next_run=last_run + timedelta(seconds=10))

bad_schedule = Schedule.objects.create(
name='bad-schedule',
rrule='DTSTART:20260101T120000Z RRULE:FREQ=DAILY;INTERVAL=1',
unified_job_template=jt,
)
bad_schedule_id = bad_schedule.pk
Schedule.objects.filter(pk=bad_schedule_id).update(next_run=last_run + timedelta(seconds=10))

state = TowerScheduleState.get_solo()
state.schedule_last_run = last_run
state.save()

# Make update_computed_fields raise for the bad schedule, simulating the
# ValueError that occurs when _fast_forward_rrule hits a corrupt rrule.
original_ucf = Schedule.update_computed_fields
bad_schedule_raised = False

def update_or_raise(self):
nonlocal bad_schedule_raised
if self.pk == bad_schedule_id:
bad_schedule_raised = True
raise ValueError("Invalid rrule byxxx generates an empty set.")
return original_ucf(self)

with (
mock.patch('awx.main.tasks.system.now', return_value=run_now),
mock.patch('awx.main.models.schedules.now', return_value=run_now),
mock.patch('awx.main.models.schedules.emit_channel_notification'),
mock.patch('awx.main.tasks.system.emit_channel_notification'),
mock.patch.object(Schedule, 'update_computed_fields', update_or_raise),
):
awx_periodic_scheduler()

assert bad_schedule_raised, "The bad schedule's update_computed_fields should have been called and raised"

healthy_jobs = Job.objects.filter(schedule=healthy_schedule)
assert healthy_jobs.count() == 1, "Healthy schedule should have created exactly one job"

bad_jobs = Job.objects.filter(schedule=bad_schedule)
assert bad_jobs.count() == 0, "Bad schedule should not have created a job"
Loading