Populate HostMetric.used_in_inventories with actual inventory counts#16436
Populate HostMetric.used_in_inventories with actual inventory counts#16436tll3r wants to merge 1 commit into
Conversation
The used_in_inventories field on HostMetric was introduced in PR ansible#13560 but the backend logic to populate it was never implemented, leaving the field permanently null since AAP 2.4. This adds: - HostMetricInventoryCountTask: counts distinct inventories per hostname by correlating Host.name with HostMetric.hostname - update_host_metric_inventory_counts: weekly scheduled task that invokes the count update (controlled via HOST_METRIC_INVENTORY_COUNTS_INTERVAL) - DISPATCHER_SCHEDULE entry so the task runs automatically every 4 hours (with an internal 7-day threshold guard matching the existing pattern) - HOST_METRIC_INVENTORY_COUNTS_LAST_TS registered in conf for persistence - awx-manage update_host_metric_inventory_counts: management command for on-demand execution - Functional tests covering zero, single, multiple inventories, mixed hosts, idempotency, and inventory membership changes Resolves the incomplete feature tracked in AWX PRs ansible#13560, ansible#13782, ansible#13999. Made-with: Cursor
📝 WalkthroughWalkthroughA new scheduled task is introduced to recalculate and populate Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Review rate limit: 7/8 reviews remaining, refill in 7 minutes and 30 seconds.Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@awx/main/tasks/host_metrics.py`:
- Around line 309-333: The current code materializes a huge counts dict and
issues two separate UPDATEs using hostname__in=counts.keys(), which is
unscalable and racy; replace both steps with a single set-based update that uses
a correlated subquery and Coalesce to compute inventory membership per
Host.hostname and write 0 when no match. Build inventory_count_subquery like the
existing inventory_count_subquery
(Host.objects.filter(name=OuterRef('hostname')).annotate(cnt=Count('inventory_id',
distinct=True)).values('cnt')), then perform one update:
HostMetric.objects.exclude(used_in_inventories=Coalesce(Subquery(inventory_count_subquery),
Value(0))).update(used_in_inventories=Coalesce(Subquery(inventory_count_subquery),
Value(0))) so you never materialize counts in Python and only update rows whose
value will actually change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Enterprise
Run ID: 9a660f42-9b98-4453-9cca-6f263c052177
📒 Files selected for processing (5)
awx/main/conf.pyawx/main/management/commands/update_host_metric_inventory_counts.pyawx/main/tasks/host_metrics.pyawx/main/tests/functional/commands/test_update_host_metric_inventory_counts.pyawx/settings/defaults.py
| counts = dict( | ||
| Host.objects.values('name') | ||
| .annotate(cnt=Count('inventory_id', distinct=True)) | ||
| .values_list('name', 'cnt') | ||
| ) | ||
|
|
||
| rows = 0 | ||
|
|
||
| # Set hosts with inventory membership to their actual count (only where changed) | ||
| if counts: | ||
| inventory_count_subquery = ( | ||
| Host.objects.filter(name=OuterRef('hostname')) | ||
| .values('name') | ||
| .annotate(cnt=Count('inventory_id', distinct=True)) | ||
| .values('cnt') | ||
| ) | ||
| rows += HostMetric.objects.filter(hostname__in=counts.keys()).exclude( | ||
| used_in_inventories=Subquery(inventory_count_subquery) | ||
| ).update(used_in_inventories=Subquery(inventory_count_subquery)) | ||
|
|
||
| # Set hosts without any inventory membership to 0 (only where not already 0) | ||
| rows += HostMetric.objects.exclude(hostname__in=counts.keys()).exclude( | ||
| used_in_inventories=0 | ||
| ).update(used_in_inventories=0) | ||
|
|
There was a problem hiding this comment.
Avoid materializing full host-name counts into Python for updates.
counts = dict(...) plus hostname__in=counts.keys() does not scale for large datasets and can produce very large IN (...) clauses; it also introduces a stale key-set window across two UPDATEs. Use one correlated subquery with Coalesce(..., 0) for a single set-based update.
Proposed refactor
class HostMetricInventoryCountTask:
@@
`@staticmethod`
def update_counts():
- counts = dict(
- Host.objects.values('name')
- .annotate(cnt=Count('inventory_id', distinct=True))
- .values_list('name', 'cnt')
- )
-
- rows = 0
-
- # Set hosts with inventory membership to their actual count (only where changed)
- if counts:
- inventory_count_subquery = (
- Host.objects.filter(name=OuterRef('hostname'))
- .values('name')
- .annotate(cnt=Count('inventory_id', distinct=True))
- .values('cnt')
- )
- rows += HostMetric.objects.filter(hostname__in=counts.keys()).exclude(
- used_in_inventories=Subquery(inventory_count_subquery)
- ).update(used_in_inventories=Subquery(inventory_count_subquery))
-
- # Set hosts without any inventory membership to 0 (only where not already 0)
- rows += HostMetric.objects.exclude(hostname__in=counts.keys()).exclude(
- used_in_inventories=0
- ).update(used_in_inventories=0)
+ inventory_count_subquery = (
+ Host.objects.filter(name=OuterRef('hostname'))
+ .values('name')
+ .annotate(cnt=Count('inventory_id', distinct=True))
+ .values('cnt')[:1]
+ )
+ computed_count = Coalesce(Subquery(inventory_count_subquery), Value(0))
+
+ rows = HostMetric.objects.exclude(
+ used_in_inventories=computed_count
+ ).update(used_in_inventories=computed_count)
settings.HOST_METRIC_INVENTORY_COUNTS_LAST_TS = now()
logger.info(f"update_host_metric_inventory_counts: updated {rows} HostMetric records")
return rows📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| counts = dict( | |
| Host.objects.values('name') | |
| .annotate(cnt=Count('inventory_id', distinct=True)) | |
| .values_list('name', 'cnt') | |
| ) | |
| rows = 0 | |
| # Set hosts with inventory membership to their actual count (only where changed) | |
| if counts: | |
| inventory_count_subquery = ( | |
| Host.objects.filter(name=OuterRef('hostname')) | |
| .values('name') | |
| .annotate(cnt=Count('inventory_id', distinct=True)) | |
| .values('cnt') | |
| ) | |
| rows += HostMetric.objects.filter(hostname__in=counts.keys()).exclude( | |
| used_in_inventories=Subquery(inventory_count_subquery) | |
| ).update(used_in_inventories=Subquery(inventory_count_subquery)) | |
| # Set hosts without any inventory membership to 0 (only where not already 0) | |
| rows += HostMetric.objects.exclude(hostname__in=counts.keys()).exclude( | |
| used_in_inventories=0 | |
| ).update(used_in_inventories=0) | |
| inventory_count_subquery = ( | |
| Host.objects.filter(name=OuterRef('hostname')) | |
| .values('name') | |
| .annotate(cnt=Count('inventory_id', distinct=True)) | |
| .values('cnt')[:1] | |
| ) | |
| computed_count = Coalesce(Subquery(inventory_count_subquery), Value(0)) | |
| rows = HostMetric.objects.exclude( | |
| used_in_inventories=computed_count | |
| ).update(used_in_inventories=computed_count) |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@awx/main/tasks/host_metrics.py` around lines 309 - 333, The current code
materializes a huge counts dict and issues two separate UPDATEs using
hostname__in=counts.keys(), which is unscalable and racy; replace both steps
with a single set-based update that uses a correlated subquery and Coalesce to
compute inventory membership per Host.hostname and write 0 when no match. Build
inventory_count_subquery like the existing inventory_count_subquery
(Host.objects.filter(name=OuterRef('hostname')).annotate(cnt=Count('inventory_id',
distinct=True)).values('cnt')), then perform one update:
HostMetric.objects.exclude(used_in_inventories=Coalesce(Subquery(inventory_count_subquery),
Value(0))).update(used_in_inventories=Coalesce(Subquery(inventory_count_subquery),
Value(0))) so you never materialize counts in Python and only update rows whose
value will actually change.
Summary
The
used_in_inventoriesfield onHostMetricwas introduced in PR #13560 (Feb 2023) but the backend logic to populate it was never implemented, leaving the field permanentlynullsince AAP 2.4. The UI column was subsequently removed in PR #13782 with the note "Revert this commit once the backend is ready."This PR adds the missing backend:
HostMetricInventoryCountTask: counts distinct inventories per hostname by correlatingHost.namewithHostMetric.hostnamevia a single bulkUPDATEwith a correlated subqueryupdate_host_metric_inventory_counts: scheduled task (dispatched every 4 hours, with an internal 7-day run-threshold guard matching the existinghost_metric_summary_monthlypattern)HOST_METRIC_INVENTORY_COUNTS_LAST_TS: persisted via the conf registry so the run-threshold survives restartsawx-manage update_host_metric_inventory_counts: management command for on-demand executionRelated Issues
HostMetricmodel withused_in_inventoriescolumnhost_metric_summary_monthly(does not populateused_in_inventories)Test plan
awx-manage update_host_metric_inventory_countspopulates the field correctlyGET /api/v2/host_metrics/?order_by=-used_in_inventoriesreturns correct non-null valuespytest awx/main/tests/functional/commands/test_update_host_metric_inventory_counts.pyMade with Cursor
Summary by CodeRabbit
Release Notes
New Features
Chores