Fixup conversion of memory and cpu settings to support k8s resource request format by kdelee · Pull Request #11725 · ansible/awx

kdelee · 2022-02-11T17:05:55Z

SUMMARY

This setting has not been getting set by default, and needed some fixing
up to be compatible with setting the memory in the same way as we set it
in the operator, as well as with other changes from last year which
assume that ansible runner is returning memory in bytes.

This way we can start setting this setting in the operator, and get a
more accurate reflection of how much memory is available to the control
pod in k8s.

Once this is in, I can set the memory in the operator and that should address the memory part of #11640

This is an alternative to #11038. @shanemcd and I both think using this setting may be a little more fool proof, as we don't have to worry about something changing in cgroups. Just keep it simple and take whatever we set on the resource request/limit for k8s.

ISSUE TYPE

Feature Pull Request
Bugfix Pull Request

COMPONENT NAME

API

AWX VERSION

ADDITIONAL INFORMATION

Its important to note that we were using this function and passing bytes as we get it from ansible runner

awx/awx/main/models/ha.py

Line 282 in 6f9d4d8

    
           self.save_health_data(awx_application_version, get_cpu_count(), get_mem_in_bytes(), update_last_seen=True, errors=errors)

which ends up in

awx/awx/main/models/ha.py

Lines 225 to 264 in 6f9d4d8

    
           def refresh_capacity_fields(self): 
        
               """Update derived capacity fields from cpu and memory (no save)""" 
        
               self.cpu_capacity = get_cpu_effective_capacity(self.cpu) 
        
               self.mem_capacity = get_mem_effective_capacity(self.memory) 
        
               self.set_capacity_value() 
        
           def save_health_data(self, version, cpu, memory, uuid=None, update_last_seen=False, errors=''): 
        
               self.last_health_check = now() 
        
               update_fields = ['last_health_check'] 
        
               if update_last_seen: 
        
                   self.last_seen = self.last_health_check 
        
                   update_fields.append('last_seen') 
        
               if uuid is not None and self.uuid != uuid: 
        
                   if self.uuid is not None: 
        
                       logger.warn(f'Self-reported uuid of {self.hostname} changed from {self.uuid} to {uuid}') 
        
                   self.uuid = uuid 
        
                   update_fields.append('uuid') 
        
               if self.version != version: 
        
                   self.version = version 
        
                   update_fields.append('version') 
        
               new_cpu = get_corrected_cpu(cpu) 
        
               if new_cpu != self.cpu: 
        
                   self.cpu = new_cpu 
        
                   update_fields.append('cpu') 
        
               new_memory = get_corrected_memory(memory) 
        
               if new_memory != self.memory: 
        
                   self.memory = new_memory 
        
                   update_fields.append('memory') 
        
               if not errors: 
        
                   self.refresh_capacity_fields() 
        
                   self.errors = '' 
        
               else: 
        
                   self.mark_offline(perform_save=False, errors=errors) 
        
               update_fields.extend(['cpu_capacity', 'mem_capacity', 'capacity', 'errors'])

so I updated the variables to be more clearly named, as we are dealing in bytes not mb, and not gb. So this unifies all the places that look at this setting to assume its result is in bytes.

I've added a test to ensure we get the expected memory conversion and resulting forks.
See https://github.com/ansible/awx/blob/devel/docs/capacity.md#memory-relative-capacity for docs on the arithmetic there, but this basically translates into "Take memory in bytes, convert to megabytes, subtract 2048 megabytes, divide by setting that determines megabytes per fork".

Do we want to revisit that algorithm? Maybe but I'm just respecting whats there now.

AlanCoding · 2022-02-11T18:35:29Z

in the operator, as well as with other changes from last year which
assume that ansible runner is returning memory in bytes.

So you're talking about the behavior of the operator, or of something from the past?

awx/awx/main/models/ha.py

Lines 89 to 92 in 2de5ffc

    
           memory = models.BigIntegerField( 
        
               default=0, 
        
               editable=False, 
        
               help_text=_('Total system memory of this instance in bytes.'),

If I check my API locally I find

"memory": 33391198208,

which is ~31 Gb, not ~31 Mb (in which case I would not be able to type this in Chrome). So hopefully we're on the same page that we're consistent in units from runner to AWX, because it is documented along all that way. See runner's behavior:

bash-5.1$ ansible-runner worker --worker-info
{cpu_count: 8, errors: [], mem_in_bytes: 33391198208, runner_version: 2.1.1.dev24,
  uuid: 126cf42a-9bd1-408d-87d4-80fd0250d289}

I'm unclear exactly how the bug here is documented. What values are getting put in? We have the expectation set correctly, it should be an integer number of bytes. You're not getting that, somewhere, for some reason?

kdelee · 2022-02-11T18:54:14Z

@AlanCoding So couple things going on here:

SYSTEM_TASK_ABS_MEM used be used by a now deprecated openshift installer
SYSTEM_TASK_ABS_MEM used to be assumed to be Gi (gigabytes)
now I want to assume SYSTEM_TASK_ABS_MEM is a string that conforms to k8s accepted ways of defining memory resource request/limit https://kubernetes.io/docs/tasks/configure-pod-container/assign-memory-resource/#memory-units
with this PR, I can do set memory setting based on resource settings awx-operator#791
I agree that runner returns bytes, I just updated the local variable "memory_mb" to be "memory_bytes" to match that
I updated the awx/main/dispatch/pool.py to consider the memory in bytes instead of Gi, so we can use same conversion of SYSTEM_TASK_ABS_MEM to bytes as in the other place

So, there is nothing wrong with how runner is returning memory -- I'm just updating it so if we set the env vars, we get the right behavior and everything is in bytes so it works the same as the value when we get it from runner instead of a setting.

AlanCoding

all depends on openshift, sure, but I don't have a problem with how it appears here.

kdelee · 2022-02-14T18:18:59Z

@shanemcd pushed new changes to migrate CPU to a decimal field and also to not deduct the 2GB memory penalty on k8s that we do when all the services are colocated

Here is a control pod on a minikube deployment with operator built from ansible/awx-operator#791

{
    "id": 3,
    "type": "instance",
    "url": "[/api/v2/instances/3/](http://192.168.49.2:31406/api/v2/instances/3/)",
    "related": {
        "named_url": "[/api/v2/instances/awx-7d6b447ccc-tqw9h/](http://192.168.49.2:31406/api/v2/instances/awx-7d6b447ccc-tqw9h/)",
        "jobs": "[/api/v2/instances/3/jobs/](http://192.168.49.2:31406/api/v2/instances/3/jobs/)",
        "instance_groups": "[/api/v2/instances/3/instance_groups/](http://192.168.49.2:31406/api/v2/instances/3/instance_groups/)",
        "health_check": "[/api/v2/instances/3/health_check/](http://192.168.49.2:31406/api/v2/instances/3/health_check/)"
    },
    "uuid": "cbfbfcca-55f6-4ca3-83fb-cd05cfba9089",
    "hostname": "awx-7d6b447ccc-tqw9h",
    "created": "2022-02-14T17:51:10.153995Z",
    "modified": "2022-02-14T17:51:10.154129Z",
    "last_seen": "2022-02-14T18:13:26.629703Z",
    "last_health_check": "2022-02-14T18:13:26.629703Z",
    "errors": "",
    "capacity_adjustment": "1.00",
    "version": "19.5.2.dev111+g9db07d9d7b.d20220214",
    "capacity": 10,
    "consumed_capacity": 0,
    "percent_capacity_remaining": 100.0,
    "jobs_running": 0,
    "jobs_total": 0,
    "cpu": "0.5",
    "memory": 1073741824,
    "cpu_capacity": 2,
    "mem_capacity": 10,
    "enabled": true,
    "managed_by_policy": true,
    "node_type": "control"
}

the request on the pod (this is default for awx-operator right now)

  awx-task:
....omitted...
    Requests:
      cpu:     500m
      memory:  1Gi

This setting has not been getting set by default, and needed some fixing up to be compatible with setting the memory in the same way as we set it in the operator, as well as with other changes from last year which assume that ansible runner is returning memory in bytes. This way we can start setting this setting in the operator, and get a more accurate reflection of how much memory is available to the control pod in k8s. On platforms where services are all sharing memory, we deduct a penalty from the memory available. On k8s we don't need to do this because the web, redis, and task containers each have memory allocated to them.

This setting has not been getting set by default, and needed some fixing up to be compatible with setting the CPU resource request/limits in the same way as we set it in the resource requests/limits. This way we can start setting this setting in the operator, and get a more accurate reflection of how much cpu is available to the control pod in k8s. Because cpu on k8s can be partial cores, migrate cpu field to decimal. k8s does not allow granularity of less than 100m (equivalent to 0.1 cores), so only store up to 1 decimal place. fix analytics to deal with decimal cpu need to use DjangoJSONEncoder when Decimal fields in data passed to json.dumps

…equest format (ansible#11725) fix memory and cpu settings to suport k8s resource request format * fix conversion of memory setting to bytes This setting has not been getting set by default, and needed some fixing up to be compatible with setting the memory in the same way as we set it in the operator, as well as with other changes from last year which assume that ansible runner is returning memory in bytes. This way we can start setting this setting in the operator, and get a more accurate reflection of how much memory is available to the control pod in k8s. On platforms where services are all sharing memory, we deduct a penalty from the memory available. On k8s we don't need to do this because the web, redis, and task containers each have memory allocated to them. * Support CPU setting expressed in units used by k8s This setting has not been getting set by default, and needed some fixing up to be compatible with setting the CPU resource request/limits in the same way as we set it in the resource requests/limits. This way we can start setting this setting in the operator, and get a more accurate reflection of how much cpu is available to the control pod in k8s. Because cpu on k8s can be partial cores, migrate cpu field to decimal. k8s does not allow granularity of less than 100m (equivalent to 0.1 cores), so only store up to 1 decimal place. fix analytics to deal with decimal cpu need to use DjangoJSONEncoder when Decimal fields in data passed to json.dumps

kdelee requested review from AlanCoding and shanemcd February 11, 2022 17:06

github-actions Bot added the component:api label Feb 11, 2022

kdelee force-pushed the convert_k8s_mem branch 3 times, most recently from 81d04ac to 26b2b5e Compare February 11, 2022 18:32

AlanCoding approved these changes Feb 11, 2022

View reviewed changes

kdelee changed the title ~~[draft] fixup conversion of memory setting to bytes~~ [draft] fixup conversion of memory and cpu settings to support k8s resource request format Feb 11, 2022

kdelee mentioned this pull request Feb 11, 2022

set memory setting based on resource settings ansible/awx-operator#791

Merged

kdelee force-pushed the convert_k8s_mem branch 3 times, most recently from ed5a2bd to 9db07d9 Compare February 11, 2022 23:32

shanemcd approved these changes Feb 14, 2022

View reviewed changes

shanemcd changed the title ~~[draft] fixup conversion of memory and cpu settings to support k8s resource request format~~ Fixup conversion of memory and cpu settings to support k8s resource request format Feb 14, 2022

kdelee force-pushed the convert_k8s_mem branch from 9db07d9 to 809e929 Compare February 14, 2022 18:13

kdelee requested a review from shanemcd February 14, 2022 18:20

kdelee force-pushed the convert_k8s_mem branch from 809e929 to 07677b4 Compare February 14, 2022 18:28

kdelee added 2 commits February 15, 2022 09:33

kdelee force-pushed the convert_k8s_mem branch from 07677b4 to d40d517 Compare February 15, 2022 14:36

shanemcd approved these changes Feb 15, 2022

View reviewed changes

rebeccahhh approved these changes Feb 15, 2022

View reviewed changes

kdelee merged commit 7999684 into ansible:devel Feb 15, 2022

kdelee deleted the convert_k8s_mem branch February 15, 2022 19:08

kdelee mentioned this pull request Feb 23, 2022

Check cgroup memory limit when determining max queue workers #11038

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fixup conversion of memory and cpu settings to support k8s resource request format#11725

Fixup conversion of memory and cpu settings to support k8s resource request format#11725
kdelee merged 2 commits into
ansible:develfrom
kdelee:convert_k8s_mem

kdelee commented Feb 11, 2022 •

edited

Loading

Uh oh!

AlanCoding commented Feb 11, 2022

Uh oh!

kdelee commented Feb 11, 2022 •

edited

Loading

Uh oh!

AlanCoding left a comment

Uh oh!

kdelee commented Feb 14, 2022 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

	def refresh_capacity_fields(self):
	"""Update derived capacity fields from cpu and memory (no save)"""
	self.cpu_capacity = get_cpu_effective_capacity(self.cpu)
	self.mem_capacity = get_mem_effective_capacity(self.memory)
	self.set_capacity_value()

	def save_health_data(self, version, cpu, memory, uuid=None, update_last_seen=False, errors=''):
	self.last_health_check = now()
	update_fields = ['last_health_check']

	if update_last_seen:
	self.last_seen = self.last_health_check
	update_fields.append('last_seen')

	if uuid is not None and self.uuid != uuid:
	if self.uuid is not None:
	logger.warn(f'Self-reported uuid of {self.hostname} changed from {self.uuid} to {uuid}')
	self.uuid = uuid
	update_fields.append('uuid')

	if self.version != version:
	self.version = version
	update_fields.append('version')

	new_cpu = get_corrected_cpu(cpu)
	if new_cpu != self.cpu:
	self.cpu = new_cpu
	update_fields.append('cpu')

	new_memory = get_corrected_memory(memory)
	if new_memory != self.memory:
	self.memory = new_memory
	update_fields.append('memory')

	if not errors:
	self.refresh_capacity_fields()
	self.errors = ''
	else:
	self.mark_offline(perform_save=False, errors=errors)
	update_fields.extend(['cpu_capacity', 'mem_capacity', 'capacity', 'errors'])

Conversation

kdelee commented Feb 11, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

SUMMARY

ISSUE TYPE

COMPONENT NAME

AWX VERSION

ADDITIONAL INFORMATION

Uh oh!

AlanCoding commented Feb 11, 2022

Uh oh!

kdelee commented Feb 11, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

AlanCoding left a comment

Choose a reason for hiding this comment

Uh oh!

kdelee commented Feb 14, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

kdelee commented Feb 11, 2022 •

edited

Loading

kdelee commented Feb 11, 2022 •

edited

Loading

kdelee commented Feb 14, 2022 •

edited

Loading