Skip to content

Fixup conversion of memory and cpu settings to support k8s resource request format#11725

Merged
kdelee merged 2 commits into
ansible:develfrom
kdelee:convert_k8s_mem
Feb 15, 2022
Merged

Fixup conversion of memory and cpu settings to support k8s resource request format#11725
kdelee merged 2 commits into
ansible:develfrom
kdelee:convert_k8s_mem

Conversation

@kdelee

@kdelee kdelee commented Feb 11, 2022

Copy link
Copy Markdown
Member
SUMMARY

This setting has not been getting set by default, and needed some fixing
up to be compatible with setting the memory in the same way as we set it
in the operator, as well as with other changes from last year which
assume that ansible runner is returning memory in bytes.

This way we can start setting this setting in the operator, and get a
more accurate reflection of how much memory is available to the control
pod in k8s.

Once this is in, I can set the memory in the operator and that should address the memory part of #11640

This is an alternative to #11038. @shanemcd and I both think using this setting may be a little more fool proof, as we don't have to worry about something changing in cgroups. Just keep it simple and take whatever we set on the resource request/limit for k8s.

ISSUE TYPE
  • Feature Pull Request
  • Bugfix Pull Request
COMPONENT NAME
  • API
AWX VERSION

ADDITIONAL INFORMATION

Its important to note that we were using this function and passing bytes as we get it from ansible runner

self.save_health_data(awx_application_version, get_cpu_count(), get_mem_in_bytes(), update_last_seen=True, errors=errors)
which ends up in

awx/awx/main/models/ha.py

Lines 225 to 264 in 6f9d4d8

def refresh_capacity_fields(self):
"""Update derived capacity fields from cpu and memory (no save)"""
self.cpu_capacity = get_cpu_effective_capacity(self.cpu)
self.mem_capacity = get_mem_effective_capacity(self.memory)
self.set_capacity_value()
def save_health_data(self, version, cpu, memory, uuid=None, update_last_seen=False, errors=''):
self.last_health_check = now()
update_fields = ['last_health_check']
if update_last_seen:
self.last_seen = self.last_health_check
update_fields.append('last_seen')
if uuid is not None and self.uuid != uuid:
if self.uuid is not None:
logger.warn(f'Self-reported uuid of {self.hostname} changed from {self.uuid} to {uuid}')
self.uuid = uuid
update_fields.append('uuid')
if self.version != version:
self.version = version
update_fields.append('version')
new_cpu = get_corrected_cpu(cpu)
if new_cpu != self.cpu:
self.cpu = new_cpu
update_fields.append('cpu')
new_memory = get_corrected_memory(memory)
if new_memory != self.memory:
self.memory = new_memory
update_fields.append('memory')
if not errors:
self.refresh_capacity_fields()
self.errors = ''
else:
self.mark_offline(perform_save=False, errors=errors)
update_fields.extend(['cpu_capacity', 'mem_capacity', 'capacity', 'errors'])

so I updated the variables to be more clearly named, as we are dealing in bytes not mb, and not gb. So this unifies all the places that look at this setting to assume its result is in bytes.

I've added a test to ensure we get the expected memory conversion and resulting forks.
See https://github.com/ansible/awx/blob/devel/docs/capacity.md#memory-relative-capacity for docs on the arithmetic there, but this basically translates into "Take memory in bytes, convert to megabytes, subtract 2048 megabytes, divide by setting that determines megabytes per fork".

Do we want to revisit that algorithm? Maybe but I'm just respecting whats there now.

@kdelee kdelee force-pushed the convert_k8s_mem branch 3 times, most recently from 81d04ac to 26b2b5e Compare February 11, 2022 18:32
@AlanCoding

Copy link
Copy Markdown
Member

in the operator, as well as with other changes from last year which
assume that ansible runner is returning memory in bytes.

So you're talking about the behavior of the operator, or of something from the past?

memory = models.BigIntegerField(
default=0,
editable=False,
help_text=_('Total system memory of this instance in bytes.'),

If I check my API locally I find

"memory": 33391198208,

which is ~31 Gb, not ~31 Mb (in which case I would not be able to type this in Chrome). So hopefully we're on the same page that we're consistent in units from runner to AWX, because it is documented along all that way. See runner's behavior:

bash-5.1$ ansible-runner worker --worker-info
{cpu_count: 8, errors: [], mem_in_bytes: 33391198208, runner_version: 2.1.1.dev24,
  uuid: 126cf42a-9bd1-408d-87d4-80fd0250d289}

I'm unclear exactly how the bug here is documented. What values are getting put in? We have the expectation set correctly, it should be an integer number of bytes. You're not getting that, somewhere, for some reason?

@kdelee

kdelee commented Feb 11, 2022

Copy link
Copy Markdown
Member Author

@AlanCoding So couple things going on here:

So, there is nothing wrong with how runner is returning memory -- I'm just updating it so if we set the env vars, we get the right behavior and everything is in bytes so it works the same as the value when we get it from runner instead of a setting.

@AlanCoding AlanCoding left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all depends on openshift, sure, but I don't have a problem with how it appears here.

@kdelee kdelee changed the title [draft] fixup conversion of memory setting to bytes [draft] fixup conversion of memory and cpu settings to support k8s resource request format Feb 11, 2022
@kdelee kdelee force-pushed the convert_k8s_mem branch 3 times, most recently from ed5a2bd to 9db07d9 Compare February 11, 2022 23:32
@shanemcd shanemcd changed the title [draft] fixup conversion of memory and cpu settings to support k8s resource request format Fixup conversion of memory and cpu settings to support k8s resource request format Feb 14, 2022
@kdelee

kdelee commented Feb 14, 2022

Copy link
Copy Markdown
Member Author

@shanemcd pushed new changes to migrate CPU to a decimal field and also to not deduct the 2GB memory penalty on k8s that we do when all the services are colocated

Here is a control pod on a minikube deployment with operator built from ansible/awx-operator#791

{
    "id": 3,
    "type": "instance",
    "url": "[/api/v2/instances/3/](http://192.168.49.2:31406/api/v2/instances/3/)",
    "related": {
        "named_url": "[/api/v2/instances/awx-7d6b447ccc-tqw9h/](http://192.168.49.2:31406/api/v2/instances/awx-7d6b447ccc-tqw9h/)",
        "jobs": "[/api/v2/instances/3/jobs/](http://192.168.49.2:31406/api/v2/instances/3/jobs/)",
        "instance_groups": "[/api/v2/instances/3/instance_groups/](http://192.168.49.2:31406/api/v2/instances/3/instance_groups/)",
        "health_check": "[/api/v2/instances/3/health_check/](http://192.168.49.2:31406/api/v2/instances/3/health_check/)"
    },
    "uuid": "cbfbfcca-55f6-4ca3-83fb-cd05cfba9089",
    "hostname": "awx-7d6b447ccc-tqw9h",
    "created": "2022-02-14T17:51:10.153995Z",
    "modified": "2022-02-14T17:51:10.154129Z",
    "last_seen": "2022-02-14T18:13:26.629703Z",
    "last_health_check": "2022-02-14T18:13:26.629703Z",
    "errors": "",
    "capacity_adjustment": "1.00",
    "version": "19.5.2.dev111+g9db07d9d7b.d20220214",
    "capacity": 10,
    "consumed_capacity": 0,
    "percent_capacity_remaining": 100.0,
    "jobs_running": 0,
    "jobs_total": 0,
    "cpu": "0.5",
    "memory": 1073741824,
    "cpu_capacity": 2,
    "mem_capacity": 10,
    "enabled": true,
    "managed_by_policy": true,
    "node_type": "control"
}

the request on the pod (this is default for awx-operator right now)

  awx-task:
....omitted...
    Requests:
      cpu:     500m
      memory:  1Gi

This setting has not been getting set by default, and needed some fixing
up to be compatible with setting the memory in the same way as we set it
in the operator, as well as with other changes from last year which
assume that ansible runner is returning memory in bytes.

This way we can start setting this setting in the operator, and get a
more accurate reflection of how much memory is available to the control
pod in k8s.

On platforms where services are all sharing memory, we deduct a
penalty from the memory available. On k8s we don't need to do this
because the web, redis, and task containers each have memory
allocated to them.
This setting has not been getting set by default, and needed some fixing
up to be compatible with setting the CPU resource request/limits in the
same way as we set it in the resource requests/limits.

This way we can start setting this setting in the
operator, and get a more accurate reflection of how much cpu is
available to the control pod in k8s.

Because cpu on k8s can be partial cores, migrate cpu field to decimal.

k8s does not allow granularity of less than 100m (equivalent to 0.1 cores), so only
store up to 1 decimal place.

fix analytics to deal with decimal cpu

need to use DjangoJSONEncoder when Decimal fields in data passed to
json.dumps
@kdelee kdelee merged commit 7999684 into ansible:devel Feb 15, 2022
@kdelee kdelee deleted the convert_k8s_mem branch February 15, 2022 19:08
slemrmartin pushed a commit to slemrmartin/awx that referenced this pull request Mar 30, 2022
…equest format (ansible#11725)

fix memory and cpu settings to suport k8s resource request format

* fix conversion of memory setting to bytes

This setting has not been getting set by default, and needed some fixing
up to be compatible with setting the memory in the same way as we set it
in the operator, as well as with other changes from last year which
assume that ansible runner is returning memory in bytes.

This way we can start setting this setting in the operator, and get a
more accurate reflection of how much memory is available to the control
pod in k8s.

On platforms where services are all sharing memory, we deduct a
penalty from the memory available. On k8s we don't need to do this
because the web, redis, and task containers each have memory
allocated to them.

* Support CPU setting expressed in units used by k8s

This setting has not been getting set by default, and needed some fixing
up to be compatible with setting the CPU resource request/limits in the
same way as we set it in the resource requests/limits.

This way we can start setting this setting in the
operator, and get a more accurate reflection of how much cpu is
available to the control pod in k8s.

Because cpu on k8s can be partial cores, migrate cpu field to decimal.

k8s does not allow granularity of less than 100m (equivalent to 0.1 cores), so only
store up to 1 decimal place.

fix analytics to deal with decimal cpu

need to use DjangoJSONEncoder when Decimal fields in data passed to
json.dumps
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants