Skip to content

Conversation

@vinayakankugoyal
Copy link
Contributor

1. Why is this pull request needed and what does it do?

To run coredns as nonroot using the nonroot distroless base image.

2. Which issues (if any) are related?

3. Which documentation changes (if any) need to be made?

4. Does this introduce a backward incompatible change or deprecation?

No

@chrisohaver
Copy link
Member

(closed and reopened to re-trigger the failed circleci test)

@codecov-commenter
Copy link

Codecov Report

Merging #5969 (3796ef6) into master (93c57b6) will increase coverage by 1.82%.
The diff coverage is 55.42%.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

@@            Coverage Diff             @@
##           master    #5969      +/-   ##
==========================================
+ Coverage   55.70%   57.52%   +1.82%     
==========================================
  Files         224      246      +22     
  Lines       10016    16058    +6042     
==========================================
+ Hits         5579     9237    +3658     
- Misses       3978     6259    +2281     
- Partials      459      562     +103     
Impacted Files Coverage Δ
core/dnsserver/config.go 0.00% <ø> (ø)
core/dnsserver/register.go 13.79% <0.00%> (-4.19%) ⬇️
plugin/auto/auto.go 0.00% <0.00%> (ø)
plugin/auto/xfr.go 0.00% <0.00%> (ø)
plugin/auto/zone.go 78.94% <ø> (+0.68%) ⬆️
plugin/autopath/autopath.go 73.61% <ø> (-1.39%) ⬇️
plugin/azure/azure.go 10.61% <0.00%> (-0.57%) ⬇️
plugin/azure/setup.go 56.36% <0.00%> (-4.14%) ⬇️
plugin/backend_lookup.go 0.00% <0.00%> (ø)
plugin/bind/bind.go 50.00% <0.00%> (-50.00%) ⬇️
... and 80 more

... and 156 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Signed-off-by: Vinayak Goyal <vinaygo@google.com>
Copy link
Collaborator

@SuperQ SuperQ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@SuperQ
Copy link
Collaborator

SuperQ commented Mar 16, 2023

Weird, CircleCI is having issues.

@vinayakankugoyal
Copy link
Contributor Author

/retest

@chrisohaver
Copy link
Member

chrisohaver commented Mar 16, 2023

/retest

That doesn't work in this repo. The Circle CI failures are some kind of Circle CI infrastructure issue.

FYI, the error doesnt go away after a successful run... but in this case, we can see that he kubernetes ci tests are not present, so Circle CI failed to start them.

Please amend your commit and force push - that should re-run the tests.
Be aware, If the kubernetes ci tests do get run successfully - I think the old Circle CI failure will remain (I have observed this in some other recent PRs).

@chrisohaver
Copy link
Member

closing and re-opening to re-run circle ci tests.

@chrisohaver chrisohaver reopened this Mar 20, 2023
@chrisohaver chrisohaver merged commit d21537f into coredns:master Mar 20, 2023
@dmotte
Copy link

dmotte commented Sep 9, 2023

Guys, just to let you know: after upgrading, my CoreDNS Docker container broke because of this change.

The command I was using to run the container was: docker run ... -v $PWD/Corefile:/Corefile ... docker.io/coredns/coredns:latest, but after this change the working directory of the container has become /home/nonroot instead of /, so CoreDNS could not find the Corefile anymore and was falling back to the default whoami configuration silently.

Adding -conf /Corefile as CLI option solved my problem, but maybe it's worth to document this breaking change somewhere. Moreover, I know that the config file path should always be specified as a best practice, but IMHO falling back to the default config silently may not be the best option.

Not complaining tho 🙂 Just posting my experience here in the hope that it may be useful to someone

@polarathene
Copy link
Contributor

Adding -conf /Corefile as CLI option solved my problem

You could also use --workdir (working_dir in compose.yaml).

Although I agree with you, that seems like an oversight for a minor version bump.

@chrisohaver
Copy link
Member

This change is listed in the release notes. https://github.com/coredns/coredns/blob/master/notes/coredns-1.11.0.md

@polarathene
Copy link
Contributor

This change is listed in the release notes.

Yes, but it doesn't mention anything about the implicit WORKDIR change which was probably unintentional?


Likewise there was no discussion here about the breakage it introduced for those who already were using unprivileged ports and explicitly dropping the capability that is now mandatory for the binary to run regardless even if you don't bind to privileged ports.

According to the capabilities man page, the setcap approach is a "capability-dumb binary" which causes this problem (failing if the expected capability is not available). I am not a Go dev, but from what the man page says, it should be possible for CoreDNS to be smarter and run without setcap by using the API call to request the capability if binding to privileged ports? Then everyone should be happy and it's not just container specific? 🤷‍♂️

For example: https://github.com/ik5/gocapng

@chrisohaver
Copy link
Member

I think we should consider rolling this change back, since it's causing problems for people.

@polarathene
Copy link
Contributor

since it's causing problems for people

Yes, please be consistent with the binary 🙏

If you want to address the non-root capability concern, do so properly as my last comment suggests:

I am not a Go dev, but from what the man page says, it should be possible for CoreDNS to be smarter and run without setcap by using the API call to request the capability if binding to privileged ports? Then everyone should be happy and it's not just container specific? 🤷‍♂️

For example: https://github.com/ik5/gocapng

Please revert the WORKDIR as well, and communicate that clearly in release notes if possible as it was not previously.


I looked into this and documented justification extensively at the time via the PR: #6320

I know it is very verbose, but I am confident it's the better approach.

The only blocker was the kubernetes test failure, which I cannot afford to investigate. If you're able to implement the Go API instead, that would presumably fix it and benefit anyone wanting the same feature by default without the Docker image.

@dgl
Copy link

dgl commented Dec 15, 2023

If you want to address the non-root capability concern, do so properly as my last comment suggests:
[...]
For example: https://github.com/ik5/gocapng

That "proper" way clearly states the need to have setcap on the binary: https://github.com/ik5/gocapng?tab=readme-ov-file#building-requirements

That's why dropping setcap is likely breaking Kubernetes. (After waiting for 5 minutes CircleCI isn't loading the log, so I can't be sure...)

The correct fix using capabilities is to use ambient capabilities, but those aren't currently supported in Kubernetes: kubernetes/kubernetes#56374 (and likely won't be in the short term)...

Probably the better fix is to do similar to Docker does as you point out, and set net.ipv4.ip_local_port_range, however that isn't supported on some older kernels, particularly the one that RHEL 7 ships (which Kubernetes still supports -- for now: kubernetes/kubernetes#116799).

@polarathene
Copy link
Contributor

Apologies for the verbosity in response below, and my lack of k8s knowledge which probably shows below 😅

TL;DR: Better to switch back to scratch base image with root user as the default?

  • Non-root is still possible without setcap, just requires communicating to users how to configure for that explicitly (eg: net.ipv4.ip_unprivileged_port_start=0), should they require a privileged port internally. That is a better security policy and UX in my opinion.
  • The kernel checks from setcap also apply to running as root, which outputs an unhelpful error to those using root but dropping all capabilities by default.
  • setcap ironically also impacts users with non-root, that have a more strict security policy and want to drop the capability as they're satisfied with the internal container port binding to an unprivileged port, but due to the kernel check enforcing the capability requirement (even when it's not needed), this is not possible to support.

Original response (click to expand)

That "proper" way clearly states the need to have setcap on the binary

Thanks for clarifying that. I had a misunderstanding and for non-root you're right, but if switching back to how CoreDNS was distributed as an image before (which was fine IMO), the API can be used and setcap won't be necessary.

I don't think the minor convenience should be favored from using it for non-root. Instead that should be clearly documented for the security conscious that insist on running it as non-root.

I am of the impression that they're simply interested in doing it for the sake of it because they don't know any better and are overly cautious/paranoid, plus it's simpler for them than to be explicit about the security context needed. If they care enough about it, better to inform correct configuration via documentation instead.


https://man7.org/linux/man-pages/man7/capabilities.7.html

Safety checking for capability-dumb binaries

A capability-dumb binary is an application that has been marked
to have file capabilities, but has not been converted to use the
libcap(3) API
to manipulate its capabilities.

(In other words,
this is a traditional set-user-ID-root program that has been
switched to use file capabilities, but whose code has not been
modified to understand capabilities
.)

For such applications, the effective capability bit is set on the file, so that the file permitted capabilities are automatically enabled in the process effective set when executing the file.
The kernel recognizes a file which has the effective capability bit set as capability-dumb for the purpose of the check described here.

Relying on setcap for file permitted capabilities to enforce a kernel check on granting the required capabilities when running the process unprivileged. This is only relevant because the process is being run without root.

The API in that context cannot manage the capabilities in the same way at runtime, unless a parent process with privileges is running it and then manages privileges for the child process.

Services like Postfix commonly do this, but if you insist on running the container without root user, you're stuck with the capability-dumb approach via file attributes.

--

If the process did not obtain the full set of file permitted capabilities, then execve(2) fails with the error EPERM.
This prevents possible security risks that could arise when a capability-dumb application is executed with less privilege than it needs.
Note that, by definition, the application could not itself recognize this problem, since it does not employ the
libcap(3) API.

This error is not helpful as shown in #6320 you get: exec /coredns: operation not permitted, as opposed to if the process avoided the check and encountered the actual error related to lack of the required capability: Listen: listen tcp :53: bind: permission denied.

Prior to the introduction of setcap, distroless was not used, the final image was scratch with CoreDNS binary copied over (along with the ca-certificates). The risk of running the binary as root within the container was minimal.

Outside of containerization, you'd often run CoreDNS as root yes? It's not distributed with the same capability-dumb approach outside of the Docker image and worked fine prior to introducing that into the image. The only reason it was merged was a vague request to support running as non-root.

You could run as non-root and bind to an unprivileged port, no issues. It doesn't change the fact that the privileged port is being run from the host as root? If running a Docker container via a rootless daemon, then you could have root in the container and bind to :53 internally, but on the actual host you'd have to bind to an unprivileged port. No specific use-case was provided in where it was important to support :53 via a non-root user within the container alone.

A container exploit does not require root to break out, there's even less attack surface with scratch. If a user wants to run as non-root user all they need to do is set the UID+GID, while either way both can drop all capabilities by default and only enable those which are required.

For non-root user, the only reason the capability is required is to bind a port lower than 1024. This isn't an issue on Docker by default as I've illustrated in #6302 while elsewhere you should be capable of the same approach, setting the sysctl to lower the port binding. You're permitting the process to bind to any port as non-root regardless, so I'd be interested in what actual attack vector you're trying to protect against (even a past vulnerability example that could be relatable), otherwise it's security paranoid parroting 🤷‍♂️

You can apply the sysctl in k8s as well via security context, as you would the other related settings. It's far better to have that explicit config than rely on implicit config that is subject to change (perfect example with CoreDNS image already as changes have been introduced that were breaking for existing users without mention in the changelog).


That's why dropping setcap is likely breaking Kubernetes.

Yes, understood. It's because it's running unprivileged and relies on file attribute granted from root to permit the capability. Personally, I don't like how that also affects the binary when run as root, since being capability-dumb still incurs the early exit check with the unhelpful error.

The correct fix using capabilities is to use ambient capabilities, but those aren't currently supported in Kubernetes

Not available in Docker either AFAIK.

However both allow you to control the capabilities from the context of the container as a whole. Since only one process is run this seems adequate. Those that are security conscious and really care about the additional security practices should be comfortable with the explicit approach as described earlier.

As root, CoreDNS could drop all capabilities by default and only keep those that are necessary. Although I'm not sure how well that'd work due to the plugin system, they'd likewise have to express the capabilities they need. Alternatively, the capabilities can be checked, along with if running privileged or unprivileged, if CoreDNS should better communicate a failure and advise a user (eg: link to documentation, or suggest using setcap).


however that isn't supported on some older kernels, particularly the one that RHEL 7 ships (which Kubernetes still supports

That's almost a decade old release series. Is there known deployments of CoreDNS with k8s on RHEL 7 currently?

Is that really worth supporting, or just an unknown? RHEL 7 is presently in maintenance and soon to switch to extended support, but I imagine those unable to migrate to newer version of RHEL aren't as critical about security as they are other factors?

They are likely able to afford the expertise to maintain their own customizations to support their needs, or are comfortable remaining on an older CoreDNS release (that problem will affect them elsewhere in that same sense, so it's not adding any new friction)

A 3.x kernel is very old for running a containerized environment on. There's no cgroups v2 support, which is important for running Docker for some time IIRC, and I wouldn't be surprised if that applies to k8s. I really don't see RHEL 7 as a valid blocker? 🤷


UPDATE: Actually it looks like the main problem is just setting e (effective) on the binary with setcap. I think everyone can be happy if setcap just sets p (permitted) capability and uses API at runtime to raise the capability into the effective set (allowed since it's marked as permitted).

That would prevent the binary failing from a kernel check on effective capabilities being set too early (via setcap). The associated sysctl could additionally be checked to verify if it's actually necessary to add CAP_NET_BIND_SERVICE into the effective set, in addition to running unprivileged 🤔

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants