Simplify/remove /persist/status/zedagent/* #5584
Merged
Codecov Report: ❌ Patch coverage is …
Additional details and impacted files:
Coverage Diff, master vs. #5584:
Coverage: 19.52% → 28.34% (+8.81%)
Files: 19 → 18 (-1)
Lines: 3021 → 2417 (-604)
Hits: 590 → 685 (+95)
Misses: 2310 → 1588 (-722)
Partials: 121 → 144 (+23)
Force-pushed from 8cade80 to adf8b8a.
milan-zededa approved these changes on Mar 24, 2026.
Should we strip the "maybe" prefix from the function name now that we reload bootstrap config on every boot?
Force-pushed from adf8b8a to 2ff38d6.
Force-pushed from c6b38e8 to 48efd8a.
This is used as the startup config if we crash and don't have controller connectivity. It is saved after we have been running with an updated config for at least 10 minutes. Signed-off-by: eriknordmark <erik@zededa.com>
And its .bak file. These are checkpointed protobuf files whose certificate chains are verified both before they are written and when they are read after a reboot. When we load from /persist/checkpoint/controllercerts we also publish ControllerCerts for use by others. This will remove the need for /persist/certs/server-signing-cert.pem. Signed-off-by: eriknordmark <erik@zededa.com>
Instead it is only kept in memory/pubsub and lookupControllerSigningCert can be used to fetch it. Signed-off-by: eriknordmark <erik@zededa.com>
The need for Touch went away when we started accepting arbitrarily old checkpoints. Signed-off-by: eriknordmark <erik@zededa.com>
And make sure we update the checkpoint when there are real changes to the controller certs. This requires comparing the set of Keys (i.e., the hashes of the certificates) to avoid falsely detecting changes due to ordering differences in the protobuf bytes. Signed-off-by: eriknordmark <erik@zededa.com>
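Detecting "real" changes to the controller certs amounts to an order-independent set comparison. The sketch below assumes each certificate's key is a SHA-256 hash of its raw bytes; `certKey` and `sameCertSet` are hypothetical names, and the actual code may take the hash from the controller's cert objects instead.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// certKey returns a hex-encoded SHA-256 hash of a certificate's bytes.
// Assumption: hashing the raw bytes stands in for the real key/hash field.
func certKey(cert []byte) string {
	sum := sha256.Sum256(cert)
	return hex.EncodeToString(sum[:])
}

// sameCertSet reports whether two certificate lists contain the same set of
// keys, ignoring the order in which they appear.
func sameCertSet(a, b [][]byte) bool {
	setA := make(map[string]struct{}, len(a))
	for _, c := range a {
		setA[certKey(c)] = struct{}{}
	}
	setB := make(map[string]struct{}, len(b))
	for _, c := range b {
		setB[certKey(c)] = struct{}{}
	}
	if len(setA) != len(setB) {
		return false
	}
	for k := range setA {
		if _, ok := setB[k]; !ok {
			return false
		}
	}
	return true
}

func main() {
	old := [][]byte{[]byte("certA"), []byte("certB")}
	reordered := [][]byte{[]byte("certB"), []byte("certA")}
	rolled := [][]byte{[]byte("certA"), []byte("certC")}
	fmt.Println(sameCertSet(old, reordered)) // true: only the order differs
	fmt.Println(sameCertSet(old, rolled))    // false: a certificate changed
}
```

Comparing sets rather than serialized bytes means a controller that merely reorders the same certificates does not trigger a checkpoint rewrite.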
They are created from the checkpointed controllercerts and lastconfig when zedagent starts, and then they are published to other agents. Signed-off-by: eriknordmark <erik@zededa.com>
The ConfigItemValueMap will no longer be a persistent publication, hence there will be no need to convert from old to new formats or to set default values. The defaults will be applied by zedagent on startup. A follow-on commit adds back the handling of /config/GlobalConfig/global.json in zedagent. Signed-off-by: eriknordmark <erik@zededa.com>
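Applying defaults on startup instead of persisting them can be pictured as overlaying the checkpointed items on a table of defaults. This is a minimal sketch: the flat `map[string]string`, the `defaultItems` table, and the `example.*` keys are hypothetical stand-ins for the real, typed ConfigItemValueMap.

```go
package main

import "fmt"

// defaultItems is a hypothetical defaults table; the real ConfigItemValueMap
// has typed items and many more keys.
var defaultItems = map[string]string{
	"example.interval": "60",
	"example.flag":     "false",
}

// buildGlobalConfig starts from the defaults and overlays whatever the
// checkpointed config sets. The result only lives in memory, so there is no
// on-disk format that needs converting between releases.
func buildGlobalConfig(fromCheckpoint map[string]string) map[string]string {
	out := make(map[string]string, len(defaultItems)+len(fromCheckpoint))
	for k, v := range defaultItems {
		out[k] = v
	}
	for k, v := range fromCheckpoint {
		out[k] = v // checkpointed values override defaults
	}
	return out
}

func main() {
	cfg := buildGlobalConfig(map[string]string{"example.interval": "30"})
	fmt.Println(cfg["example.interval"], cfg["example.flag"]) // 30 false
}
```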
Zedagent initializes it from /persist/checkpoint/lastconfig on startup so that other agents can get their global config. Signed-off-by: eriknordmark <erik@zededa.com>
Since some persistent publications are no longer persistent. Signed-off-by: eriknordmark <erik@zededa.com>
And remove some use of file access in favor of pubsub calls. Signed-off-by: eriknordmark <erik@zededa.com>
But do this in zedagent, similarly to how it handles bootstrap config (the old code was in upgradeconverter). A follow-up commit removes the use of /persist/ingested and the related sha256 comparisons. Signed-off-by: eriknordmark <erik@zededa.com>
This no longer makes sense since we ingest into memory and not into files in /persist. Only the DevicePortConfig gets ingested by nim into a persistent publication and that can handle multiple ingestions without indigestion. Signed-off-by: eriknordmark <erik@zededa.com>
If we have a /persist/checkpoint/lastconfig, we skip re-reading those files. They will already have been taken into account when forming /persist/status/nim/DevicePortConfigList/ Signed-off-by: eriknordmark <erik@zededa.com>
Force-pushed from 48efd8a to c815255.
Since the load is no longer conditional on /persist/ingested/ Signed-off-by: eriknordmark <erik@zededa.com>
This was referenced Apr 6, 2026
This was referenced May 5, 2026
Description
We currently have a checkpoint of the protobuf config in /persist/checkpoint/lastconfig, which is signed by the controller. Its signature is verified before it is used, and then it is used to populate zedagent's publications even if there is no connection to the controller.
Thus the persistent publications (with files in /persist/status/zedagent) should not be needed, and getting rid of them simplifies analyzing the security impact that unauthorized modifications to such files could have.
However, we need to have the ControllerCerts available since the CipherContext (used for object encryption) depends on them. This is addressed by introducing a new /persist/checkpoint/controllercerts file, which contains the protobuf objects received from the controller. This file is verified at boot the same way as when we receive an update: the certificate chain must verify all the way to the certificate in /config/root-certificate.pem.
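Verifying that a controller certificate chains up to /config/root-certificate.pem maps naturally onto `crypto/x509`. The sketch below is an assumption about how such a check could look, not EVE's implementation; `verifyControllerCert` is hypothetical, and `demoChain` only builds throwaway certificates so the example runs on its own.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"errors"
	"fmt"
	"math/big"
	"time"
)

// verifyControllerCert checks that leaf chains up to a certificate in rootPEM
// (e.g. the contents of /config/root-certificate.pem) via any intermediates.
func verifyControllerCert(rootPEM []byte, leaf *x509.Certificate, intermediates []*x509.Certificate) error {
	roots := x509.NewCertPool()
	if !roots.AppendCertsFromPEM(rootPEM) {
		return errors.New("no root certificate found")
	}
	inter := x509.NewCertPool()
	for _, c := range intermediates {
		inter.AddCert(c)
	}
	_, err := leaf.Verify(x509.VerifyOptions{
		Roots:         roots,
		Intermediates: inter,
		KeyUsages:     []x509.ExtKeyUsage{x509.ExtKeyUsageAny},
	})
	return err
}

// mustKey, newCert and demoChain are demo-only helpers.
func mustKey() *ecdsa.PrivateKey {
	k, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		panic(err)
	}
	return k
}

func newCert(tmpl, parent *x509.Certificate, pub *ecdsa.PublicKey, signer *ecdsa.PrivateKey) *x509.Certificate {
	der, err := x509.CreateCertificate(rand.Reader, tmpl, parent, pub, signer)
	if err != nil {
		panic(err)
	}
	c, err := x509.ParseCertificate(der)
	if err != nil {
		panic(err)
	}
	return c
}

// demoChain returns a PEM-encoded root CA and a leaf certificate signed by it.
func demoChain() ([]byte, *x509.Certificate) {
	now := time.Now()
	rootKey := mustKey()
	rootTmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "demo root"},
		NotBefore:             now.Add(-time.Hour),
		NotAfter:              now.Add(time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign,
	}
	root := newCert(rootTmpl, rootTmpl, &rootKey.PublicKey, rootKey)
	leafKey := mustKey()
	leafTmpl := &x509.Certificate{
		SerialNumber: big.NewInt(2),
		Subject:      pkix.Name{CommonName: "demo signing cert"},
		NotBefore:    now.Add(-time.Hour),
		NotAfter:     now.Add(time.Hour),
		KeyUsage:     x509.KeyUsageDigitalSignature,
	}
	leaf := newCert(leafTmpl, root, &leafKey.PublicKey, rootKey)
	return pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: root.Raw}), leaf
}

func main() {
	rootPEM, leaf := demoChain()
	fmt.Println(verifyControllerCert(rootPEM, leaf, nil)) // <nil>
	otherRoot, _ := demoChain()
	fmt.Println(verifyControllerCert(otherRoot, leaf, nil) != nil) // true
}
```

`ExtKeyUsageAny` is used because a signing certificate need not carry the TLS server-auth usage that `Verify` otherwise requires by default.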
Both lastconfig and controllercerts have a .bak file, which should ensure that we still have a valid backup to use even if there is a power outage while the file is being written.
Note that more of the publications in /persist/status/zedagent need to be addressed.
Next is ConfigItemValueMap, which might have some chicken-and-egg problems at boot; it needs to be published based on the checkpointed lastconfig as the agents start.
How to test and validate this PR
Since we are touching code related to rolling the controller certificates, it needs to be tested very carefully, including all corner cases.
And since the purpose of the checkpoints is to allow the device (including datastore and WiFi credentials) and app instances (including those using cloud-init) to boot even if there is no network, that needs to be carefully tested.
It is not clear whether we can test corruption of /persist/checkpoint/lastconfig or /persist/checkpoint/controllercerts, but that is why we have the .bak files (to handle inopportune power outages while the checkpoint files are being updated).
Changelog notes
Removed extra copies of checkpointed state from /persist/status/ and instead rely solely on the signed, protobuf-encoded checkpoint in /persist/checkpoint/lastconfig. This also required checkpointing the controller certs (in /persist/checkpoint/controllercerts) and providing a backup copy of both checkpoints. The backups are needed to handle both inopportune power outages while the checkpoints are being written and the fact that lastconfig depends on controllercerts, yet the two checkpoint files are not updated atomically.
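Because lastconfig depends on controllercerts but the two files are not updated atomically, a loader has to find a pair that is consistent. The following sketch shows one possible policy, not necessarily the one in this PR: try the primary and backup certs in turn and, for each, the primary and backup config, returning the first pair that verifies. `loadConsistentPair` and the toy `verifies` check are hypothetical.

```go
package main

import "fmt"

// loadConsistentPair (hypothetical) picks a (controllercerts, lastconfig) pair
// in which the config verifies against the certs. Candidates are ordered
// primary first, then .bak; missing or unreadable files are passed as nil.
func loadConsistentPair(certs, configs [][]byte, verifies func(cfg, certs []byte) bool) ([]byte, []byte, bool) {
	for _, c := range certs {
		if c == nil {
			continue
		}
		for _, cfg := range configs {
			if cfg != nil && verifies(cfg, c) {
				return c, cfg, true
			}
		}
	}
	return nil, nil, false
}

func main() {
	// Toy model: a config "verifies" if it names the certs that signed it.
	verifies := func(cfg, c []byte) bool { return string(cfg) == "signed-by:"+string(c) }

	// Crash after the new certs were written but before the new lastconfig was.
	certs := [][]byte{[]byte("certs-new"), []byte("certs-old")}                         // primary, .bak
	configs := [][]byte{[]byte("signed-by:certs-old"), []byte("signed-by:certs-older")} // primary, .bak

	c, cfg, ok := loadConsistentPair(certs, configs, verifies)
	fmt.Println(string(c), string(cfg), ok) // certs-old signed-by:certs-old true
}
```

Preferring the newest certs is a design choice here; a loader could just as well prioritize the newest config.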