-
Notifications
You must be signed in to change notification settings - Fork 336
Add ostree checkout overwrite mode "--update" #1408
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
cfbbcb7 to
aeb8f71
Compare
|
I haven't looked at your patches here in detail yet but just a quick coment: it seems pretty related to If this is about development though I find |
c382f1f to
b6af556
Compare
bash/ostree
Outdated
| local options_with_args=" | ||
| --from-file | ||
| --fsync | ||
| --replace |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be --update
bash/ostree
Outdated
| __ostree_compreply_all_files | ||
| return 0 | ||
| ;; | ||
| --replace) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and this
man/ostree-checkout.xml
Outdated
| become the destination checkout target. If COMMIT destination | ||
| already exists, command will error unless one of the | ||
| <option>--union</option>, <option>--union-add</option>, | ||
| <option>--union-identical</option> or <option>--replace</option> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and this.
man/ostree-checkout.xml
Outdated
| </varlistentry> | ||
|
|
||
| <varlistentry> | ||
| <term><option>--replace</option>="COMMIT"</term> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and this
src/ostree/ot-builtin-checkout.c
Outdated
| if (opt_update && (opt_union || opt_union_add || opt_union_identical)) | ||
| { | ||
| g_set_error (error, G_IO_ERROR, G_IO_ERROR_FAILED, | ||
| "Cannot combine --replace with --union, " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be --update too
b6af556 to
6f2a0ef
Compare
|
Pretty interesting stuff, thanks a lot for working on this! The code looks sane at a very quick glance, but I'm going to need a little bit of time to think about it. In particular I want to try to figure out if this applies to what we have today in But it'd be great if we can pin down the higher level use case again - do you see this as just a development thing or do you see it as potentially useful for production hotfixes? In the rpm-ostree livefs issue one concrete use case I had was being able to hotfix OpenSSH, which is a classic example of a service that is pretty safe to hotfix. |
This is just intended as supporting work for #1419. The motivation for that is faster deploys with large numbers of files on a slow filesystem. Currently it takes 1m40s to do a deploy and a reboot takes about 10s. This is too slow - we want to be able to deploy a new version of software to our devices fast enough that our customers don't notice any outage. In particular we want to add some more containers which increases the number of files. This will make it even slower. We can't do a deploy while our customers are using the systems because doing the checkout monopolises the disk. As a side effect it makes development faster too. I'll expand on how our build/deploy system works a little later, when I have time... Note: I'm not live-updating the current deployment with this command. This isn't a use case I'd considered, but I can see how it could work out well. I'll expand on this on #1419. I've made this a separate pull-request because I think it's easier to review that way and easier to test in isolation than bundled with #1419. |
|
I definitely prefer your approach of recording the tree SHAs in xattrs as suggested in #1224. It's simpler and less error-prone to passing the xattrs in to the In #1419 I introduced the concept of the deploy cache which stored old deploys named by the commit id. The commit id would be retrieved from the directory name when doing a new deploy. This would be passed to this This xattrs approach is better for a few reasons:
I'm not saying that live updates would be a good idea, but they'd certainly be possible under this scheme. I'll have a think about what it would take to implement that. |
|
As a general request; can you file an issue first for bigger features so we can sketch out the design/use cases a bit before we do the code? Getting big patches is obviously great, not going to complain 😉! So...are you thinking it makes sense to do #1224 first? I can probably take a crack at that sometime soon myself. |
Yeah, I agree that this is ideally how it would work. In this case time pressures have meant that we don't always have time for this. I spent probably 2-3 days on this. We're already shipping this and #1419 on our product. It was a prerequisite for other important customer-facing work (https://stb-tester.com/blog/2018/02/06/stb-tester-v28-released) As such I'm not precious about changing it to get it merged properly so I don't think we've lost much with this approach in this instance. I won't be upset if you ask for changes or flat out say no :). The other element is that I didn't want to propose it until I knew I could do it. I had a stab at implementing this over a year ago - but failed. Your commit e6f17b9 made it much easier for me to understand the code, because it removed a layer of abstraction. So when I had another go and succeeded I raised this PR. I wouldn't have felt good about coming along and starting a discussion about a new feature that I wanted if I felt like I'd be "demanding" that others implement it for me,
I think the only thing that would change in this PR if #1224 were implemented is where the initial tree sha would come from. So changes here would depend only on a small subset of #1224 - specifically recording the SHAs in the xattrs. How do you see this working? The two questions in my mind are:
Thoughts? |
|
I'm going away on holiday so I'm not going to be able to look at this until March. I'd still be interested in your thoughts in the mean time, I'm just letting you know that I won't be doing anything about them for a little while. |
I get the development case for sure. But I want to understand better how you envision using this for production. I reread through the thread and your patches, but some important details aren't yet clear to me. Let's say we have two deployments, current and rollback (the ostree default). Normally when we do an upgrade, ostree will end up GCing the rollback but only after an upgrade has been completed - in the bootloader order we always have 2, but on disk we will transiently have 3 checkouts. However, we clearly could, on upgrade, immediately "take ownership" of the rollback. Is something like that what you're thinking? I see the "deploy cache" in later patches. The downside of this is that the upgrade is less "transactional" in that the rollback will vanish if we're interrupted. Today we don't have Am I right that your approach is about taking control over the rollback? |
|
Can one of the admins verify this patch? |
|
Some other notes I have: The xattr diffing is an enhancement but I don't see a need to require it; a valid concern you have is eMMC wear, and (AIUI) the extra reads we do as part of a full diff traverse aren't a problem for that We should be able to rebase and land some of the preparatory patches now.
This is definitely a concern - how we handle the ostree 3-way merge. Are you using systemd today? We need to think about how this feature intersects with staged deployments as I'd really like to make that the default in the future. What filesystem are you using? Does it have reflinks? That helps a lot for |
Yes, so in #1419 I've modified the GC code to move unused deploys to the cache rather than deleting it: This is a lot faster than recursively deleting the deploy as it just involves a
No, we take control over the last GCed deploy. |
This is in preparation for adding `OSTREE_REPO_CHECKOUT_OVERWRITE_UPDATE` where we will need to iterate through two directories simultaneously.
This is in preparation for adding `OSTREE_REPO_CHECKOUT_OVERWRITE_UPDATE` where we will need to iterate through two directories simultaneously.
Out of date since e6f17b9.
Used for updating an already checked out tree to a new version. This is intended to make deploys much faster in the future. In default operation `checkout_tree_at` now is doing an update from an empty tree to the given one. This implements diffs for directories. The next commit will introduce the same for files.
The goal is to do ostree deploys faster by updating already checked out trees. This commit continues the work in the previous commit. `checkout_tree_at` now takes an `orig_dirtree_checksum` telling it to assume that the directory it's checking out into already has this content.
The goal is to do ostree deploys faster by updating already checked out trees. This adds an overwrite mode and a new option to specify the original tree. It will do a lot less work when you're updating a tree with a large number of files but only a few files have changed. So if you have a full distro tree checked out and want to update `/usr/bin/ostree` we no longer have to recurse through every file in `/usr/share/doc` to do the update.
This exposes the OSTREE_REPO_CHECKOUT_OVERWRITE_UPDATE functionality added in the previous commit on the command line.
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: wmanley The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Rebased on master |
|
Can you split off some of the prep patches here to a separate PR? |
|
@wmanley: PR needs rebase. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
@wmanley: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
will assume that
DESTINATIONalready contains a checkout ofFROM_COMMITand will attempt to do the least amount of work to make it containTO_COMMIT.The intention is to enhance
ostree admin deployto update previous deployments rather than checking out whole new ones. This should be faster. Currently on our system with slow eMMC disk it takes 1m40s to do a deploy (~150000 files). Ideally if only one or two files have changed it should be nearly instantaneous - suitable for interactive development. This should also reduce eMMC wear. This can happen in future PRs though.TODO: