Skip to content

Conversation

@wmanley
Copy link
Member

@wmanley wmanley commented Jan 11, 2018

ostree checkout --update=FROM_COMMIT TO_COMMIT DESTINATION

will assume that DESTINATION already contains a checkout of FROM_COMMIT and will attempt to do the least amount of work to make it contain TO_COMMIT.

The intention is to enhance ostree admin deploy to update previous deployments rather than checking out whole new ones. This should be faster. Currently on our system with slow eMMC disk it takes 1m40s to do a deploy (~150000 files). Ideally if only one or two files have changed it should be nearly instantaneous - suitable for interactive development. This should also reduce eMMC wear. This can happen in future PRs though.

TODO:

  • Write a test so we're sure it's not touching directories that are unmodified.
  • Decide how subtree checkouts should work

@wmanley wmanley force-pushed the checkout-update branch 4 times, most recently from cfbbcb7 to aeb8f71 Compare January 11, 2018 12:30
@cgwalters
Copy link
Member

cgwalters commented Jan 11, 2018

I haven't looked at your patches here in detail yet but just a quick coment: it seems pretty related to rpm-ostree livefs work; see also:
#1224

If this is about development though I find ostree admin unlock to be super useful. I could certainly imagine teaching libostree how to do "live deployment updates" on top of the overlayfs.

bash/ostree Outdated
local options_with_args="
--from-file
--fsync
--replace
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be --update

bash/ostree Outdated
__ostree_compreply_all_files
return 0
;;
--replace)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and this

become the destination checkout target. If COMMIT destination
already exists, command will error unless one of the
<option>--union</option>, <option>--union-add</option>,
<option>--union-identical</option> or <option>--replace</option>
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and this.

</varlistentry>

<varlistentry>
<term><option>--replace</option>="COMMIT"</term>
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and this

if (opt_update && (opt_union || opt_union_add || opt_union_identical))
{
g_set_error (error, G_IO_ERROR, G_IO_ERROR_FAILED,
"Cannot combine --replace with --union, "
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be --update too

@cgwalters
Copy link
Member

Pretty interesting stuff, thanks a lot for working on this! The code looks sane at a very quick glance, but I'm going to need a little bit of time to think about it. In particular I want to try to figure out if this applies to what we have today in rpm-ostree ex livefs and also my idea from #1224

But it'd be great if we can pin down the higher level use case again - do you see this as just a development thing or do you see it as potentially useful for production hotfixes?

In the rpm-ostree livefs issue one concrete use case I had was being able to hotfix OpenSSH, which is a classic example of a service that is pretty safe to hotfix.

@wmanley
Copy link
Member Author

wmanley commented Jan 23, 2018

But it'd be great if we can pin down the higher level use case again - do you see this as just a development thing or do you see it as potentially useful for production hotfixes?

This is just intended as supporting work for #1419. The motivation for that is faster deploys with large numbers of files on a slow filesystem. Currently it takes 1m40s to do a deploy and a reboot takes about 10s. This is too slow - we want to be able to deploy a new version of software to our devices fast enough that our customers don't notice any outage. In particular we want to add some more containers which increases the number of files. This will make it even slower.

We can't do a deploy while our customers are using the systems because doing the checkout monopolises the disk. As a side effect it makes development faster too. I'll expand on how our build/deploy system works a little later, when I have time...

Note: I'm not live-updating the current deployment with this command. This isn't a use case I'd considered, but I can see how it could work out well. I'll expand on this on #1419.

I've made this a separate pull-request because I think it's easier to review that way and easier to test in isolation than bundled with #1419.

@wmanley
Copy link
Member Author

wmanley commented Jan 28, 2018

I definitely prefer your approach of recording the tree SHAs in xattrs as suggested in #1224. It's simpler and less error-prone to passing the xattrs in to the --update.

In #1419 I introduced the concept of the deploy cache which stored old deploys named by the commit id. The commit id would be retrieved from the directory name when doing a new deploy. This would be passed to this --update mode when doing a deploy.

This xattrs approach is better for a few reasons:

  1. The way the csums are stored is not specific to ostree admin deploy, they can also be set when doing a normal checkout.
  2. I don't have to special-case /etc when updating deploys. In WIP: deploy: Attempt accelerated deploy by updating old deploy #1419 I was working around the fact that the csum in of the deploy wouldn't match the real checked-out data because /etc had been merged. Because these trees would be csumed individually that's no longer an issue - we just unset the root csum xattr before doing etc merge.
  3. Updating a live deploy in place becomes possible because we don't need to rename the deployment just because the sha of the contents have changed
  4. Updating a live deploy becomes possible because we can cope with non-atomically updated trees - a directory that temporarily doesn't correspond to a particular SHA doesn't have that SHA set on its xattrs. Theres no risk of a partial update being undetected (assuming the appropriate calls to sync).

I'm not saying that live updates would be a good idea, but they'd certainly be possible under this scheme.

I'll have a think about what it would take to implement that.

@cgwalters
Copy link
Member

As a general request; can you file an issue first for bigger features so we can sketch out the design/use cases a bit before we do the code? Getting big patches is obviously great, not going to complain 😉!

So...are you thinking it makes sense to do #1224 first? I can probably take a crack at that sometime soon myself.

@wmanley
Copy link
Member Author

wmanley commented Feb 7, 2018

As a general request; can you file an issue first for bigger features so we can sketch out the design/use cases a bit before we do the code? Getting big patches is obviously great, not going to complain 😉!

Yeah, I agree that this is ideally how it would work. In this case time pressures have meant that we don't always have time for this. I spent probably 2-3 days on this. We're already shipping this and #1419 on our product. It was a prerequisite for other important customer-facing work (https://stb-tester.com/blog/2018/02/06/stb-tester-v28-released)

As such I'm not precious about changing it to get it merged properly so I don't think we've lost much with this approach in this instance. I won't be upset if you ask for changes or flat out say no :).

The other element is that I didn't want to propose it until I knew I could do it. I had a stab at implementing this over a year ago - but failed. Your commit e6f17b9 made it much easier for me to understand the code, because it removed a layer of abstraction. So when I had another go and succeeded I raised this PR. I wouldn't have felt good about coming along and starting a discussion about a new feature that I wanted if I felt like I'd be "demanding" that others implement it for me,

So...are you thinking it makes sense to do #1224 first? I can probably take a crack at that sometime soon myself.

I think the only thing that would change in this PR if #1224 were implemented is where the initial tree sha would come from. So changes here would depend only on a small subset of #1224 - specifically recording the SHAs in the xattrs.

How do you see this working? The two questions in my mind are:

  1. When does the caller make the guarantee that the directories won't be modified? For the xattr to actually match the contents of the directory the contents can't change between when the xattr is set to when it is overwritten. I see a few options:

    1. Have an option to the intiial ostree checkout like --record-dirtree-xattrs. This is a promise to ostree that the caller won't modify the checked-out tree without clearing the xattrs. And then when you come to do an ostree checkout --update ostree just trusts that the xattrs are correct.
    2. Always record dirtree xattrs when checking out and have an option like --trust-dirtree-xattrs when updating a checkout. This flag would signify that the user hasn't modified the checkout since the initial checkout.
    3. Implement both flags
    4. Have a single flag that says both that the xattrs are trustworthy and that xattrs should be recorded. This could be implied by ostree checkout --update
    5. There's no need for an additional flag - ostree checkout --update implies this mode.
  2. How to handle sync/fsync? When updating you'd have to clear the xattr, then update the directory contents, then set the new xattr. If there's no fsync then a power loss could lead to inconsistency between the xattrs and the contents. At the same time we don't want to add a whole bunch of fsync calls as it will be too slow (1810de2). I would propose:

    1. Recurse through removing the xattrs,
    2. syncfs
    3. Update the contents of the directories
    4. syncfs
    5. Recurse through adding the new xattrs.

    The increased complexity is probably not unmanageable, but isn't ideal either. An alternative is to not worry about it and push the complexity to a higher level. I've already had to handle this case in WIP: deploy: Attempt accelerated deploy by updating old deploy #1419 (See b68ecc2) so maybe it's not that big a deal - but having reliable xattrs would allow updates in place. Maybe these additional guarantees (and the associated complexity) can be pushed into the future.

Thoughts?

@wmanley
Copy link
Member Author

wmanley commented Feb 12, 2018

I'm going away on holiday so I'm not going to be able to look at this until March. I'd still be interested in your thoughts in the mean time, I'm just letting you know that I won't be doing anything about them for a little while.

@cgwalters
Copy link
Member

We can't do a deploy while our customers are using the systems because doing the checkout monopolises the disk. As a side effect it makes development faster too.

I get the development case for sure. But I want to understand better how you envision using this for production. I reread through the thread and your patches, but some important details aren't yet clear to me.

Let's say we have two deployments, current and rollback (the ostree default). Normally when we do an upgrade, ostree will end up GCing the rollback but only after an upgrade has been completed - in the bootloader order we always have 2, but on disk we will transiently have 3 checkouts. However, we clearly could, on upgrade, immediately "take ownership" of the rollback. Is something like that what you're thinking? I see the "deploy cache" in later patches.

The downside of this is that the upgrade is less "transactional" in that the rollback will vanish if we're interrupted. Today we don't have ostree rollback but it's trivial to implement (as it is in rpm-ostree); we could add an API for it, and have that API "resurrect" the rollback. I'm not too concerned about this though; in practice I feel like the rpm-ostree deploy <version> approach is nicer (another thing to drive into ostree CLI).

Am I right that your approach is about taking control over the rollback?

@papr-bot
Copy link

Can one of the admins verify this patch?

@cgwalters
Copy link
Member

Some other notes I have:

The xattr diffing is an enhancement but I don't see a need to require it; a valid concern you have is eMMC wear, and (AIUI) the extra reads we do as part of a full diff traverse aren't a problem for that

We should be able to rebase and land some of the preparatory patches now.

I don't have to special-case /etc when updating deploys.

This is definitely a concern - how we handle the ostree 3-way merge. Are you using systemd today? We need to think about how this feature intersects with staged deployments as I'd really like to make that the default in the future.

What filesystem are you using? Does it have reflinks? That helps a lot for /etc. Although presumably you can also minimize /etc for your use case.

@wmanley
Copy link
Member Author

wmanley commented Jun 22, 2018

Let's say we have two deployments, current and rollback (the ostree default). Normally when we do an upgrade, ostree will end up GCing the rollback but only after an upgrade has been completed

Yes, so in #1419 I've modified the GC code to move unused deploys to the cache rather than deleting it:

https://github.com/stb-tester/ostree/blob/09a1e345573caf9c30e70bbf36674f81f88fced6/src/libostree/ostree-sysroot-cleanup.c#L443-L444

This is a lot faster than recursively deleting the deploy as it just involves a rename and we can reuse the deploy later without interfering with rollbacks, etc.

Am I right that your approach is about taking control over the rollback?

No, we take control over the last GCed deploy.

wmanley added 7 commits July 8, 2020 16:45
This is in preparation for adding `OSTREE_REPO_CHECKOUT_OVERWRITE_UPDATE`
where we will need to iterate through two directories simultaneously.
This is in preparation for adding `OSTREE_REPO_CHECKOUT_OVERWRITE_UPDATE`
where we will need to iterate through two directories simultaneously.
Used for updating an already checked out tree to a new version.  This is
intended to make deploys much faster in the future.  In default operation
`checkout_tree_at` now is doing an update from an empty tree to the given
one.

This implements diffs for directories.  The next commit will introduce the
same for files.
The goal is to do ostree deploys faster by updating already checked out
trees.

This commit continues the work in the previous commit. `checkout_tree_at`
now takes an `orig_dirtree_checksum` telling it to assume that the
directory it's checking out into already has this content.
The goal is to do ostree deploys faster by updating already checked out
trees.

This adds an overwrite mode and a new option to specify the original tree.
It will do a lot less work when you're updating a tree with a large number
of files but only a few files have changed.  So if you have a full distro
tree checked out and want to update `/usr/bin/ostree` we no longer have
to recurse through every file in `/usr/share/doc` to do the update.
This exposes the OSTREE_REPO_CHECKOUT_OVERWRITE_UPDATE functionality added
in the previous commit on the command line.
@wmanley wmanley force-pushed the checkout-update branch from 6f2a0ef to ce75735 Compare July 9, 2020 16:24
@openshift-ci-robot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: wmanley
To complete the pull request process, please assign cgwalters
You can assign the PR to them by writing /assign @cgwalters in a comment when ready.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@wmanley
Copy link
Member Author

wmanley commented Jul 9, 2020

Rebased on master

@cgwalters
Copy link
Member

Can you split off some of the prep patches here to a separate PR?

@openshift-ci-robot
Copy link
Collaborator

@wmanley: PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci
Copy link

openshift-ci bot commented Jun 29, 2023

@wmanley: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/sanity ce75735 link true /test sanity
ci/prow/images ce75735 link true /test images
ci/prow/fcos-e2e ce75735 link true /test fcos-e2e

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants