Add a goal to introspect and garbage collect caches. #11167

Open
1 of 4 tasks
stuhood opened this issue Nov 12, 2020 · 9 comments

Comments

@stuhood
Member

stuhood commented Nov 12, 2020

This might look like:

  • Adding a goal that reports the disk usage in all cache locations (the LMDB store and named caches in particular)
  • Exposing a way to manually run GC for the cache locations (in alignment with the GC that pantsd does) and/or to clear them completely.

Because this will need to access APIs which should not be available to @rules, this should likely be a "builtin" goal: ./pants gc (similar to ./pants help).
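As a rough illustration of the "report disk usage" half, here is a minimal sketch that walks each cache directory and sums file sizes. The function names and the `cache_dirs` mapping are hypothetical and are not part of Pants. The sizes are whatever `lstat` reports, which may not match the blocks actually allocated on disk for sparse files.

```python
import os
from pathlib import Path


def dir_size_bytes(root: Path) -> int:
    """Sum the sizes of all files under root, without following symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                # The file vanished or is unreadable (caches can change underneath us); skip it.
                continue
    return total


def report(cache_dirs: dict[str, Path]) -> dict[str, int]:
    """Map each cache name to its size in bytes (0 if the directory does not exist)."""
    return {
        name: dir_size_bytes(path) if path.is_dir() else 0
        for name, path in cache_dirs.items()
    }
```

A caller would pass the resolved cache locations, for example `report({"lmdb_store": store_path, "named_caches": named_caches_path})`, where both paths are assumed to come from the relevant options.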


A suggested series of four-ish PRs for this one.

  1. HelpRequest becomes NativeGoalRequest (see "Shifted to BuiltinGoal from HelpRequest", #12297)
  2. add a new native goal request for gc/garbage/storage/etc (depending on the default of the --check flag)
    • adjust LocalPantsRunner to handle the new request type by delegating to a "GarbageCollector" (or something) helper class
    • add support for a non-default --check mode which reports the disk usage in the two storage locations:
      1. named_caches_dir
      2. local_store_dir
  3. add a default --no-check mode, which will collect garbage in both locations
    • for named_caches_dir (probably? cc @jsirois) delete the oldest top-level directory that gets you below a hardcoded target
    • for local_store_dir, call garbage_collect_store
  4. add global options to set the target sizes of the local_store_dir and named_caches_dir
    • NB: We have "maximum" size settings (e.g.) for the local_store_dir, but the "target" size (i.e., the size that garbage collection should attempt to collect down to) is hardcoded.
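The named_caches_dir part of step 3 could be sketched as follows, under one reading of "delete the oldest top-level directory that gets you below a hardcoded target": remove top-level directories, oldest modification time first, until the total size is at or below the target. `prune_oldest` and `tree_size_bytes` are hypothetical helpers, not existing Pants code, and files sitting directly in the cache root are ignored.

```python
import os
import shutil
from pathlib import Path


def tree_size_bytes(root: Path) -> int:
    """Sum the sizes of all files under root, without following symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                continue  # File disappeared while we were walking.
    return total


def prune_oldest(cache_root: Path, target_bytes: int) -> list[Path]:
    """Delete top-level directories, oldest mtime first, until total size <= target_bytes."""
    entries = [
        (p, p.stat().st_mtime, tree_size_bytes(p))
        for p in cache_root.iterdir()
        if p.is_dir() and not p.is_symlink()
    ]
    total = sum(size for _path, _mtime, size in entries)
    removed: list[Path] = []
    for path, _mtime, size in sorted(entries, key=lambda e: e[1]):
        if total <= target_bytes:
            break
        shutil.rmtree(path, ignore_errors=True)
        total -= size
        removed.append(path)
    return removed
```

Directory mtime is only a heuristic for "oldest"; it changes when entries are added or removed directly inside the directory, not when nested files are read.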

@stuhood
Member Author

stuhood commented Jan 20, 2021

I've added some information about clearing caches to https://www.pantsbuild.org/v2.3/docs/troubleshooting#cache-or-pantsd-invalidation-issues.

@illicitonion
Contributor

Following on from https://pantsbuild.slack.com/archives/C0105PY6BM5/p1612892422002100

It could be handy to have some goal (or standalone utility, or flag, or something) to garbage collect the LMDB store that, rather than being time-based, uses local action cache entries as GC roots and keeps only the digests referenced by those roots. That would allow us to prune "we made a copy of every input file" while preserving "we can avoid doing expensive work". Ideally we could order the action cache entries by time, too, so that there's some LRU cut-off.
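The root-based collection described here is essentially mark-and-sweep. A minimal in-memory sketch, assuming a hypothetical `ActionEntry` record, a `children` mapping from a directory digest to the digests it contains, and a plain dict standing in for the store (none of these reflect the real LMDB layout):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionEntry:
    """Hypothetical action cache entry: the digests it references and when it was last used."""

    referenced: frozenset[str]
    last_used: float


def live_digests(
    actions: list[ActionEntry],
    children: dict[str, list[str]],
    keep_newest: int,
) -> set[str]:
    """Mark phase: keep everything reachable from the most recently used action entries."""
    # LRU cut-off: only the newest `keep_newest` entries act as GC roots.
    roots = sorted(actions, key=lambda a: a.last_used, reverse=True)[:keep_newest]
    live: set[str] = set()
    stack = [digest for action in roots for digest in action.referenced]
    while stack:
        digest = stack.pop()
        if digest in live:
            continue
        live.add(digest)
        stack.extend(children.get(digest, ()))
    return live


def sweep(store: dict[str, bytes], live: set[str]) -> list[str]:
    """Sweep phase: drop every stored digest that was not marked live."""
    garbage = sorted(d for d in store if d not in live)
    for digest in garbage:
        del store[digest]
    return garbage
```

With this shape, input files that no retained action references are swept, while the outputs of retained actions survive regardless of their age.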

@stuhood
Member Author

stuhood commented Feb 9, 2021

Great idea.

I'm going to go ahead and drop the pantsd aspect from this ticket, because we haven't gotten any feedback elucidating what that might look like.

@stuhood stuhood changed the title Add goals to introspect pantsd and cache status. Add a goal to introspect and garbage collect caches. Feb 9, 2021
@benjyw benjyw removed the Q42020-idea label Sep 9, 2021
@thejcannon
Member

Re: the command name. I think garbage collection (or similar) might be a bit misleading. To me, garbage collection is simply cleaning things that are stale. In that regard I would imagine this command would clear cache entries that wouldn't be used anywhere for the current state of the monorepo (btw, that should be a feature though 😉 )

Since you're thinking of doing multiple interesting things with the caches, perhaps this would become caches with multiple sub-commands?

  • ./pants caches report
  • ./pants caches gc
  • ./pants caches clean

@stuhood
Member Author

stuhood commented Nov 8, 2021

> Re: the command name. I think garbage collection (or similar) might be a bit misleading. To me, garbage collection is simply cleaning things that are stale. In that regard I would imagine this command would clear cache entries that wouldn't be used anywhere for the current state of the monorepo (btw, that should be a feature though 😉 )

That is actually what is happening when the store is garbage collected:

self._scheduler_session.garbage_collect_store(self._target_size_bytes)
... items are cleared based on LRU.

@stuhood
Member Author

stuhood commented Feb 3, 2022

#13991 made further improvements to BuiltinGoal, so the actual body of this change is now unblocked.

@stuhood
Member Author

stuhood commented Jun 7, 2022

Relatedly: it's probably time for named caches to evolve actual infrastructure backed by a @union, so that Pants is able to introspect them and get details about what is safe to clean. That way a background cleanup job or foreground goal could avoid using heuristics around what to delete.

For example: a reified named cache for PEX could incorporate the advice from #14364 and more aggressively clean up venv directories.

cburroughs added a commit to cburroughs/pants that referenced this issue Sep 17, 2024
Highlighted Changelogs:
 * https://github.com/pex-tool/pex/releases/tag/v2.17.0
 * https://github.com/pex-tool/pex/releases/tag/v2.18.0
 * https://github.com/pex-tool/pex/releases/tag/v2.19.0

```
Lockfile diff: 3rdparty/python/user_reqs.lock [python-default]
==                    Upgraded dependencies                     ==

  idna                           3.8          -->   3.10
  pex                            2.16.2       -->   2.19.0
```

ref pantsbuild#18294 pantsbuild#11167
cburroughs added a commit to cburroughs/pants that referenced this issue Sep 18, 2024
Highlighted Changelogs:
 * https://github.com/pex-tool/pex/releases/tag/v2.17.0
 * https://github.com/pex-tool/pex/releases/tag/v2.18.0
 * https://github.com/pex-tool/pex/releases/tag/v2.19.0
 * https://github.com/pex-tool/pex/releases/tag/v2.19.1

```
Lockfile diff: 3rdparty/python/user_reqs.lock [python-default]
==                    Upgraded dependencies                     ==

  idna                           3.8          -->   3.10
  pex                            2.16.2       -->   2.19.1
```

ref pantsbuild#18294 pantsbuild#11167
cburroughs added a commit that referenced this issue Sep 19, 2024
Highlighted Changelogs:
 * https://github.com/pex-tool/pex/releases/tag/v2.17.0
 * https://github.com/pex-tool/pex/releases/tag/v2.18.0
 * https://github.com/pex-tool/pex/releases/tag/v2.19.0
 * https://github.com/pex-tool/pex/releases/tag/v2.19.1

```
Lockfile diff: 3rdparty/python/user_reqs.lock [python-default]
==                    Upgraded dependencies                     ==

  idna                           3.8          -->   3.10
  pex                            2.16.2       -->   2.19.1
```

ref #18294 #11167
@huonw
Contributor

huonw commented Nov 4, 2024

For pex named-cache management specifically, see pex-tool/pex#2586 (comment) about the new pex3 cache prune ... subcommand.


5 participants