RFC: Recurring Policy Cleanup #1075

@mattmoor

As organizations adopt Octo STS, automation comes and goes, and it is easy to forget to clean up a trust policy. This is especially true when the automation is managed by IaC but removing the policy requires a separate follow-up pull request.

Currently we track (as is plainly visible in iac/) 90d of exchange events in BigQuery, which has helped us identify when a particular identity is exhausting an organization's shared quota.

With this data, we can see which policies (identity) have been active at which scope for each organization (installation_id):

SELECT o.installation_id, o.scope, o.identity, COUNT(DISTINCT o.actor.sub) AS actors, COUNT(*) AS c
FROM `octo-sts.cloudevents_octo_sts_recorder.dev_octo-sts_exchange` AS o
WHERE o.error IS NULL
GROUP BY o.installation_id, o.scope, o.identity
ORDER BY c DESC

... but we "don't know what we don't know". We cannot see beyond those 90d, so we cannot tell which policies were once used and have since dropped off. Moreover, a policy that was created but NEVER used never shows up in the logs at all, so the logs alone are an insufficient mechanism for identifying candidates for policy cleanup.
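To make the gap concrete, here is a minimal sketch (hypothetical identities and timestamps, not real data) of what a 90d window can and cannot tell us. A policy whose last use has aged out looks exactly like one that was never used.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=90)  # matches the 90d of exchange events we keep


def visible_identities(events, now):
    """Identities with at least one exchange event inside the retention window."""
    cutoff = now - RETENTION
    return {identity for identity, seen_at in events if seen_at >= cutoff}


# Hypothetical data for illustration only.
now = datetime(2025, 1, 1, tzinfo=timezone.utc)
events = [
    ("release", now - timedelta(days=5)),
    ("old-bot", now - timedelta(days=120)),  # already aged out of the logs
]
policies_on_disk = {"release", "old-bot", "never-used"}

active = visible_identities(events, now)
invisible = policies_on_disk - active
print(sorted(invisible))  # ['never-used', 'old-bot']
```

Both "old-bot" and "never-used" end up in the same bucket, which is why the list of policy files has to come from the repositories rather than from the logs.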

Proposal

My proposal is as follows...

Allow organizations to opt in to a monthly policy cleanup cron by creating a policy in the {org}/.github repository under .github/chainguard/octo-sts-monthly-cleanup.sts.yaml (happy to :bikeshed: here, the name isn't of consequence), with something like:

issuer: https://accounts.google.com
# The unique ID of the service account that octo-sts-monthly-cleanup runs as (auditable in the public actions logs for deploying octo-sts.dev)
subject: "1234567"

permissions:
  # Required to push a branch to the repository containing defunct policies
  contents: write
  # Required to turn the branch ☝️ into a pull request.
  pull_requests: write

# Clean up policies across the entire organization.
repositories: []

The rough flow for folks who have opted in will look something like this:

  1. Monthly, octo-sts will enumerate all installations,
  2. For each installation, it will attempt to assume the opt-in identity -- 🚨 if folks don't have a VALID identity, it stops here
  3. Using the assumed identity, search for org:{org} path:/^\.github\/chainguard\/.*\.sts\.yaml$/ to identify all candidate repos and policies,
  4. Walk through the full repo and file list eliminating any policy used in the last 90d,
  5. For each repository with unused policies, open a pull request to remove all of the unused policies.
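Steps 3 through 5 boil down to a filter and a group-by. The sketch below is untested against the real service and rests on two assumptions: that the identity is the policy file name minus the .sts.yaml suffix, and that the logged scope matches the repository name. The function name, repositories, and paths are made up for illustration.

```python
import re
from collections import defaultdict

# Assumed layout: .github/chainguard/{identity}.sts.yaml
POLICY_PATH = re.compile(r"^\.github/chainguard/(?P<identity>[^/]+)\.sts\.yaml$")


def unused_policies(candidates, used):
    """Group policy paths with no recorded use by repository.

    candidates: iterable of (repo, path) pairs from the code search (step 3).
    used: set of (repo, identity) pairs seen in the last 90d of exchange logs.
    """
    stale = defaultdict(list)
    for repo, path in candidates:
        match = POLICY_PATH.match(path)
        if match is None:
            continue  # not a trust policy, ignore it
        if (repo, match.group("identity")) not in used:
            stale[repo].append(path)
    # One entry per repository, i.e. one pull request per repository (step 5).
    return {repo: sorted(paths) for repo, paths in stale.items()}


# Hypothetical data for illustration only.
candidates = [
    ("acme/app", ".github/chainguard/release.sts.yaml"),
    ("acme/app", ".github/chainguard/old-bot.sts.yaml"),
    ("acme/infra", ".github/chainguard/deploy.sts.yaml"),
    ("acme/infra", ".github/chainguard/README.md"),
]
used = {("acme/app", "release"), ("acme/infra", "deploy")}

print(unused_policies(candidates, used))
# {'acme/app': ['.github/chainguard/old-bot.sts.yaml']}
```

Repositories where every policy is still in use drop out of the result entirely, so they would never get a pull request.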

We should have 0-1 of these pull requests per repository, so I believe that we should adopt a convention similar to dependabot's and use a branch named octo-sts/{identity}. For our strawperson policy name, that would be the branch octo-sts/octo-sts-monthly-cleanup.

Each month the cleanup will force-push to this branch, and open a PR if one is not already open. So if a month goes by and the PR isn't merged, it will be updated in place.
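As a rough sketch of that per-repository decision (the function names and action tuples are hypothetical, and no GitHub API calls are made here):

```python
def cleanup_branch(identity):
    """Branch name following the octo-sts/{identity} convention."""
    return f"octo-sts/{identity}"


def plan_actions(identity, stale_paths, open_pr_branches):
    """Return the ordered actions for one repository in a monthly run.

    stale_paths: unused policy paths found in this repository.
    open_pr_branches: head branches of the repository's open pull requests.
    """
    if not stale_paths:
        # Assumption: with nothing to remove we touch nothing. What to do
        # with a leftover open PR in this case is not decided here.
        return []
    branch = cleanup_branch(identity)
    actions = [("force_push", branch)]
    if branch not in open_pr_branches:
        actions.append(("open_pull_request", branch))
    return actions


# First month: no PR yet, so push the branch and open one.
print(plan_actions("octo-sts-monthly-cleanup", ["a.sts.yaml"], set()))
# Next month, PR still open: only the force push, which updates it in place.
print(plan_actions("octo-sts-monthly-cleanup", ["a.sts.yaml"],
                   {"octo-sts/octo-sts-monthly-cleanup"}))
```

Keying everything off the branch name is what keeps us at 0-1 pull requests per repository.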


I have not yet prototyped any of this, so there may be some nuanced divergences as it's implemented (e.g. IDK whether contents: read is sufficient permission to search 🤷‍♂️ ), but I believe that this rough outline should allow us to achieve the goal, while being OPT IN, and minimizing the impact on the organization's app quotas.


Labels: enhancement
