jaimergp
Contributor

@jaimergp jaimergp commented Mar 10, 2025

Checklist for submitter

  • I am submitting a new CEP: Build provenance metadata.
    • I am using the CEP template by creating a copy of cep-0000.md named cep-XXXX.md in the root level.
  • I am submitting modifications to CEP XX.
  • Something else: (add your description here).

Checklist for CEP approvals

  • The vote period has ended and the vote has passed the necessary quorum and approval thresholds.
  • A new CEP number has been minted. Usually, this is ${greatest-number-in-main} + 1.
  • The cep-XXXX.md file has been renamed accordingly.
  • The # CEP XXXX - header has been edited accordingly.
  • The CEP status in the table has been changed to approved.
  • The last modification date in the table has been updated accordingly.
  • The pre-commit checks are passing.

@JeanChristopheMorinPerso JeanChristopheMorinPerso left a comment

I'm excited to see this being formalized!

cep-9999.md Outdated
- `remote_url`: Required on CI. Repository URL of the feedstock being built.
- `flow_run_id`: Optional. CI-specific identifier for the workflow run.

For local workflows such as those specified by CFEP 03, `remote_url` MAY be omitted, but authors strongly recommend providing the adequate value manually if necessary.

Suggested change
For local workflows such as those specified by CFEP 03, `remote_url` MAY be omitted, but authors strongly recommend providing the adequate value manually if necessary.
For local workflows such as those specified by [CFEP-03](https://github.com/conda-forge/cfep/blob/main/cfep-03.md), `remote_url` MAY be omitted, but authors strongly recommend providing the adequate value manually if necessary.

Also, if remote_url is omitted, should sha also be omitted?

Contributor Author

Hm, I had assumed users interested in provenance are already using some type of version control, but maybe we can't force that either.

About the dash in CFEP 03, see:

name: CEPs must be referred to with 'CEP N' (no dash)

Member

please keep sha as mandatory ... some packages may omit the remote_url so as not to expose private repositories to the public, but the sha is still helpful for internal audits / attestations to check that e.g. the sha actually exists / was reviewed as part of a PR / triggered a CI run etc.

Contributor Author

I am writing this to describe the current conventions, not to impose on how packages should be built. I think that should be decided by packaging organizations separately. I plan to submit a CFEP for conda-forge where we do "require these fields in CI pipelines, as recommended by CEP XYZ".

Someone using conda-build to share an artifact with their research lab internally may not need to care about whether the recipe is version controlled or what a git hash is.

cep-9999.md Outdated

- `sha`: Required. Commit hash of the feedstock being built.
- `remote_url`: Required on CI. Repository URL of the feedstock being built.
- `flow_run_id`: Optional. CI-specific identifier for the workflow run.
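
As a rough illustration of the three fields quoted above, the following minimal Python sketch checks a provenance mapping against the draft wording: `sha` required, `remote_url` required on CI but optional for local workflows, `flow_run_id` optional. The function name `validate_provenance`, the `on_ci` flag, and the use of a plain dictionary are assumptions made for this example; the draft does not prescribe any validation API.

```python
from typing import Mapping, Optional


def validate_provenance(meta: Mapping[str, Optional[str]], on_ci: bool) -> list[str]:
    """Return a list of problems found in a provenance mapping.

    Hypothetical helper: the field names come from the quoted draft,
    everything else (names, flag, return type) is an assumption.
    """
    errors = []

    # `sha`: Required. Commit hash of the feedstock being built.
    if not meta.get("sha"):
        errors.append("sha is required")

    # `remote_url`: Required on CI; MAY be omitted for local workflows.
    if on_ci and not meta.get("remote_url"):
        errors.append("remote_url is required on CI")

    # `flow_run_id`: Optional, so its absence is never an error.
    return errors
```

An empty list means the mapping satisfies the draft's requirements for the given context.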

The name flow_run_id originates from Anaconda's current build system. I wish that we could use something more generic and meaningful. But I guess the ship has sailed already? Or do you foresee a future where CF would be willing to change this to something else? (On our side at Anaconda, this is very easy to do.)

Contributor Author

We can change it easily too, but then we will have to maintain two ways of accessing this metadata because the already stamped artifacts won't be rebuilt. I think flow_run_id is sufficiently generic. I always read it as "(work)flow run ID".

Member

flow_run_id is used consistently for defaults and conda-forge, and the naming was accepted on both sides in the past ... have a look into this PR ... you can see in that PR that every CI system (Azure, Travis, GitHub, ...) names its variable containing the ID differently ... flow_run_id was/is meant to be agnostic, and the value prefix tells what automation/ci/flow/workflow system was used.
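
To make the "value prefix" idea concrete, here is a minimal Python sketch that splits a `flow_run_id` value into the CI system and the run identifier. The underscore separator and the example prefixes (`azure`, `github`) are assumptions for illustration only; the comment above does not spell out the exact format.

```python
def split_flow_run_id(value: str) -> tuple[str, str]:
    """Split a flow_run_id into (ci_system, run_id).

    Assumes a hypothetical '<system>_<id>' layout; only the first
    underscore is treated as the separator.
    """
    system, sep, run_id = value.partition("_")
    if not sep or not system or not run_id:
        raise ValueError(f"no CI system prefix found in {value!r}")
    return system, run_id


# Hypothetical values, one per CI system.
for example in ("azure_20250310.1", "github_1234567890"):
    system, run_id = split_flow_run_id(example)
    print(f"{example}: system={system}, run={run_id}")
```

Under this assumed layout a single field can serve every CI provider, which is the agnostic behaviour described above.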

@jaimergp jaimergp mentioned this pull request Mar 11, 2025
@chenghlee
Contributor

Is this an intermediate step towards generating SLSA provenance attestations? If so, should we just go straight in that direction, or would trying to implement SLSA delay the gains we want to get now?

@dbast
Member

dbast commented Mar 11, 2025

thanks @jaimergp for the initiative here.

some background:

@chenghlee yes, it was meant as an intermediate step to enable (multiple different) actual attestations via an attestation worker that uses the data to look things up:

  • process attestation: does the mentioned sha actually exist, and was it part of a PR review process?
  • automation attestation: the sha, after merge to main, triggered a CI build matching the flow_run_id, and the sha256 of the package is also present in the build log -> proof that a package was built automatically, without manual intervention.
  • provenance attestation: if the previous attestations are successful, it becomes implicitly clear that the provenance data is correct.

though, if and how those attestations can be stored via e.g. Sigstore in the context of SLSA is a different matter.

tl;dr: this enables attestations that would be hard without any provenance data.
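
The three checks listed above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the `CIRun` record, the in-memory lookups (`reviewed_commits`, `ci_runs`), and all function names are hypothetical stand-ins for whatever an attestation worker would actually query.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CIRun:
    """Hypothetical record of one CI run."""
    sha: str        # commit that triggered the run
    build_log: str  # full text of the build log


def process_attestation(sha: str, reviewed_commits: set[str]) -> bool:
    # Does the sha exist, and was it part of a PR review process?
    return sha in reviewed_commits


def automation_attestation(
    sha: str, flow_run_id: str, package_sha256: str, ci_runs: dict[str, CIRun]
) -> bool:
    # Did this sha trigger the CI run named by flow_run_id, and does the
    # package's sha256 appear in that run's build log?
    run = ci_runs.get(flow_run_id)
    return run is not None and run.sha == sha and package_sha256 in run.build_log


def provenance_attestation(
    sha: str,
    flow_run_id: str,
    package_sha256: str,
    reviewed_commits: set[str],
    ci_runs: dict[str, CIRun],
) -> bool:
    # Provenance is implied once both previous checks succeed.
    return process_attestation(sha, reviewed_commits) and automation_attestation(
        sha, flow_run_id, package_sha256, ci_runs
    )
```

In practice the lookups would go to the repository host and the CI provider rather than to in-memory collections.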

@jaimergp
Contributor Author

Is this an intermediate step towards generating SLSA provenance attestations? If so, should we just go straight in that direction, or would trying to implement SLSA delay the gains we want to get now?

I'm not personally aiming for that; I only wanted to standardise what was otherwise an undocumented convention. I think that SLSA provenance can be iterated on later, and this can just reflect the current state. This way we can refer to non-SLSA provenance as "CEP XYZ metadata".

@jaimergp jaimergp changed the title from "Add CEP for build provenance metadata" to "CEP XXXX: Build provenance metadata" Sep 27, 2025
@jaimergp jaimergp moved this to In Progress in STA conda & conda-forge Sep 27, 2025
@jaimergp jaimergp self-assigned this Sep 30, 2025
Development

Successfully merging this pull request may close these issues.

Provenance metadata
4 participants