Skip to content

Conversation

@chiranjib-swain
Copy link
Contributor

Description:
This pull request updates the labeler function to improve how labels are managed and applied to pull requests, especially when labels may have been added manually or by other bots during the workflow run.

Label management improvements:

  • The label application logic now fetches the latest labels from the pull request before applying new ones, so it can detect labels that were manually added or added by other bots during the workflow run. It merges these with the labels determined by the config, deduplicates them, and enforces the maximum label limit (GITHUB_MAX_LABELS).
  • The output for all-labels now reflects the final set of labels actually applied to the pull request, ensuring outputs are accurate and up-to-date..

Related issue:
(#908).

Check list:

  • Mark if documentation changes are required.
  • Mark if tests were added or updated to cover the changes.

Copilot AI review requested due to automatic review settings November 24, 2025 05:14
@chiranjib-swain chiranjib-swain requested a review from a team as a code owner November 24, 2025 05:14
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR enhances the labeler action to preserve labels that are manually added or added by other bots during the workflow execution, addressing issue #908. The key improvement is fetching the latest PR labels before applying changes, merging them with config-based labels, and ensuring accurate outputs.

  • Fetches latest PR labels to detect manual additions during workflow execution
  • Merges manually added labels with config-based labels while respecting the 100-label limit
  • Updates the 'all-labels' output to reflect the final applied labels

Reviewed changes

Copilot reviewed 1 out of 2 changed files in this pull request and generated 7 comments.

File Description
src/labeler.ts Implements label fetching and merging logic to preserve manually added labels, updates output to use final applied labels
dist/index.js Compiled/bundled version of the TypeScript changes with identical logic updates

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link

@jnewland jnewland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#908 describes a lost update where a user-added label is removed because a setLabels call is made without knowledge of the user-added label. My read of this PR is that it reduces the window of time during which updates can be lost but it does not in any way prevent those lost updates from happening.

I don't believe that ANY approach using setLabels can fix the lost-update problem introduced by #497: setLabels replaces the entire label set in a single write operation, discarding any label changes made since the last read. If concurrent changes are made between the read and write steps - by a user or another workflow - those updates will be overwritten and lost.

I believe the only path towards removing the ability of this action to clobber user writes is to rework it to use addLabels and removeLabels as it did before #497.

Comment on lines +66 to +83
const pr = await client.rest.pulls.get({
...github.context.repo,
pull_number: pullRequest.number
});
latestLabels.push(...pr.data.labels.map(l => l.name).filter(Boolean));
}

// Labels added manually during the run (not in first snapshot)
const manualAddedDuringRun = latestLabels.filter(
l => !preexistingLabels.includes(l)
);

// Preserve manual labels first, then apply config-based labels, respecting GitHub's 100-label limit
finalLabels = [
...new Set([...manualAddedDuringRun, ...labelsToApply])
].slice(0, GITHUB_MAX_LABELS);

await api.setLabels(client, pullRequest.number, finalLabels);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would happen if a label were added by a human while a GC pause happened at line 74? My read of this code is that it'd be lost just like the original issue describes.

@chiranjib-swain
Copy link
Contributor Author

chiranjib-swain commented Dec 16, 2025

Hi @jnewland ,
Thank you for sharing your thoughts on the label management strategy.

We understand the familiarity with the earlier POST + DELETE approach. After revisiting the long-standing reliability challenges documented in PR #497, we’ve chosen to continue with the setLabels (PUT-based bulk update) method because it offers a more stable and predictable foundation for large or high-activity repositories.

Why we rely on setLabels (PUT)

The previous incremental approach worked for smaller workflows, but at scale it introduced several issues:

  1. High API usage: DELETE removes one label at a time, often triggering rate-limit exhaustion.
  2. Inconsistent retries: The non-atomic POST failed with a 502 error after applying a partial set of over 50 labels. Subsequent retries failed with a 422 error ("more than 100 labels") as the new labels combined with the partially applied ones exceeded the limit.
  3. Unreliable under load: A high volume of PR labels (50+) or concurrent workflow runs increased 5xx errors and incomplete updates due to the nature of POST/DELETE operations.

These problems were significant enough that PR #497 replaced POST/DELETE with a single PUT-based update.

Why PUT is more reliable

  1. Safe to retry (idempotent)
  2. Updates all labels in one atomic call
  3. Enforces GitHub’s 100-label limit before sending the request.
  4. Replaces dozens of DELETE calls with a single operation

This aligns better with the concurrency patterns common in GitHub Actions.

How PR #917 improves this further

PR 917 narrows the read–write race window, preserves manually added labels more consistently, and improves workflow stability when updates happen mid-run.

We truly appreciate your feedback and are open to further discussion on improving label synchronization within the constraints of GitHub’s API.

@jnewland
Copy link

Thanks for the update. I had forgotten that the label delete API only allows one label to be removed at a time; that's disappointing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants