fix(runner): add Redis-based abort polling for multi-replica runners by pfreixes · Pull Request #5703 · NangoHQ/nango

pfreixes · 2026-03-25T11:42:14Z

When runners have replicas > 1, pods sit behind a K8s Service that load-balances requests. The HTTP-based tRPC abort call may hit the wrong pod, leaving the task running. This adds Redis polling for the abort flag (already set by Jobs) when RUNNER_CONFLICT_RESOLUTION_MODE is REDIS, matching the existing lambda-runner pattern.

It also centralizes the KV store instance and exposes flags to control abort polling in the runner server.
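For readers who want a picture of the mechanism, here is a minimal sketch of per-task abort polling. It is not the code from this PR: the `KVStore` interface, the `abortKey` layout, and `startAbortPolling` are hypothetical names chosen for illustration, and `enabled` stands in for whatever flag the runner derives from `RUNNER_CONFLICT_RESOLUTION_MODE`.

```typescript
// Hypothetical minimal KV interface; the PR's centralized KV store may differ.
interface KVStore {
    exists(key: string): Promise<boolean>;
}

// Assumed key layout, for illustration only.
function abortKey(taskId: string): string {
    return `runner:abort:${taskId}`;
}

interface AbortPollingOptions {
    kv: KVStore;
    taskId: string;
    abortCheckIntervalMs: number;
    enabled: boolean; // assumption: true only when the conflict resolution mode is REDIS
    onAbort: () => void;
    onError?: (err: unknown) => void;
}

// Starts polling the abort flag for one task and returns a stop function.
// When polling is disabled, no timer is created and the stop function is a no-op.
function startAbortPolling(opts: AbortPollingOptions): () => void {
    if (!opts.enabled) {
        return () => {};
    }

    let stopped = false;
    let inFlight = false;
    let timer: ReturnType<typeof setInterval> | undefined;

    const stop = (): void => {
        stopped = true;
        if (timer !== undefined) {
            clearInterval(timer);
        }
    };

    const check = async (): Promise<void> => {
        // Skip this tick if polling was stopped or the previous query has not returned yet.
        if (stopped || inFlight) {
            return;
        }
        inFlight = true;
        try {
            const aborted = await opts.kv.exists(abortKey(opts.taskId));
            if (aborted && !stopped) {
                stop();
                opts.onAbort();
            }
        } catch (err) {
            // A failed check is reported and retried on the next tick.
            if (opts.onError) {
                opts.onError(err);
            }
        } finally {
            inFlight = false;
        }
    };

    timer = setInterval(() => {
        void check();
    }, opts.abortCheckIntervalMs);

    return stop;
}
```

The caller is expected to invoke the returned stop function when the task finishes, so that no interval outlives its task. Because every replica reads the same flag, it no longer matters which pod receives the HTTP abort call.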

This summary was automatically generated by @propel-code-bot

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

propel-code-bot

Review found no issues with the Redis-based abort polling changes.

Status: No Issues Found | Risk: Low

Review Details

📁 2 files reviewed | 💬 0 comments

Instruction Files

└── .claude/
    ├── agents/
    │   └── nango-docs-migrator.md
    └── skills

TBonnin · 2026-03-25T12:51:39Z

+                              logger.error('Error checking abort flag', { taskId, error: err });
+                          }
+                      }, abortCheckIntervalMs)
+                    : null;


We are adding a setInterval and a Redis query for each task every 1000 ms. Could that be a problem, both resource-wise on the runner and load-wise on Redis?

pfreixes · Mar 25, 2026

From my understanding, this is the same interval that we use for lambdas, so we already have a consistent workload coming from there at this interval, and the server load seems pretty low.

So regardless of whether we start using this in the runners, all of this workload will eventually come from our lambdas (once we do the full migration).

Still, this heartbeat is only enabled for runners that have more than one replica.

I was going to check how much extra load would come from enabling this for runners with replicas > 1; I'll try to gather some data.
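As a rough back-of-envelope for the question above: with one Redis query per running task per interval, the extra query rate scales linearly with the number of concurrently running tasks. The sketch below assumes exactly that model; the function name `estimateAbortPollQps` and the example task counts are hypothetical, not measurements from this PR.

```typescript
// Estimated Redis queries per second from abort polling, assuming each
// running task issues exactly one query every abortCheckIntervalMs.
function estimateAbortPollQps(concurrentTasks: number, abortCheckIntervalMs: number): number {
    if (abortCheckIntervalMs <= 0) {
        throw new RangeError('abortCheckIntervalMs must be positive');
    }
    return (concurrentTasks * 1000) / abortCheckIntervalMs;
}

// Hypothetical example: 200 concurrent tasks polled every 1000 ms.
console.log(estimateAbortPollQps(200, 1000)); // 200
```

Under this model, only tasks on runners with more than one replica contribute, so the real figure depends on how many tasks those runners execute at once.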

propel-code-bot Bot reviewed Mar 25, 2026

rossmcewan approved these changes Mar 25, 2026

TBonnin reviewed Mar 25, 2026

TBonnin approved these changes Mar 25, 2026

pfreixes added this pull request to the merge queue Mar 25, 2026

Merged via the queue into master with commit 5098c51 Mar 25, 2026
25 checks passed

pfreixes deleted the worktree-runner-cancellation branch March 25, 2026 14:10

pfreixes merged 1 commit into master from worktree-runner-cancellation

3 participants
