
Feature/local resource healthchecks #2932

Open
pizzaandcheese wants to merge 2 commits into fosrl:main from pizzaandcheese:feature/local-resource-healthchecks


@pizzaandcheese


Community Contribution License Agreement

By creating this pull request, I grant the project maintainers an unlimited,
perpetual license to use, modify, and redistribute these contributions under any terms they
choose, including both the AGPLv3 and the Fossorial Commercial license terms. I
represent that I have the right to grant this license for all contributed content.

Description

This change adds healthcheck support for local sites in Pangolin.

How to test?

You need a Pangolin instance with at least one Local site configured — a site where the target is directly reachable by the Pangolin server over the network (not tunnelled through Newt).

Test 1: Healthcheck turns healthy

  1. Go to a resource attached to a local site, open the proxy settings, and click the healthcheck button on a target
  2. Configure a simple HTTP check pointing at a running service (e.g. GET / expecting 200)
  3. Enable it and save
  4. Within a few seconds the target should transition from Unknown → Healthy
  5. The resource card should also reflect Healthy
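
For reference, the kind of HTTP check configured in step 2 can be sketched as below. This is a minimal illustration, not the code in this PR: the `HttpCheck` shape, the `probeHttp` name, and the injectable `fetchImpl` parameter are all assumptions made for the example, and it relies on the global `fetch` and `AbortController` available in Node 18+.

```typescript
// Hypothetical check shape; field names are illustrative, not Pangolin's schema.
interface HttpCheck {
  url: string;
  method: string;
  expectedStatus: number;
  timeoutMs: number;
}

// Minimal subset of fetch that the probe needs, so a stub can be injected.
type FetchLike = (
  url: string,
  init: { method: string; signal: AbortSignal }
) => Promise<{ status: number }>;

// Resolves to true only when the response status matches the expected one.
// Network errors and timeouts count as a failed probe rather than throwing.
async function probeHttp(
  check: HttpCheck,
  fetchImpl: FetchLike = fetch
): Promise<boolean> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), check.timeoutMs);
  try {
    const res = await fetchImpl(check.url, {
      method: check.method,
      signal: controller.signal
    });
    return res.status === check.expectedStatus;
  } catch {
    return false;
  } finally {
    clearTimeout(timer);
  }
}
```

A call such as `probeHttp({ url: "http://10.0.0.5:8080/", method: "GET", expectedStatus: 200, timeoutMs: 5000 })` would correspond to "GET / expecting 200"; the address here is a placeholder.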

Test 2: Healthcheck detects a failure

  1. With the healthcheck running and showing Healthy, stop or block the target service
  2. Within one polling interval the target should transition to Unhealthy
  3. Restart the service — it should return to Healthy

Test 3: TCP mode

Repeat tests 1 and 2 using TCP mode instead of HTTP to verify the TCP probe path works independently
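
A TCP probe of this kind usually amounts to "can a connection be opened before the timeout". The sketch below shows that idea with Node's `net` module; the `probeTcp` name and its parameters are hypothetical and are not taken from this PR.

```typescript
import * as net from "node:net";

// Resolves to true if a TCP connection to host:port opens within timeoutMs,
// and to false on a connection error or timeout.
function probeTcp(host: string, port: number, timeoutMs: number): Promise<boolean> {
  return new Promise((resolve) => {
    const socket = net.connect({ host, port });
    const finish = (healthy: boolean) => {
      socket.destroy(); // release the socket whatever the outcome
      resolve(healthy);
    };
    socket.setTimeout(timeoutMs);
    socket.once("connect", () => finish(true));
    socket.once("timeout", () => finish(false));
    socket.once("error", () => finish(false));
  });
}
```

Because no application data is exchanged, a probe like this only shows that the port accepts connections, which is why it is worth testing separately from the HTTP path.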

Test 4: Config options

  • Set a custom interval (e.g. 10s) and verify the polling frequency changes
  • Set an unhealthy interval (e.g. 60s) and verify it polls less aggressively once unhealthy
  • Set a healthy/unhealthy threshold > 1 and verify the target doesn't flip status until that many consecutive results are seen
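
The behaviour these options describe can be sketched as a small state machine. This is an illustration under assumptions, not the implementation in this PR: `CheckConfig`, `CheckState`, `applyResult`, and `nextDelayMs` are hypothetical names, and the rule that a status only flips after a run of consecutive identical results is inferred from the test description above.

```typescript
type Health = "unknown" | "healthy" | "unhealthy";

// Hypothetical config; field names are illustrative, not Pangolin's schema.
interface CheckConfig {
  intervalMs: number;          // polling interval while not unhealthy
  unhealthyIntervalMs: number; // polling interval once unhealthy
  healthyThreshold: number;    // consecutive successes needed to become healthy
  unhealthyThreshold: number;  // consecutive failures needed to become unhealthy
}

interface CheckState {
  health: Health;
  lastOk: boolean | null; // outcome of the previous probe, null before the first
  streak: number;         // how many times in a row lastOk has been seen
}

const initialState: CheckState = { health: "unknown", lastOk: null, streak: 0 };

// Folds one probe result into the state; the status flips only when the
// current run of identical results reaches the matching threshold.
function applyResult(state: CheckState, ok: boolean, cfg: CheckConfig): CheckState {
  const streak = state.lastOk === ok ? state.streak + 1 : 1;
  let health = state.health;
  if (ok && streak >= cfg.healthyThreshold) health = "healthy";
  if (!ok && streak >= cfg.unhealthyThreshold) health = "unhealthy";
  return { health, lastOk: ok, streak };
}

// Chooses the delay before the next probe based on the current status.
function nextDelayMs(state: CheckState, cfg: CheckConfig): number {
  return state.health === "unhealthy" ? cfg.unhealthyIntervalMs : cfg.intervalMs;
}
```

With a healthy threshold of 2 and an unhealthy threshold of 3, a single failed probe on a healthy target leaves the status unchanged, and the slower 60s interval only applies after the third consecutive failure.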

Test 5: WireGuard sites are unaffected

On a WireGuard site, verify the healthcheck button still does not appear and no probing happens

Test 6: Newt sites are unaffected

On a Newt site, verify healthchecks still work exactly as before — the Newt agent should still be doing the probing, not the server

Test 7: Enable/disable toggle

Disable an active healthcheck and verify the status resets to Unknown and polling stops. Re-enable it and verify it resumes from Unknown → Unhealthy → Healthy

Local sites have no Newt agent that can drive the targetHealthCheck
probes, so health checks were silently disabled for them. The Pangolin
server can reach those targets directly, so we now run the probes
ourselves on a small interval-based scheduler:

- New server/routers/target/localHealthChecker.ts: HTTP and TCP probes
  with per-check interval, timeout, expected status, follow-redirects,
  custom headers, and healthy/unhealthy thresholds. Status changes go
  through the same fireHealthCheck{Healthy,Unhealthy}Alert helpers used
  by the Newt-driven flow, so the UI, alerts, and history are unchanged.
- ws/messageHandlers.ts: start the local poller alongside the existing
  Newt/Olm offline checkers (every build, including saas).
- updateTarget.ts: stop forcing hcHealth='unknown' for local sites; they
  are now treated like newt sites (wireguard sites still pass through
  the unknown branch since nothing probes them).
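
As a rough picture of what an interval-based scheduler for these probes can look like, the sketch below keeps one timer per enabled check and reconciles the timers against the current configuration. It is not the contents of localHealthChecker.ts: `LocalCheck`, `LocalPoller`, and the `Probe` callback are hypothetical names, and a real poller would also need to handle interval changes on an existing check.

```typescript
// Hypothetical config shape; not the targetHealthCheck schema.
interface LocalCheck {
  targetId: number;
  enabled: boolean;
  intervalMs: number;
}

type Probe = (targetId: number) => Promise<void>;

class LocalPoller {
  private timers = new Map<number, ReturnType<typeof setInterval>>();
  private probe: Probe;

  constructor(probe: Probe) {
    this.probe = probe;
  }

  // Reconciles running timers with the given checks: stops timers for
  // disabled or removed checks and starts timers for newly enabled ones.
  sync(checks: LocalCheck[]): void {
    const wanted = new Map<number, LocalCheck>();
    for (const c of checks) {
      if (c.enabled) wanted.set(c.targetId, c);
    }
    for (const id of Array.from(this.timers.keys())) {
      if (!wanted.has(id)) {
        clearInterval(this.timers.get(id));
        this.timers.delete(id);
      }
    }
    wanted.forEach((check, id) => {
      if (this.timers.has(id)) return; // already polling this target
      const timer = setInterval(() => {
        // A failing probe must not take down the scheduler.
        this.probe(id).catch(() => {});
      }, check.intervalMs);
      this.timers.set(id, timer);
    });
  }

  activeTargetIds(): number[] {
    return Array.from(this.timers.keys()).sort((a, b) => a - b);
  }

  stop(): void {
    this.timers.forEach((t) => clearInterval(t));
    this.timers.clear();
  }
}
```

Calling `sync` again after a check is disabled stops its timer, which matches the expectation in Test 7 that polling stops when a healthcheck is turned off.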

Re-uses the existing targetHealthCheck schema and TargetHealthCheck
type, so no migration is needed.

The healthcheck configuration button was hardcoded to only show for
newt site types (siteType === 'newt'). Added 'local' to the condition
so that local resources can also have healthchecks configured from
both the resource proxy settings page and the resource creation page.

The server-side local health checker (localHealthChecker.ts) was
already implemented and operational -- it polls local targets via
HTTP/TCP directly from the Pangolin server and updates hcHealth
status in the database. The only missing piece was this UI gate.
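
The gating described across these two commits can be summarised as a mapping from site type to whatever runs the probes. The sketch below is illustrative only: the `proberFor` and `showHealthCheckButton` helpers and the `Prober` labels are hypothetical, and only the `'newt'` and `'local'` site type strings come from the commit message.

```typescript
type SiteType = "newt" | "local" | "wireguard";
type Prober = "newt-agent" | "pangolin-server" | "none";

// Which component would run healthcheck probes for each site type.
function proberFor(siteType: SiteType): Prober {
  switch (siteType) {
    case "newt":
      return "newt-agent"; // unchanged: the Newt agent probes
    case "local":
      return "pangolin-server"; // new: the server probes directly
    case "wireguard":
      return "none"; // nothing probes these, status stays unknown
  }
}

// UI gate: the button is only useful where something can run the probe.
function showHealthCheckButton(siteType: SiteType): boolean {
  return proberFor(siteType) !== "none";
}
```

Deriving the UI gate from the same mapping the server uses is one way to keep the two from drifting apart; the PR itself extends the existing `siteType === 'newt'` condition instead.
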
@AstralDestiny

Contributor

So does Pangolin itself run the healthchecks at that point, or do you leverage Traefik's own built-in healthchecks and query that?

@pizzaandcheese

Author

Currently, the Pangolin process runs the health checks. This is in line with how healthchecks are already done with Gerbil.

This could be expanded to cover WireGuard hosts in the same way.

@AstralDestiny

Contributor

Actually, Pangolin doesn't do the healthchecks; it's Newt that does them right now.

@oschwartz10612

Member

Hey, thank you so much for this PR! I think for now we are going to keep the healthcheck functionality in Newt. I appreciate you opening this, though!

@keonramses


Hi @oschwartz10612, thanks for taking a look at this! I completely understand wanting to centralize health check functionality in Newt. However, I’d love for the team to reconsider this approach for local sites. This feature has been highly requested by the community in issues like #1835 and #1873. A major driving factor here is that many users experience performance bottlenecks with Newt (as tracked in #512). For environments where the Pangolin server can reach the target directly, allowing native server-side health checks removes that unnecessary overhead and provides a much more stable experience. Since the PR is already written and limits the scope strictly to local/server-reachable sites without breaking Newt's existing flow, would you be open to revisiting this as an optional/opt-in feature?

@oschwartz10612

Member

1.19 is locked, but I'm willing to take a look for a different release.

@keonramses


Thank you.


4 participants