neilmovva/img

img — fleet-join

Turn (almost) any container into a tailnet node you can SSH into, assuming almost nothing about the host: no sudo, no public IP, and no inbound ports. A rootless container runtime is enough, and the workload image can be arbitrary.

Once a node is up, anyone allowed by your tailnet SSH policy can ssh sr@<node-name> straight into the workload container's filesystem and environment (GPUs, conda/venv, scratch — all there). Auth is tailnet policy; no keys or passwords to manage.

# once per host: drop in an (ephemeral, reusable, tagged) tailscale auth key
mkdir -p ~/.config/fleet-join && (umask 077; echo 'tskey-auth-...' > ~/.config/fleet-join/authkey)

# run any image as a fleet node
./launchers/podman-run.sh docker.io/rocm/pytorch:latest
# -> prints: ssh sr@fleet-mi355x8-a1b2c3
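
Because the auth key is a secret sitting in a plain file, it can be worth sanity-checking it before launching. The following is a minimal sketch, not part of the repository: `check_authkey` is a hypothetical helper, and it assumes GNU coreutils `stat` (Linux).

```shell
#!/usr/bin/env bash
# Sketch: sanity-check the auth key file before launching a node.
# check_authkey is a hypothetical helper; assumes GNU `stat` (Linux).
check_authkey() {
  local file="${1:-$HOME/.config/fleet-join/authkey}"
  local mode key

  [ -s "$file" ] || { echo "missing or empty: $file" >&2; return 1; }

  # The key is a secret, so group/other must have no access (e.g. 600 or 400).
  mode="$(stat -c '%a' "$file")"
  case "$mode" in
    [0-7]00) ;;
    *) echo "permissions too open ($mode): $file" >&2; return 1 ;;
  esac

  # Tailscale auth keys start with "tskey-auth-".
  IFS= read -r key < "$file" || true
  case "$key" in
    tskey-auth-*) ;;
    *) echo "does not look like a tailscale auth key: $file" >&2; return 1 ;;
  esac
}
```

Calling `check_authkey` with no arguments checks the default path used above; it returns non-zero and prints a reason on stderr if anything looks off.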

How it works

The published image ghcr.io/neilmovva/img is a ~60MB carrier, not a base image: static tailscale/tailscaled binaries plus bootstrap.sh. It gets materialized into a directory and mounted into any workload container, whose entrypoint becomes /fleet/bootstrap.sh -- <your command>. bootstrap.sh:

  • starts tailscaled --tun=userspace-networking — no NET_ADMIN/tun needed; DERP relays mean outbound-443-only hosts still work
  • enables Tailscale SSH (with the critical --statedir fix — see NOTES.md)
  • names the node <prefix>-<gpulabel>-<random>, hardware auto-detected (AMD via sysfs even without ROCm userland; NVIDIA via nvidia-smi), or stable via TS_HOSTNAME
  • creates a locked login user sr and snapshots the container's full ENV into SSH sessions so python/torch just work
  • runs as ephemeral by default: killed containers vanish from the tailnet
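
The steps above can be sketched roughly as follows. This is an illustration, not the real bootstrap.sh: the function names, the `amd<N>` label format, the `STATE_DIR` and `SOCK` paths, and the `TS_AUTHKEY` variable are all assumptions. Only the tailscale flags shown and the `TS_HOSTNAME` override come from Tailscale's CLI and the description above. `bootstrap_plan` prints the commands rather than running them.

```shell
#!/usr/bin/env bash
# Sketch of the bootstrap steps; not the real bootstrap.sh.

# Count AMD GPUs from sysfs alone (PCI vendor ID 0x1002), no ROCm userland.
# The optional argument overrides the sysfs root, which helps with testing.
count_amd_gpus() {
  local root="${1:-/sys}" n=0 f
  for f in "$root"/class/drm/card*/device/vendor; do
    [ -r "$f" ] || continue
    [ "$(cat "$f")" = "0x1002" ] && n=$((n + 1))
  done
  echo "$n"
}

# Name the node <prefix>-<gpulabel>-<random>, or use TS_HOSTNAME if set.
node_name() {
  local prefix="$1" label="$2" suffix
  if [ -n "${TS_HOSTNAME:-}" ]; then
    echo "$TS_HOSTNAME"
    return
  fi
  # Three random bytes rendered as six hex characters.
  suffix="$(head -c 3 /dev/urandom | od -An -tx1 | tr -d ' \n')"
  echo "${prefix}-${label}-${suffix}"
}

# Print (not run) the tailscale commands for the steps above.
# STATE_DIR and SOCK are hypothetical paths; TS_AUTHKEY is an assumed variable.
bootstrap_plan() {
  local name="$1"
  local state_dir="${STATE_DIR:-/tmp/fleet-state}"
  local sock="${SOCK:-/tmp/fleet-state/tailscaled.sock}"
  echo "tailscaled --tun=userspace-networking --statedir=$state_dir --socket=$sock &"
  echo "tailscale --socket=$sock up --auth-key=\"\$TS_AUTHKEY\" --ssh --hostname=$name"
}

# Example:
#   name="$(node_name fleet "amd$(count_amd_gpus)")"
#   bootstrap_plan "$name"
```

Whether a node is ephemeral is a property of the auth key in Tailscale, which is why no flag for it appears in the sketch.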

Why not a tailscale sidecar? SSH sessions would land in the sidecar's mount namespace — a barren image, not your workload's. In-workload tailscaled is the point.
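
To make the in-workload approach concrete, here is a minimal sketch of the mount-and-wrap step: the carrier directory is mounted at /fleet and bootstrap.sh becomes the entrypoint of the workload container itself. It is not the real launchers/podman-run.sh. `fleet_run_cmd`, `FLEET_DIR`, and `ENGINE` are hypothetical names, and how the launcher hands the auth key to the container is not shown here, so it is left out. The function only prints the command.

```shell
#!/usr/bin/env bash
# Sketch of the mount-and-wrap step; not the real launchers/podman-run.sh.
# FLEET_DIR (the materialized carrier directory) is a hypothetical variable.
# Prints the command instead of running it, so it is safe to experiment with.
fleet_run_cmd() {
  local image="$1"; shift
  local fleet_dir="${FLEET_DIR:-$HOME/.cache/fleet-join/bundle}"
  local engine="${ENGINE:-podman}"

  # Mount the carrier read-only at /fleet and make bootstrap.sh the entrypoint;
  # everything after "--" is the workload's own command.
  printf '%s ' "$engine" run --rm \
    -v "${fleet_dir}:/fleet:ro" \
    --entrypoint /fleet/bootstrap.sh \
    "$image" -- "$@"
  printf '\n'
}

# Example:
#   fleet_run_cmd docker.io/rocm/pytorch:latest sleep infinity
```

Because the workload image is the container being run, an SSH session started by tailscaled sees that image's filesystem, which is exactly what a sidecar cannot offer.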

Layout

bundle/            the fleet-join carrier image
  Containerfile      static tailscale + tailscaled binaries (multi-arch)
  bootstrap.sh       the entrypoint wrapper that joins the tailnet
launchers/
  podman-run.sh      bare host (rootless podman or docker)
  k8s.yaml           kubernetes initContainer pattern (design only; validate on your cluster)
NOTES.md           hard-won debugging knowledge — read before touching Tailscale SSH
.github/workflows/
  publish.yml        builds + pushes ghcr.io/<owner>/img on push to main / tags

Requirements

Workload image: bash and coreutils (present in nearly every ML image). runuser or setpriv (util-linux) is needed for the run-as-sr drop; without either, it falls back to root with a warning.

Tailnet:

  • an ephemeral + reusable + pre-approved + tagged auth key (e.g. tag:fleet)
  • an SSH ACL allowing your users → tag:fleet as user sr
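
As an illustration of the second requirement, the sketch below prints a Tailscale policy-file fragment that owns `tag:fleet` and accepts SSH to it as user `sr`. The `autogroup:admin` owner and `autogroup:member` source are placeholder choices, not something this project prescribes; narrow them to the users or groups you actually want. `fleet_policy_fragment` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: print a Tailscale policy-file fragment for the requirements above.
# autogroup:admin and autogroup:member are placeholders; adjust to your tailnet.
fleet_policy_fragment() {
  cat <<'EOF'
{
  "tagOwners": {
    "tag:fleet": ["autogroup:admin"]
  },
  "ssh": [
    {
      "action": "accept",
      "src":    ["autogroup:member"],
      "dst":    ["tag:fleet"],
      "users":  ["sr"]
    }
  ]
}
EOF
}
```

The output is meant to be merged by hand into your existing policy file, not to replace it.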

Host: ability to run a container as container-root (rootless engine is fine) and outbound 443. That's it.
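
A quick preflight for these two host requirements might look like the sketch below. Both functions are hypothetical helpers, not part of the repository. The outbound check relies on bash's `/dev/tcp` feature and the coreutils `timeout` command, and it defaults to Tailscale's control plane host. It tests a direct connection only, so hosts that reach the internet through a proxy may fail it even though they would work.

```shell
#!/usr/bin/env bash
# Sketch: preflight checks for the host requirements above.

# Print the first available container engine, preferring podman.
pick_engine() {
  local e
  for e in podman docker; do
    if command -v "$e" >/dev/null 2>&1; then
      echo "$e"
      return 0
    fi
  done
  echo "no container engine found (need podman or docker)" >&2
  return 1
}

# Try a direct outbound TCP connection on port 443 using bash's /dev/tcp.
# Returns 0 if the connection opens within 5 seconds.
can_reach_443() {
  local host="${1:-controlplane.tailscale.com}"
  timeout 5 bash -c "exec 3<>/dev/tcp/${host}/443" 2>/dev/null
}

# Example:
#   engine="$(pick_engine)" && can_reach_443 && echo "ok: $engine, outbound 443"
```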

Building locally

podman build -f bundle/Containerfile -t ghcr.io/neilmovva/img:latest bundle/

CI (.github/workflows/publish.yml) builds multi-arch (amd64 + arm64) and pushes on every push to main and on v* tags.
