`dolt sql-server` leaks one `ssh` child process per `CALL dolt_fetch` against an `ssh://` remote #10897

Caveat: The following report was generated with heavy AI assistance and has only been lightly reviewed before posting. Nevertheless, the underlying assumptions were tested fairly thoroughly, so hopefully it is correct.

## Summary

Every `CALL dolt_fetch('<remote>', '<branch>')` against an SSH remote leaves one `ssh` subprocess alive, parented to the `dolt sql-server` process. The child sleeps in `poll()` indefinitely because the server never closes the pipes connected to the child's stdin/stdout. Over many fetches these children accumulate until the remote sshd or its container runs out of PIDs or SSH sessions.

The leak is deterministic (1 child per fetch) and occurs on the server side, not in the client-side `dolt sql -q` that invokes the CALL. It reproduces against a plain localhost OpenSSH server, so it is not specific to any particular SSH endpoint, proxy, or ControlMaster configuration.

## Environment

- Dolt versions reproduced:
  - `1.84.0` (released binary)
  - `1.86.2` (freshly built from `main` @ `dfecd55771`, Go 1.25.6)
- OS: Ubuntu 25.10, kernel 6.17.0-14-generic, x86_64
- OpenSSH: distro default (9.x)
- Remote URL shape: `ssh://user@host:port/path/to/db/.dolt`
- Reproduced against:
  1. Remote dolt on a Railway container (via Railway TCP proxy + SSH)
  2. Plain local dolt DB on same host via `ssh://ubuntu@localhost:22/...`

## Reproduction

Minimal repro (localhost only, no external infra required):

```bash
# 1. Build / use a dolt binary.  Repro confirmed on 1.84.0 and 1.86.2.
DOLT=/path/to/dolt

# 2. Create a throwaway "remote" dolt DB.
mkdir -p /tmp/dolt-leak-test && cd /tmp/dolt-leak-test
$DOLT init --initial-branch main
$DOLT sql -q "CREATE TABLE t (id INT PRIMARY KEY);
              INSERT INTO t VALUES (1),(2),(3);"
$DOLT add . && $DOLT commit -m init

# 3. Create a scratch local DB and start a sql-server for it.
mkdir -p /tmp/dolt-leak-local && cd /tmp/dolt-leak-local
$DOLT init --initial-branch main
$DOLT sql-server --host 127.0.0.1 --port 13999 --loglevel warning &
SERVER_PID=$!
sleep 3

# 4. Configure an ssh:// remote pointing at the throwaway DB and
#    fetch from it repeatedly.  Count leaked ssh children parented
#    to the sql-server.
DB=\`dolt-leak-local\`
$DOLT --host 127.0.0.1 --port 13999 --no-tls sql -q \
  "USE $DB;
   CALL dolt_remote('add', 'r',
     'ssh://$USER@localhost:22/tmp/dolt-leak-test/.dolt')"

for i in 1 2 3; do
  $DOLT --host 127.0.0.1 --port 13999 --no-tls sql -q \
    "USE $DB; CALL dolt_fetch('r', 'main')" >/dev/null
  sleep 1
  echo "after fetch $i: $(pgrep -P $SERVER_PID -af \
    'ssh.*dolt.*transfer' | wc -l) leaked ssh children"
done
```

Observed output: count increments by exactly 1 per fetch.

```
after fetch 1: 1 leaked ssh children
after fetch 2: 2 leaked ssh children
after fetch 3: 3 leaked ssh children
```

The behaviour is the same with
`DOLT_SSH_COMMAND="ssh -o ControlMaster=no -o ControlPath=none -o ControlPersist=no"`
set, so ControlMaster multiplexing is not involved.

## Diagnostic findings

Each leaked child process (verified via `/proc`):

```
State: S (sleeping)
wchan: poll_schedule_timeout
PPid:  <dolt sql-server PID>
```

Open FDs on the leaked child:

```
fd 0 -> pipe:[A]       # stdin  (read end)
fd 1 -> pipe:[B]       # stdout (write end)
fd 2 -> pipe:[C]       # stderr (write end)
fd 3 -> socket:[...]   # TCP to sshd
```

Cross-checking the pipe peer FDs via `/proc/*/fd/*`:

- Pipe A write end is held by the `dolt sql-server` process.
- Pipe B read end is held by the `dolt sql-server` process.

In other words, **the sql-server still holds its end of both pipes open after `dolt_fetch` has already returned success to the client.** The `ssh` child therefore blocks forever in `poll()`: stdin never reaches EOF and no data arrives, so it has no reason to exit.
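The pipe-peer cross-check can be sketched as a small shell helper. This is an illustration rather than the exact commands used for the report: it assumes Linux `/proc`, bash, and enough permission to read the other process's `fd` directory, and the function name `pipe_peers` is made up here.

```bash
#!/usr/bin/env bash
# pipe_peers PID  (hypothetical helper, Linux /proc only)
# For every pipe open in PID, print each *other* process that holds the
# same pipe inode, one line per match:
#   <fd in PID> <pipe:[inode]> <peer pid> <peer fd>
pipe_peers() {
  local pid=$1 fd target other peer
  for fd in /proc/"$pid"/fd/*; do
    target=$(readlink "$fd" 2>/dev/null) || continue
    [[ $target == pipe:* ]] || continue          # only pipes are of interest
    for other in /proc/[0-9]*/fd/*; do
      [[ $other == /proc/"$pid"/* ]] && continue  # skip PID's own fds
      if [[ $(readlink "$other" 2>/dev/null) == "$target" ]]; then
        peer=${other#/proc/}                      # "<pid>/fd/<n>"
        echo "${fd##*/} $target ${peer%%/*} ${other##*/}"
      fi
    done
  done
}
```

Running `pipe_peers <leaked ssh PID>` should list the `dolt sql-server` PID as the peer for fd 0 and fd 1 if the diagnosis above holds. The scan forks one `readlink` per descriptor, so it is slow on busy hosts but adequate for a one-off check.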

Remote side: each orphan `ssh` client keeps a channel open to the remote sshd, which keeps a forked `dolt ... transfer` subprocess alive on the remote host.  On a container with a tight PID cgroup (e.g. a Railway service), this rapidly exhausts the container's PID budget and every subsequent process spawn fails with:

```
runtime/cgo: pthread_create failed: Resource temporarily unavailable
SIGABRT
```

This happens on the next invocation of any dolt command inside the container (including `dolt version`). The abort is a correct response from the Go runtime to PID exhaustion, but the root cause is this leak.

The crash trace from the remote in that state includes:

```
github.com/dolthub/dolt/go/store/nbs.(*tableSet).rebase
  go/store/nbs/table_set.go:576
github.com/dolthub/dolt/go/store/nbs.newNomsBlockStore
  go/store/nbs/store.go:845
github.com/dolthub/dolt/go/store/nbs.newLocalStore
  go/store/nbs/store.go:763
```

## Expected behaviour

After `CALL dolt_fetch` completes, the server should close its ends of the pipes to each spawned `ssh` child, causing the child to see stdin EOF and exit cleanly.  No process should remain parented to the sql-server after the CALL returns.
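The EOF mechanism this relies on can be shown without Dolt or SSH. In the minimal sketch below, `cat` stands in for the `ssh` child (an assumption: like the behaviour expected of `ssh` here, it exits once stdin reaches EOF), and a bash coprocess supplies the stdin/stdout pipe pair. The names `CHILD`, `child_pid`, and `in_fd` are illustrative.

```bash
#!/usr/bin/env bash
# `cat` stands in for the ssh child: it reads stdin until EOF, then exits.
coproc CHILD { exec cat; }
child_pid=$CHILD_PID
in_fd=${CHILD[1]}   # parent's write end of the child's stdin pipe

sleep 1
# The parent still holds the write end, so the child is blocked reading
# stdin and is still alive (kill -0 only probes for existence).
if kill -0 "$child_pid" 2>/dev/null; then before=alive; else before=gone; fi

# Close the parent's write end; the child now reads EOF on stdin.
exec {in_fd}>&-
status=0
wait "$child_pid" || status=$?

echo "before close: $before; exit status after close: $status"
```

The child should be reported as alive before the close and should exit with status 0 after it. The leaked `ssh` processes correspond to the state before the `exec {in_fd}>&-` line, held indefinitely.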

## Likely location

The likely culprit is the SSH remote / chunk-transfer driver inside `dolt sql-server`. The `os/exec.Cmd` (or equivalent) that spawns `$DOLT_SSH_COMMAND <args> dolt ... transfer` appears not to call `Wait()`, and/or not to close its stdin/stdout pipes, after the transfer completes on the session's control path.

## Workarounds

None that are in-process.  Callers can mitigate by periodically killing leaked `ssh ... dolt ... transfer` children parented to the sql-server, or by restarting the sql-server between sync batches.  Neither is a real fix.
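The first mitigation might look like the sketch below. It assumes procps `pkill` is available and reuses the `ssh.*dolt.*transfer` pattern from the reproduction; the function name `reap_leaked_children` is hypothetical.

```bash
#!/usr/bin/env bash
# reap_leaked_children PARENT_PID [PATTERN]  (hypothetical helper)
# Sends SIGTERM to direct children of PARENT_PID whose full command line
# matches PATTERN. Returns pkill's status: 0 if something was signalled,
# 1 if nothing matched.
reap_leaked_children() {
  local parent=$1
  local pattern="${2:-ssh.*dolt.*transfer}"
  pkill -TERM -P "$parent" -f "$pattern"
}
```

For example, `reap_leaked_children "$SERVER_PID"` could be run between sync batches. The pattern cannot tell a leaked child from one serving an in-flight fetch, so it should only run while no fetch is active. If the server really never calls `Wait()`, the killed children may also linger locally as zombies until the server exits, although the remote-side sessions should be released.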

