sql-server: named lock (GET_LOCK) held by a crashed client is retained indefinitely — idle dead session never reaped

## Summary

When a client holding a named lock (`GET_LOCK`) dies without closing its connection (e.g. `kill -9`, crash, OOM-kill), `dolt sql-server` retains the dead client's session, and with it the named lock, indefinitely. We observed no release after 5+ minutes in a minimal repro and after 10+ minutes in the production incident that led us here. Until the session goes away, every other connection's `GET_LOCK` on the same name times out. The only remediation we found is a manual server-side `KILL <conn-id>`.

## Version / platform

`dolt version 2.1.2` (linux/amd64), server started as `dolt sql-server --host 127.0.0.1 --port <port>`. (We have not yet tested newer releases; apologies if this is already addressed.)

## Repro (minimal)

```bash
dolt sql -q "CREATE DATABASE testdb"
dolt sql-server --host 127.0.0.1 --port 13310 &

# Client A: acquire a named lock, then go idle holding the session open
( echo "SELECT GET_LOCK('repro_lock',5);"; sleep 600 ) | mysql -h 127.0.0.1 -P 13310 -u root testdb &

# Verify held:
mysql -h 127.0.0.1 -P 13310 -u root -N -e "SELECT IS_USED_LOCK('repro_lock')"   # -> 2 (client A's conn id)

# Kill client A hard (the client process gets no chance to disconnect cleanly):
kill -9 <mysql client pid>

# From a live connection:
mysql -h 127.0.0.1 -P 13310 -u root -N -e "SELECT GET_LOCK('repro_lock',5)"     # -> 0 (timeout, blocked)
mysql -h 127.0.0.1 -P 13310 -u root -N -e "SELECT IS_USED_LOCK('repro_lock')"   # -> 2, for 5+ minutes after the kill
```

Observed timeline: kill at T+0; `IS_USED_LOCK` still reports the dead conn at T+5m10s (end of observation window). `KILL 2` releases it immediately.
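
For anyone re-running this, the timeline can be recorded with a small polling loop instead of manual checks. This is only a sketch: `watch_lock` is a hypothetical helper name, the host, port, and user are copied from the repro above, and it assumes the `mysql` client prints `NULL` in `-N` batch mode when the lock is free.

```bash
#!/usr/bin/env bash
# Poll IS_USED_LOCK until the lock is free or the check budget runs out.
# watch_lock is a hypothetical helper, not part of dolt or the mysql client.
# Usage: watch_lock <lock-name> [interval-seconds] [max-checks]
watch_lock() {
  local name="$1" interval="${2:-10}" max_checks="${3:-60}"
  local start holder i
  start=$(date +%s)
  for ((i = 1; i <= max_checks; i++)); do
    # Connection parameters match the repro; adjust for your server.
    holder=$(mysql -h 127.0.0.1 -P 13310 -u root -N \
      -e "SELECT IS_USED_LOCK('${name}')") || return 2
    if [[ -z "$holder" || "$holder" == "NULL" ]]; then
      echo "released after $(( $(date +%s) - start ))s"
      return 0
    fi
    echo "T+$(( $(date +%s) - start ))s held by conn ${holder}"
    # Skip the sleep after the final check.
    ((i < max_checks)) && sleep "$interval"
  done
  return 1  # still held when the check budget ran out
}
```

Calling `watch_lock repro_lock 10 60` right after the `kill -9` would log one line every 10 seconds for up to 10 minutes and return non-zero if the lock is never released.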

## Why it matters

The session is idle (no in-flight query), so nothing prompts the server to read from the dead socket; with no TCP keepalive / dead-peer reaping at the session layer, the named lock outlives its owner until an operator intervenes. Clients that serialize on named locks (schema-migration mutexes, leader election, etc.) turn one crashed client into an indefinite fleet-wide stall. We hit this in production behind [a client-side migration mutex](https://github.com/gastownhall/beads/issues/4368): one dead client process held `GET_LOCK('bd_schema_init:<db>')` for 10+ minutes, starving every other client's store-open until a manual `KILL`.

## Expected

One (or more) of:
- TCP keepalive enabled on client connections, so that dead peers are detected and their sessions reaped within a bounded interval (releasing session-scoped locks);
- A configurable idle-session / dead-peer timeout at the server session layer;
- At minimum, documentation that named locks can outlive crashed clients indefinitely, with `KILL <conn>` as the remediation.
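
Until one of these lands, the manual remediation can be scripted roughly as follows. This is a stopgap sketch, not a fix: `kill_lock_holder` is a hypothetical helper name, the connection parameters are those of the repro, and it kills whichever connection currently holds the lock, so the operator still has to confirm that the holder is really dead before running it.

```bash
#!/usr/bin/env bash
# Look up the connection holding a named lock and KILL it server-side.
# kill_lock_holder is a hypothetical helper, not part of dolt or mysql.
# WARNING: this cannot tell a dead client from a live one.
# Usage: kill_lock_holder <lock-name>
kill_lock_holder() {
  local name="$1" holder
  holder=$(mysql -h 127.0.0.1 -P 13310 -u root -N \
    -e "SELECT IS_USED_LOCK('${name}')") || return 2
  # IS_USED_LOCK yields a numeric connection id when held, NULL otherwise.
  if [[ ! "$holder" =~ ^[0-9]+$ ]]; then
    echo "lock '${name}' is not held"
    return 1
  fi
  mysql -h 127.0.0.1 -P 13310 -u root -e "KILL ${holder}" || return 2
  echo "killed conn ${holder} holding '${name}'"
}
```

In the repro above, `kill_lock_holder repro_lock` would issue the same `KILL 2` that released the lock immediately.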

Happy to re-run the repro against a newer build or with additional instrumentation if useful.

