Distributed locking with Raft consensus and fencing tokens
Strong consistency for distributed mutual exclusion
Quick Start · Go SDK · Examples · API · Blog post
lowkey is a distributed lock service built on Raft consensus. It provides strongly consistent locks with fencing tokens to prevent split-brain scenarios and stale writes.
lowkey is layered: client SDK on top, gRPC + HTTP gateway in the middle, HashiCorp Raft underneath, and the lock/lease FSM at the bottom. Lease renewals take a fast path on the leader (no Raft round-trip); only lock acquire/release go through consensus.
Use cases:
- Distributed cron jobs (only one instance executes)
- Database migrations (ensure single execution)
- Leader election
- Critical section protection across multiple processes
Core guarantees:
- Strong consistency, CP in CAP, no split-brain under network partitions
- Fencing tokens, monotonically increasing counters that prevent stale writes
- Automatic cleanup, lease-based locks release automatically on client failure
Deep dive: Building lowkey, a distributed lock service in Go. Covers the CP trade-off, why fencing tokens matter, the leader-only heartbeat fast path, and the design choices that shaped the implementation.
The problem with naive locking. Multiple service instances need to coordinate. You add a distributed lock. But:
- Networks partition, so multiple "leaders" appear
- Processes pause (GC, CPU), so stale lock holders exist
- Clients crash, so locks are held forever
How lowkey solves it.
- Raft consensus: only the majority partition can acquire locks
- Fencing tokens: resources reject operations from stale lock holders
- Leases: locks auto-release when clients stop heartbeating
Raft is a majority-vote protocol: a write is committed only when a quorum of nodes agrees. A minority partition cannot reach quorum, so it cannot grant new locks. This is what kills split-brain.
Comparison with alternatives:
| System | Consensus | Fencing Tokens | Split-brain Protection |
|---|---|---|---|
| lowkey | Raft | yes | yes |
| etcd | Raft | yes | yes |
| Consul | Raft | no | yes |
| Redis Redlock | none | no | no |
lowkey is CP: when the cluster splits, only the majority side keeps serving writes. The minority side rejects acquire/release until it rejoins. No two clients can ever hold the same lock with the same fencing token.
git clone https://github.com/pixperk/lowkey.git
cd lowkey
make build # produces ./bin/lowkey
# or: go install ./cmd/lowkey./bin/lowkey --bootstrap --data-dir ./data
# Server running on:
# - gRPC: :9000
# - HTTP: :8080# Create lease (60 second TTL)
curl -X POST http://localhost:8080/v1/lease \
-d '{"owner_id":"client-1","ttl_seconds":60}'
# Response: {"lease_id":1}
# Acquire lock
curl -X POST http://localhost:8080/v1/lock/acquire \
-d '{"lock_name":"my-job","owner_id":"client-1","lease_id":1}'
# Response: {"fencing_token":1}
# Release lock
curl -X POST http://localhost:8080/v1/lock/release \
-d '{"lock_name":"my-job","lease_id":1}'go get github.com/pixperk/lowkey/pkg/clientpackage main
import (
"context"
"log"
"time"
"github.com/pixperk/lowkey/pkg/client"
)
func main() {
c, err := client.NewClient("localhost:9000", "worker-1")
if err != nil {
log.Fatal(err)
}
defer c.Stop()
// Create lease with automatic renewal
ctx := context.Background()
if err := c.Start(ctx, 10*time.Second); err != nil {
log.Fatal(err)
}
// Acquire lock (returns fencing token)
lock, err := c.Acquire(ctx, "my-job")
if err != nil {
log.Printf("Lock held by another instance: %v", err)
return
}
defer lock.Release(ctx)
token := lock.Token()
log.Printf("Acquired lock with fencing token %d", token)
// Use token in protected operations
database.ExecuteWithToken(ctx, token, func() {
// Critical section - only one instance executes this
})
}Client API:
| Method | Description |
|---|---|
NewClient(addr, ownerID) |
Create client connection to lowkey server |
Start(ctx, ttl) |
Create lease with automatic heartbeat (renews every TTL/3) |
Acquire(ctx, lockName) |
Acquire lock, returns *Lock with fencing token |
Release(ctx, lockName) |
Release lock explicitly |
Status(ctx) |
Get cluster status and metrics |
Stop() |
Stop heartbeat goroutine and close connection |
Lock API:
| Method | Description |
|---|---|
Token() |
Get fencing token for this lock |
Release(ctx) |
Release this lock |
Key behaviors:
- Automatic heartbeats every TTL/3 via gRPC streaming
- Thread-safe for concurrent use
- Locks auto-release when lease expires
- Uses bidirectional gRPC streaming (more efficient than HTTP polling)
Protect Redis operations by storing the fencing token alongside your data:
import (
"context"
"fmt"
"github.com/pixperk/lowkey/pkg/client"
"github.com/redis/go-redis/v9"
)
func processJob(ctx context.Context, lockClient *client.Client, redisClient *redis.Client) error {
lock, err := lockClient.Acquire(ctx, "daily-report")
if err != nil {
return fmt.Errorf("lock held by another instance: %w", err)
}
defer lock.Release(ctx)
token := lock.Token()
// Check if we have a stale token
storedToken, _ := redisClient.Get(ctx, "daily-report:token").Uint64()
if token < storedToken {
return fmt.Errorf("stale token %d < %d, aborting", token, storedToken)
}
// Perform protected operation with token validation
pipe := redisClient.TxPipeline()
pipe.Set(ctx, "daily-report:token", token, 0)
pipe.Set(ctx, "daily-report:data", "report-content", 0)
_, err = pipe.Exec(ctx)
return err
}Key insight: Even if a paused client wakes up with an expired lock, Redis will reject the write because token < storedToken.
Store the fencing token in a dedicated column and use conditional updates:
CREATE TABLE jobs (
name TEXT PRIMARY KEY,
last_run TIMESTAMP,
last_token BIGINT NOT NULL DEFAULT 0
);import (
"context"
"database/sql"
"github.com/pixperk/lowkey/pkg/client"
)
func runDatabaseJob(ctx context.Context, lockClient *client.Client, db *sql.DB) error {
lock, err := lockClient.Acquire(ctx, "db-migration")
if err != nil {
return err
}
defer lock.Release(ctx)
token := lock.Token()
// Conditional update: only succeed if our token is newer
result, err := db.ExecContext(ctx, `
UPDATE jobs
SET last_run = NOW(), last_token = $1
WHERE name = $2 AND last_token < $1
`, token, "db-migration")
if err != nil {
return err
}
rows, _ := result.RowsAffected()
if rows == 0 {
return fmt.Errorf("stale token, another instance already ran")
}
// Safe to proceed - we have the freshest token
return runMigration(ctx, db)
}Prevent split-brain writes to object storage:
func uploadWithToken(ctx context.Context, lockClient *client.Client, s3Client *s3.Client) error {
lock, err := lockClient.Acquire(ctx, "backup-upload")
if err != nil {
return err
}
defer lock.Release(ctx)
token := lock.Token()
// First, check the current token in metadata
head, err := s3Client.HeadObject(ctx, &s3.HeadObjectInput{
Bucket: aws.String("backups"),
Key: aws.String("latest.tar.gz"),
})
if err == nil {
storedToken, _ := strconv.ParseUint(head.Metadata["Fencing-Token"], 10, 64)
if token <= storedToken {
return fmt.Errorf("stale token, aborting upload")
}
}
// Upload with token in metadata
_, err = s3Client.PutObject(ctx, &s3.PutObjectInput{
Bucket: aws.String("backups"),
Key: aws.String("latest.tar.gz"),
Body: data,
Metadata: map[string]string{
"Fencing-Token": fmt.Sprintf("%d", token),
},
})
return err
}Pattern: Always include the fencing token with your write operations. The protected resource validates new_token > stored_token before accepting writes.
./bin/lowkey --bootstrap --data-dir ./dataThe first node bootstraps; the rest join via the bootstrap node's gRPC address. --join accepts a comma-separated seed list. The joiner walks the list and skips any seed that responds NotLeader, so listing several known members makes the join robust to which node is currently leader.
Node 1 (bootstrap):
./bin/lowkey \
--node-id <uuid-1> \
--raft-addr 10.0.1.1:7000 \
--grpc-addr 10.0.1.1:9000 \
--http-addr :8080 \
--data-dir /var/lib/lowkey \
--bootstrapNode 2:
./bin/lowkey \
--node-id <uuid-2> \
--raft-addr 10.0.1.2:7000 \
--grpc-addr 10.0.1.2:9000 \
--http-addr :8080 \
--data-dir /var/lib/lowkey \
--join 10.0.1.1:9000Node 3 (multiple seeds for robustness):
./bin/lowkey \
--node-id <uuid-3> \
--raft-addr 10.0.1.3:7000 \
--grpc-addr 10.0.1.3:9000 \
--http-addr :8080 \
--data-dir /var/lib/lowkey \
--join 10.0.1.1:9000,10.0.1.2:9000| Flag | Default | Description |
|---|---|---|
--node-id |
(generated UUID) | Unique node identifier |
--raft-addr |
127.0.0.1:7000 |
Raft consensus bind address |
--grpc-addr |
:9000 |
gRPC server listen address |
--http-addr |
:8080 |
HTTP gateway listen address |
--data-dir |
./data |
Data directory for Raft logs and snapshots |
--bootstrap |
false |
Bootstrap a new cluster (first node only) |
--join |
"" |
Comma-separated gRPC seed addresses to join an existing cluster |
--bootstrap and --join are mutually exclusive.
Check cluster status:
curl http://localhost:8080/v1/status{
"node_id": "9261ae0a-00cf-4463-9a55-445dba193fdf",
"is_leader": true,
"leader_address": "10.0.1.1:7000",
"cluster_size": 3,
"state": "leader",
"stats": { "leases": 4, "locks": 2, "fencing_counter": 137 }
}Add a peer manually (used by --join under the hood):
curl -X POST http://localhost:8080/v1/cluster/peers \
-d '{"node_id":"<uuid>","raft_address":"10.0.1.4:7000"}'Remove a peer:
curl -X DELETE http://localhost:8080/v1/cluster/peers/<uuid>Membership RPCs are leader-only; non-leaders return NotLeader (gRPC Unavailable) with the current leader's address.
A client first asks for a lease with a TTL. While the lease is alive, the client can acquire and release locks under it. To keep the lease alive past its TTL, the client sends periodic heartbeats (renewals every TTL/3 in the SDK). If the heartbeats stop, the lease expires and the server automatically releases any locks the lease held. This is how lowkey cleans up after crashed clients without any manual intervention.
The token is a monotonically-increasing counter handed out at acquire time. The protected resource records the highest token it has seen and rejects any write with a lower one, so a stale lock holder coming back from a long pause cannot overwrite work done by the current holder.
Key insight: Even if a client holds a stale lock, the protected resource will reject its operations.
lowkey decides lease expiry by comparing timestamps. If it used wall time, an NTP correction or a manual clock change could move "now" backwards and either expire leases early or keep dead leases alive. lowkey uses a monotonic clock that only ever moves forward, measured from a fixed epoch. Wall time is for logs and humans; it is not safe for invariants.
| Endpoint | Method | Description |
|---|---|---|
/v1/lease |
POST | Create a new lease |
/v1/lease/renew |
POST | Renew an existing lease (polling) |
/v1/lock/acquire |
POST | Acquire a lock (returns fencing token) |
/v1/lock/release |
POST | Release a lock |
/v1/status |
GET | Get cluster status and statistics |
/v1/cluster/peers |
POST | Add a voting peer to the cluster (leader-only) |
/v1/cluster/peers/{node_id} |
DELETE | Remove a peer from the cluster (leader-only) |
service LockService {
rpc CreateLease(CreateLeaseRequest) returns (CreateLeaseResponse);
rpc RenewLease(RenewLeaseRequest) returns (RenewLeaseResponse);
rpc Heartbeat(stream HeartbeatRequest) returns (stream HeartbeatResponse);
rpc AcquireLock(AcquireLockRequest) returns (AcquireLockResponse);
rpc ReleaseLock(ReleaseLockRequest) returns (ReleaseLockResponse);
rpc GetStatus(GetStatusRequest) returns (GetStatusResponse);
rpc AddPeer(AddPeerRequest) returns (AddPeerResponse);
rpc RemovePeer(RemovePeerRequest) returns (RemovePeerResponse);
}Note: HTTP clients poll /v1/lease/renew, gRPC clients use the Heartbeat stream for efficiency.
Using Go's built-in benchmark framework:
make bench-all # throughput across scenarios
make bench-sequential # single client baseline
make bench-parallel # multiple clients, unique locks
make bench-contention # multiple clients competing
make bench-percentiles # p50, p90, p99, p99.9Throughput (single-node lowkey on AMD Ryzen 7 5800HS):
| Benchmark | Operations | Latency/op | Scenario |
|---|---|---|---|
| Sequential | 4,460 | 3.24ms | Single client, measures Raft consensus + bbolt fsync overhead |
| Parallel | 19,911 | 0.60ms | Multiple clients, unique locks (true throughput) |
| Contention | 10,000 | 1.40ms | Multiple clients competing for same lock |
Latency Percentiles (1000 samples each):
| Scenario | p50 | p90 | p99 | p99.9 |
|---|---|---|---|---|
| Sequential | 2.5ms | 2.7ms | 3.1ms | 6.5ms |
| Parallel | 4.4ms | 5.5ms | 7.0ms | 25ms |
| Contention | 5.5ms | 6.5ms | 7.6ms | 7.6ms |
Caveat: These numbers are from a single-node lowkey, so the Raft
Applyonly touches local bbolt, no network replication. A 3-node cluster will pay an extra quorum round-trip per write. Treat the numbers as an upper bound, not a comparison against multi-node etcd/Consul deployments.
make test # unit + raft integration tests
make test-coverage # HTML coverage reportPercentile benches require a live lowkey at localhost:9000; they skip cleanly when nothing is listening (so go test ./... stays green in CI).
lowkey exposes Prometheus metrics at /metrics for monitoring lock performance, cluster health, and resource usage.
curl http://localhost:8080/metricsmake obs-up # starts Prometheus + Grafana
./bin/lowkey --bootstrap --data-dir ./data- Grafana: http://localhost:3000 (admin / admin)
- Prometheus: http://localhost:9090
The lowkey dashboard is auto-provisioned from grafana-provisioning/dashboards/lowkey-dashboard.json (12 panels covering lock latency, cluster health, throughput, failures).
Stop the stack: make obs-down · Logs: make obs-logs
| Metric | Type | Labels | Description |
|---|---|---|---|
lowkey_lock_acquire_duration_seconds |
Histogram | lock_name |
Time taken to acquire a lock (p50/p90/p99) |
lowkey_lock_acquire_total |
Counter | lock_name, status |
Total lock acquisitions (success/failure) |
lowkey_lock_release_total |
Counter | lock_name |
Total lock releases |
lowkey_locks_active |
Gauge | none | Currently held locks |
lowkey_lease_create_total |
Counter | none | Total leases created |
lowkey_lease_renew_total |
Counter | none | Total lease renewals (heartbeats) |
lowkey_lease_expire_total |
Counter | none | Total lease expirations (client failures) |
lowkey_leases_active |
Gauge | none | Currently active leases (connected clients) |
lowkey_heartbeat_total |
Counter | status |
Heartbeat success/failure count |
lowkey_raft_is_leader |
Gauge | none | Whether this node is leader (1) or follower (0) |
lowkey_raft_peers |
Gauge | none | Number of peers in cluster |
lowkey_raft_applied_index |
Gauge | none | Last Raft log index applied to FSM |
lowkey_up |
Gauge | none | Service uptime (always 1 when running) |
Example Prometheus queries:
# lock acquisition success rate
sum(rate(lowkey_lock_acquire_total{status="success"}[5m]))
/
sum(rate(lowkey_lock_acquire_total[5m]))
# p99 lock latency
histogram_quantile(0.99,
rate(lowkey_lock_acquire_duration_seconds_bucket[5m])
)
# active locks per lock name
sum by (lock_name) (lowkey_locks_active)
# lease expiration rate (client failures)
rate(lowkey_lease_expire_total[5m])
# cluster health: leader count (should always be 1)
sum(lowkey_raft_is_leader)
Raft Consensus Layer
- HashiCorp Raft implementation
- Leader election and log replication
- Persistent storage with BoltDB
- Automatic snapshots
Finite State Machine (FSM)
- Lease management with monotonic time
- Lock acquisition with fencing tokens
- Automatic cleanup on lease expiration
Dual API
- gRPC with bidirectional streaming (efficient heartbeats)
- HTTP REST with JSON (easy integration)
- gRPC-Gateway for HTTP/gRPC translation
- Raft is a proven algorithm with well-understood failure modes. CP in CAP, no split-brain under network partitions.
- Fencing tokens are monotonically-increasing counters: a mathematical guarantee against stale writes, provided the protected resource validates them.
- Lease-based locks clean themselves up when clients crash. No manual intervention needed, TTL is per client.
About lowkey
- Building lowkey, a distributed lock service in Go, the design walkthrough behind this repo
Papers and articles
- How to do distributed locking by Martin Kleppmann
- The Chubby lock service for loosely-coupled distributed systems by Google Research
- In Search of an Understandable Consensus Algorithm, the Raft paper
Implementations
- etcd, distributed KV store with locks
- Consul, service mesh with distributed locks
- Chubby, Google's distributed lock service
- hashicorp/raft, battle-tested Raft consensus implementation
- grpc-ecosystem/grpc-gateway, HTTP/gRPC bidirectional translation
- bbolt, embedded key-value database for persistent storage
- protobuf, Protocol Buffers for efficient serialization
MIT License, see LICENSE for details.
Contributions welcome. Open an issue or submit a pull request.
Areas for contribution:
- Client libraries for other languages (Python, Rust, Java)
- Observability (additional metrics, structured logging)
- Integration tests and chaos engineering
- Documentation and examples