Summary
decrypt.Data(data []byte, format string) ([]byte, error) (and the related decrypt.File) doesn't accept a context.Context. Consumers that bind sops into a request path or a boot sequence can't interrupt a stuck KMS round-trip — the call blocks until the underlying provider's own internal timeout fires (often 30s+).
Reproduction
ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
defer cancel()
// We can check ctx before calling…
if err := ctx.Err(); err != nil {
	return err
}
// …but once we're in here, the ctx is irrelevant.
plaintext, err := decrypt.Data(ciphertext, "yaml")
// If the KMS provider is hung, this blocks for ~30s regardless
// of the 100ms deadline we set above.
Affects every key service that calls out to a remote provider: AWS KMS, GCP KMS, Azure Key Vault, HashiCorp Vault. The underlying provider SDKs (aws-sdk-go-v2, cloud.google.com/go/kms, etc.) accept contexts; sops just doesn't thread one through.
Real-world impact
We hit this in chameleon, a layered-config library that wraps sops/decrypt for encrypted layer files. A hung GCP KMS round-trip during application boot blocks initialization until the provider's own timeout fires instead of failing fast. Our workaround is to check ctx.Err() before invoking sops; once we're in the call, we're stuck. This is documented as a known limitation on our side: chameleon README ("Sops API has no ctx.Context hook").
Proposed API
Add ctx-aware variants alongside the existing functions (backward compatible):
// New, in package decrypt:
func DataWithContext(ctx context.Context, data []byte, format string) ([]byte, error)
func FileWithContext(ctx context.Context, path, format string) ([]byte, error)
The existing context-less functions can delegate to the ctx variants with context.Background(). Internally, the ctx is threaded through to each key-service call (the ctx-first client methods in aws-sdk-go-v2, GCP's existing ctx-first methods, etc.).
For sops v4 / next major: replace the existing signatures.
Alternatives considered
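- Goroutine + select-on-channel: returns control to the caller, but leaks the goroutine on cancel because there is no way to interrupt the blocked provider call. Worse than no fix.
- runtime.Goexit from a watchdog goroutine: not viable. runtime.Goexit only terminates the goroutine that calls it, so a watchdog cannot stop the blocked call, and forcibly ending a goroutine mid-call would leak resources held by the provider SDK (connections, mutexes).
- Documented "don't call from latency-sensitive paths": true but unhelpful for boot-time use.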
Scope estimate
Mechanical: every kms.Decrypt(...) / kv.Decrypt(...) call inside the key-service implementations under keyservice/ already accepts a context, so threading one through from a new DataWithContext entry point is a one-pass change. Tests need a fake key service that respects a cancel signal.
Happy to put up a PR if there's appetite — would appreciate a maintainer comment first on the API shape (separate WithContext functions vs. breaking change).