Tags: AutoMQ/automq
fix: precompute s3 write checksums (#3416) (#3417)

## Summary

This PR hardens S3 object writes against dirty buffer reads during SDK-level retries.

## Bug Flow

AutoMQ writes S3 request bodies with `AsyncRequestBody.fromByteBuffersUnsafe(...)` to avoid copying Netty `ByteBuf` data. A rare corruption flow can happen when an SDK attempt times out but the underlying HTTP write is still alive:

1. AutoMQ submits a PutObject or UploadPart request backed by unsafe `ByteBuffer` views of a Netty `ByteBuf`.
2. The SDK attempt times out and starts a retry, while the first low-level request may still continue writing from the original buffers.
3. A later retry succeeds and completes the future.
4. The caller releases the `ByteBuf`; Netty may recycle the memory for another buffer.
5. The stale first request can still read from the recycled memory and send dirty bytes.
6. Without a stable precomputed checksum attached to the request, object storage may accept and persist the corrupted body.

## How This PR Fixes It

The fix is to bind each write request to the bytes that were present before the unsafe buffers are passed to the SDK.

For PutObject and UploadPart, AutoMQ now computes the request checksum synchronously while it still owns a valid `ByteBuf`. That checksum becomes the immutable expected value for the request body:

- If the SDK later retries normally using the same original bytes, the object storage service receives bytes matching the precomputed checksum and accepts the write.
- If an earlier timed-out HTTP attempt keeps running after the successful retry and reads from recycled Netty memory, it sends bytes that no longer match the precomputed checksum. The service should reject that stale request instead of persisting corrupted data.

This is why the checksum must be computed before `AsyncRequestBody.fromByteBuffersUnsafe(...)` is handed to the SDK. Letting the SDK calculate a checksum from the unsafe body during the actual HTTP attempt would not fully protect this flow, because the checksum calculation could observe the same dirty/recycled memory as the stale request.

The implementation uses the strongest available request-side checksum for the configured mode:

- When a supported S3 flexible checksum algorithm is configured, AutoMQ precomputes and sets the concrete checksum header (`checksumCRC32`, `checksumCRC32C`, `checksumSHA1`, or `checksumSHA256`) on PutObject and UploadPart.
- When no flexible checksum algorithm is configured, AutoMQ precomputes and sets `Content-MD5`.
- For multipart uploads, CreateMultipartUpload still carries `checksumAlgorithm` to define the upload's checksum algorithm, and CompleteMultipartUpload sends the returned per-part checksum in the matching part checksum field instead of assuming CRC32C.
- The SDK's legacy S3 ETag MD5 validation path is disabled to avoid a second client-side MD5 pass after AutoMQ already attaches its own request checksum.
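The idea of precomputing the checksum before the buffers leave the caller's control can be illustrated with a minimal, JDK-only sketch. It is not the code from this PR: the class name `PrecomputedChecksumSketch` and its helper methods are hypothetical, and a plain `byte[]` stands in for the Netty `ByteBuf` memory. The sketch computes a base64 CRC32C value and a base64 `Content-MD5` value from `ByteBuffer` views without consuming them, then simulates the memory being recycled to show that a late read no longer matches the value captured up front.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.List;
import java.util.zip.CRC32C;

// Hypothetical illustration; names are not taken from the AutoMQ code base.
public class PrecomputedChecksumSketch {

    // Base64 of the big-endian 32-bit CRC32C over all buffers.
    // duplicate() keeps the original buffers' positions untouched,
    // so the same views can still be handed to the SDK afterwards.
    static String crc32cBase64(List<ByteBuffer> buffers) {
        CRC32C crc = new CRC32C();
        for (ByteBuffer buf : buffers) {
            crc.update(buf.duplicate());
        }
        byte[] raw = ByteBuffer.allocate(4).putInt((int) crc.getValue()).array();
        return Base64.getEncoder().encodeToString(raw);
    }

    // Base64 of the MD5 digest, the format used by the Content-MD5 header.
    static String contentMd5(List<ByteBuffer> buffers) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            for (ByteBuffer buf : buffers) {
                md5.update(buf.duplicate());
            }
            return Base64.getEncoder().encodeToString(md5.digest());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is required on every Java platform", e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for pooled Netty memory exposed as an unsafe ByteBuffer view.
        byte[] memory = "payload-v1".getBytes(StandardCharsets.UTF_8);
        List<ByteBuffer> views = List.of(ByteBuffer.wrap(memory));

        // Step 1: compute the expected values while the bytes are still valid.
        // In a real request these strings would be set on the request builder
        // (assumption: e.g. checksumCRC32C(...) or contentMD5(...)).
        String expectedCrc32c = crc32cBase64(views);
        String expectedMd5 = contentMd5(views);

        // Step 2: simulate the buffer being released and its memory reused.
        memory[0] ^= 0x01;

        // Step 3: a stale attempt reading now sees bytes that no longer match.
        System.out.println("crc32c still matches: " + expectedCrc32c.equals(crc32cBase64(views)));
        System.out.println("md5 still matches: " + expectedMd5.equals(contentMd5(views)));
    }
}
```

Because the expected value travels with the request as a fixed string, the storage service is the party that compares it against the bytes it actually receives, so a stale attempt that reads recycled memory fails validation rather than being stored.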
codex autobalancer decision trace query