S3 plugin does not correctly handle 307 redirects for newly created buckets #1760

@andaca

Description

When you specify an S3 URL that does not include a region, AWS may respond with either a 400 whose body specifies where to retry or, if the bucket is relatively new, a 307 redirect.

The HTSlib S3 plugin does not correctly follow the 307 redirects that AWS returns for newly created buckets.
Note that, in the logs below, rather than sending an HTTPS request to the specified location (https://samtools-reads-data.s3.eu-west-2.amazonaws.com/r.cram), it sends an HTTP request that omits the bucket name (http://s3.eu-west-2.amazonaws.com/r.cram), and so always receives a 404.
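
To make the mismatch concrete, here is a minimal Python sketch (not HTSlib code) that splits the `Location` header from the log below into its parts and compares them with the request the plugin actually sent. The helper name `redirect_target` is hypothetical and exists only for illustration.

```python
from urllib.parse import urlsplit


def redirect_target(location):
    """Split a Location header into (scheme, host, path) without rewriting it.

    Hypothetical helper for illustration; not part of HTSlib.
    """
    parts = urlsplit(location)
    return parts.scheme, parts.netloc, parts.path


# Location header copied from the 307 response in the log below.
location = "https://samtools-reads-data.s3.eu-west-2.amazonaws.com/r.cram"
expected = redirect_target(location)

# The request the plugin actually made after the redirect, per the log:
# plain HTTP on port 80, with the bucket name missing from the host.
observed = ("http", "s3.eu-west-2.amazonaws.com", "/r.cram")

scheme_matches = expected[0] == observed[0]
host_matches = expected[1] == observed[1]
path_matches = expected[2] == observed[2]
```

Only the path survives the redirect unchanged; both the scheme and the bucket-qualified host are lost, which is consistent with the 404.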

The logs below are from 1.18 on macOS, but I have also seen this with 1.19 on Amazon Linux.

In my experience, AWS starts sending 400 responses instead of 307s the day after the bucket is created, at which point the "samtools view" command works.

❯ aws s3api create-bucket --bucket samtools-reads-data --create-bucket-configuration LocationConstraint=eu-west-2

❯ samtools view s3://samtools-reads-data/r.cram --verbosity 10
[D::init_add_plugin] Loaded "mem"
[D::init_add_plugin] Loaded "crypt4gh-needed"
[D::init_add_plugin] Loaded "libcurl"
[D::init_add_plugin] Loaded "gcs"
[D::init_add_plugin] Loaded "s3"
[D::init_add_plugin] Loaded "s3w"
*   Trying 3.5.7.115:443...
* Connected to samtools-reads-data.s3.amazonaws.com (3.5.7.115) port 443
* ALPN: curl offers h2,http/1.1
*  CAfile: /etc/ssl/cert.pem
*  CApath: none
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN: server accepted http/1.1
* Server certificate:
*  subject: CN=*.s3.amazonaws.com
*  start date: Oct 10 00:00:00 2023 GMT
*  expire date: Jul  3 23:59:59 2024 GMT
*  subjectAltName: host "samtools-reads-data.s3.amazonaws.com" matched cert's "*.s3.amazonaws.com"
*  issuer: C=US; O=Amazon; CN=Amazon RSA 2048 M01
*  SSL certificate verify ok.
* using HTTP/1.1
> GET /r.cram HTTP/1.1
Host: samtools-reads-data.s3.amazonaws.com
User-Agent: htslib/1.18 libcurl/8.4.0
Accept: */*
Authorization: REDACTED
x-amz-date: 20240319T145000Z
x-amz-content-sha256: REDACTED
X-Amz-Security-Token: REDACTED

< HTTP/1.1 307 Temporary Redirect
< x-amz-bucket-region: eu-west-2
< x-amz-request-id: REDACTED
< x-amz-id-2: REDACTED
< Location: https://samtools-reads-data.s3.eu-west-2.amazonaws.com/r.cram
< Content-Type: application/xml
< Transfer-Encoding: chunked
< Date: Tue, 19 Mar 2024 14:50:00 GMT
< Server: AmazonS3
<
*   Trying 52.95.191.45:80...
* Connected to s3.eu-west-2.amazonaws.com (52.95.191.45) port 80
> GET /r.cram HTTP/1.1
Host: s3.eu-west-2.amazonaws.com
User-Agent: htslib/1.18 libcurl/8.4.0
Accept: */*
Authorization: REDACTED
x-amz-date: 20240319T145000Z
x-amz-content-sha256: REDACTED
X-Amz-Security-Token: REDACTED

< HTTP/1.1 404 Not Found
< x-amz-request-id: REDACTED
< x-amz-id-2: REDACTED
< Content-Type: application/xml
< Transfer-Encoding: chunked
< Date: Tue, 19 Mar 2024 14:50:00 GMT
< Server: AmazonS3
<
* Closing connection
* Closing connection
[E::hts_open_format] Failed to open file "s3://samtools-reads-data/r.cram" : No such file or directory
samtools view: failed to open "s3://samtools-reads-data/r.cram" for reading: No such file or directory

Documentation of the 307 redirect: https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingRouting.html
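
As a rough sketch of what handling both responses could look like, the Python below extracts the bucket region from either a 307 or a 400 and builds the regional URL to retry against. This is not how HTSlib implements it. The function names are hypothetical, and the assumption that the 400 error body carries a top-level `<Region>` element is mine and should be checked against a real response. The `x-amz-bucket-region` header and the regional URL form are taken from the log above.

```python
import xml.etree.ElementTree as ET


def region_from_response(status, headers, body=""):
    """Return the bucket region hinted at by an S3 response, or None.

    Hypothetical helper for illustration; not part of HTSlib.
    """
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    if status == 307:
        # Header present in the 307 response shown in the log above.
        return lowered.get("x-amz-bucket-region")
    if status == 400 and body:
        # Assumption: the XML error body has a top-level <Region> element.
        try:
            root = ET.fromstring(body)
        except ET.ParseError:
            return None
        node = root.find("Region")
        return node.text if node is not None else None
    return None


def regional_url(bucket, region, key):
    """Build a URL of the same form as the Location header in the log."""
    return f"https://{bucket}.s3.{region}.amazonaws.com/{key}"


region = region_from_response(307, {"x-amz-bucket-region": "eu-west-2"})
retry_url = regional_url("samtools-reads-data", region, "r.cram")
```

With the values from the log, `retry_url` equals the `Location` header that AWS returned, so either following `Location` verbatim or rebuilding the URL from the region header would reach the same endpoint.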
