When you specify an S3 URL that does not include a region, AWS may respond with either a 400 whose body says where to retry or, if the bucket is relatively new, a 307 redirect.
The HTSlib S3 plugin does not correctly follow the 307 redirects that AWS returns for newly created buckets.
Note that, in the logs below, rather than sending an HTTPS request to the location given in the redirect (https://samtools-reads-data.s3.eu-west-2.amazonaws.com/r.cram), it sends a plain HTTP request that leaves out the bucket name (http://s3.eu-west-2.amazonaws.com/r.cram), so the retry always fails with a 404.
The logs below are from 1.18 on macOS, but I have also seen this with 1.19 on Amazon Linux.
In my experience, AWS starts sending 400 responses instead of 307s the day after the bucket is created, at which point the "samtools view" command works.
❯ aws s3api create-bucket --bucket samtools-reads-data --create-bucket-configuration LocationConstraint=eu-west-2
❯ samtools view s3://samtools-reads-data/r.cram --verbosity 10
[D::init_add_plugin] Loaded "mem"
[D::init_add_plugin] Loaded "crypt4gh-needed"
[D::init_add_plugin] Loaded "libcurl"
[D::init_add_plugin] Loaded "gcs"
[D::init_add_plugin] Loaded "s3"
[D::init_add_plugin] Loaded "s3w"
* Trying 3.5.7.115:443...
* Connected to samtools-reads-data.s3.amazonaws.com (3.5.7.115) port 443
* ALPN: curl offers h2,http/1.1
* CAfile: /etc/ssl/cert.pem
* CApath: none
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN: server accepted http/1.1
* Server certificate:
* subject: CN=*.s3.amazonaws.com
* start date: Oct 10 00:00:00 2023 GMT
* expire date: Jul 3 23:59:59 2024 GMT
* subjectAltName: host "samtools-reads-data.s3.amazonaws.com" matched cert's "*.s3.amazonaws.com"
* issuer: C=US; O=Amazon; CN=Amazon RSA 2048 M01
* SSL certificate verify ok.
* using HTTP/1.1
> GET /r.cram HTTP/1.1
Host: samtools-reads-data.s3.amazonaws.com
User-Agent: htslib/1.18 libcurl/8.4.0
Accept: */*
Authorization: REDACTED
x-amz-date: 20240319T145000Z
x-amz-content-sha256: REDACTED
X-Amz-Security-Token: REDACTED
< HTTP/1.1 307 Temporary Redirect
< x-amz-bucket-region: eu-west-2
< x-amz-request-id: REDACTED
< x-amz-id-2: REDACTED
< Location: https://samtools-reads-data.s3.eu-west-2.amazonaws.com/r.cram
< Content-Type: application/xml
< Transfer-Encoding: chunked
< Date: Tue, 19 Mar 2024 14:50:00 GMT
< Server: AmazonS3
<
* Trying 52.95.191.45:80...
* Connected to s3.eu-west-2.amazonaws.com (52.95.191.45) port 80
> GET /r.cram HTTP/1.1
Host: s3.eu-west-2.amazonaws.com
User-Agent: htslib/1.18 libcurl/8.4.0
Accept: */*
Authorization: REDACTED
x-amz-date: 20240319T145000Z
x-amz-content-sha256: REDACTED
X-Amz-Security-Token: REDACTED
< HTTP/1.1 404 Not Found
< x-amz-request-id: REDACTED
< x-amz-id-2: REDACTED
< Content-Type: application/xml
< Transfer-Encoding: chunked
< Date: Tue, 19 Mar 2024 14:50:00 GMT
< Server: AmazonS3
<
* Closing connection
* Closing connection
[E::hts_open_format] Failed to open file "s3://samtools-reads-data/r.cram" : No such file or directory
samtools view: failed to open "s3://samtools-reads-data/r.cram" for reading: No such file or directory
Documentation of the 307 redirect: https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingRouting.html
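To make the mismatch concrete, here is a minimal Python sketch (not HTSlib code) that compares the target named in the 307 Location header with the request that actually appears in the log above. The helper name redirect_target is hypothetical and only for illustration; the URLs are copied from the log.

```python
from urllib.parse import urlsplit


def redirect_target(location: str) -> tuple[str, str, str]:
    """Split an absolute Location URL into (scheme, host, path).

    Hypothetical helper for illustration: a client following the redirect
    would be expected to reuse all three parts unchanged.
    """
    parts = urlsplit(location)
    if not parts.scheme or not parts.netloc:
        raise ValueError("expected an absolute URL in the Location header")
    return parts.scheme, parts.netloc, parts.path or "/"


# Location header returned by S3 in the 307 response (from the log).
location = "https://samtools-reads-data.s3.eu-west-2.amazonaws.com/r.cram"
expected = redirect_target(location)

# Second request actually sent, per the log: port 80, no bucket in the host.
observed = ("http", "s3.eu-west-2.amazonaws.com", "/r.cram")

# List the parts of the follow-up request that differ from the redirect.
mismatches = [
    name
    for name, want, got in zip(("scheme", "host", "path"), expected, observed)
    if want != got
]
print(mismatches)  # ['scheme', 'host']
```

Only the path survives the redirect; the scheme is downgraded and the bucket prefix is dropped from the host, which matches the 404 seen above.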