Skip to content

Conversation

@shuw
Copy link
Contributor

@shuw shuw commented Feb 13, 2025

Improves S3 client resilience by adding explicit retry handling for permission errors (403 Forbidden). This is because we have seen rate limiting in the auth layer of s3 manifest as 403/Forbidden and we want to backoff/retry in this case.

Add explicit handling of various throttling-related 403 errors that should be retried, while ensuring authentication and access-related errors fail fast. This improves reliability when dealing with service-side throttling while preventing unnecessary retries on permanent auth failures.
storage/s3.go Outdated
func (c *customRetryer) ShouldRetry(req *request.Request) bool {
shouldRetry := errHasCode(req.Error, "InternalError") || errHasCode(req.Error, "RequestTimeTooSkewed") || errHasCode(req.Error, "SlowDown") || strings.Contains(req.Error.Error(), "connection reset") || strings.Contains(req.Error.Error(), "connection timed out")
if c.retryForbidden && !shouldRetry {
shouldRetry = errHasCode(req.Error, "Forbidden") || errHasCode(req.Error, "AccessDenied")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious what's the difference between Forbidden and AccessDenied in terms of HTTP access code. Which one does 403 map to?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, I think the code can be simplified like follows

shouldRetry := errHasCode(req.Error, "InternalError") || 
               errHasCode(req.Error, "RequestTimeTooSkewed") || 
               errHasCode(req.Error, "SlowDown") || 
               strings.Contains(req.Error.Error(), "connection reset") || 
               strings.Contains(req.Error.Error(), "connection timed out") ||
               (c.retryForbidden && (errHasCode(req.Error, "Forbidden") || errHasCode(req.Error, "AccessDenied")))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure exactly what is happening above this, but when I make a request for a file that I don't have permission on, without retries, it comes into this code twice, first with an error that has code "Forbidden", then with an error that has code "AccessDenied". It looks like the calling of this is controlled by the AWS SDK because there are no references to the ShouldRetry function in the s5cmd codebase

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you test this with the latest s5cmd version? The one on the mono-repo was only updated ~45m ago, but on the new version it removes the Google STS token request from this path (the http client's base transport). Its possible that is why you were seeing two error codes.

levonlloyd
levonlloyd previously approved these changes Feb 13, 2025
storage/s3.go Outdated
func (c *customRetryer) ShouldRetry(req *request.Request) bool {
shouldRetry := errHasCode(req.Error, "InternalError") || errHasCode(req.Error, "RequestTimeTooSkewed") || errHasCode(req.Error, "SlowDown") || strings.Contains(req.Error.Error(), "connection reset") || strings.Contains(req.Error.Error(), "connection timed out")
if c.retryForbidden && !shouldRetry {
shouldRetry = errHasCode(req.Error, "Forbidden") || errHasCode(req.Error, "AccessDenied")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure exactly what is happening above this, but when I make a request for a file that I don't have permission on, without retries, it comes into this code twice, first with an error that has code "Forbidden", then with an error that has code "AccessDenied". It looks like the calling of this is controlled by the AWS SDK because there are no references to the ShouldRetry function in the s5cmd codebase

levonlloyd
levonlloyd previously approved these changes Feb 13, 2025
@steveryb
Copy link
Contributor

cc @tminusplus who has been looking at this code - likely a better reviewer than me.

@steveryb steveryb requested review from tminusplus and removed request for steveryb February 13, 2025 20:22
It adds the retry-forbidden flag that will retry operations if the
failure is AccessDenied or Forbidden.  Additional test cases cover
all 4 new configurations and I manually tested it by attempting to
copy a file that I didn't have access to in my current configuration and
I verified that it behaves as expected.
@levonlloyd levonlloyd merged commit ae25bcb into master Feb 13, 2025
12 checks passed
@levonlloyd levonlloyd deleted the shuw/rate_limit branch February 13, 2025 22:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants