-
Notifications
You must be signed in to change notification settings - Fork 2
[Storage] Enhance S3 retry logic for throttling errors #23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Add explicit handling of various throttling-related 403 errors that should be retried, while ensuring authentication and access-related errors fail fast. This improves reliability when dealing with service-side throttling while preventing unnecessary retries on permanent auth failures.
3fb2db1 to
d706711
Compare
storage/s3.go
Outdated
| func (c *customRetryer) ShouldRetry(req *request.Request) bool { | ||
| shouldRetry := errHasCode(req.Error, "InternalError") || errHasCode(req.Error, "RequestTimeTooSkewed") || errHasCode(req.Error, "SlowDown") || strings.Contains(req.Error.Error(), "connection reset") || strings.Contains(req.Error.Error(), "connection timed out") | ||
| if c.retryForbidden && !shouldRetry { | ||
| shouldRetry = errHasCode(req.Error, "Forbidden") || errHasCode(req.Error, "AccessDenied") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
curious what's the difference between Forbidden and AccessDenied in terms of HTTP access code. Which one does 403 map to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, I think the code can be simplified like follows
shouldRetry := errHasCode(req.Error, "InternalError") ||
errHasCode(req.Error, "RequestTimeTooSkewed") ||
errHasCode(req.Error, "SlowDown") ||
strings.Contains(req.Error.Error(), "connection reset") ||
strings.Contains(req.Error.Error(), "connection timed out") ||
(c.retryForbidden && (errHasCode(req.Error, "Forbidden") || errHasCode(req.Error, "AccessDenied")))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure exactly what is happening above this, but when I make a request for a file that I don't have permission on, without retries, it comes into this code twice, first with an error that has code "Forbidden", then with an error that has code "AccessDenied". It looks like the calling of this is controlled by the AWS SDK because there are no references to the ShouldRetry function in the s5cmd codebase
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you test this with the latest s5cmd version? The one on the mono-repo was only updated ~45m ago, but on the new version it removes the Google STS token request from this path (the http client's base transport). Its possible that is why you were seeing two error codes.
storage/s3.go
Outdated
| func (c *customRetryer) ShouldRetry(req *request.Request) bool { | ||
| shouldRetry := errHasCode(req.Error, "InternalError") || errHasCode(req.Error, "RequestTimeTooSkewed") || errHasCode(req.Error, "SlowDown") || strings.Contains(req.Error.Error(), "connection reset") || strings.Contains(req.Error.Error(), "connection timed out") | ||
| if c.retryForbidden && !shouldRetry { | ||
| shouldRetry = errHasCode(req.Error, "Forbidden") || errHasCode(req.Error, "AccessDenied") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure exactly what is happening above this, but when I make a request for a file that I don't have permission on, without retries, it comes into this code twice, first with an error that has code "Forbidden", then with an error that has code "AccessDenied". It looks like the calling of this is controlled by the AWS SDK because there are no references to the ShouldRetry function in the s5cmd codebase
d706711 to
d24ea56
Compare
|
cc @tminusplus who has been looking at this code - likely a better reviewer than me. |
It adds the retry-forbidden flag that will retry operations if the failure is AccessDenied or Forbidden. Additional test cases cover all 4 new configurations and I manually tested it by attempting to copy a file that I didn't have access to in my current configuration and I verified that it behaves as expected.
d24ea56 to
ebb584c
Compare
Improves S3 client resilience by adding explicit retry handling for permission errors (403 Forbidden). This is because we have seen rate limiting in the auth layer of s3 manifest as 403/Forbidden and we want to backoff/retry in this case.