Requests failing with EPIPE, ECONNRESET, Broken Pipe, "No status line received" errors or warnings #1114

@fviard

Probably also reported in the following older issues:
#912
#390
#314
#953

Sometimes, requests suddenly fail with an error like EPIPE.
(Probably mostly when using the AWS S3 service.)

Most of the time, s3cmd is able to recover after a retry, but a few seconds are lost waiting before the retry.

Today, I was finally able to understand the reason for this issue.

After using a connection, instead of closing it, we keep it in a pool so that we can reuse it later.
Normally, s3cmd is almost always communicating with the service, so connections do not stay in the pool for long before being reused.
But, in some circumstances, a connection can sit in the pool with no traffic for a long time before it is reused for a new request.
For example, this can happen while we calculate the hash of a very big file (which can run for 30 seconds or 1 minute).

The root cause of the issue is that, in such a case, AWS will unilaterally close the connection after a "short" idle time, so we encounter an error when trying to reuse the connection.
(Network stacks do not let us detect a closed/broken connection before we try to reuse it.)
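To illustrate the point, here is a minimal sketch (not s3cmd code) that reproduces the situation locally with Python 3's `http.client`. The function name `reuse_after_server_close` and the throwaway local server are hypothetical stand-ins for s3cmd and the remote service. The assumption is that a server which accepts and then immediately closes the connection behaves like one that drops an idle connection.

```python
import http.client
import socket
import threading


def reuse_after_server_close():
    """Return the exception raised when reusing a connection the server closed."""
    # Hypothetical local stand-in for the remote service:
    # accept one connection, then close it without answering.
    server = socket.socket()
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    port = server.getsockname()[1]
    closed = threading.Event()

    def accept_and_close():
        peer, _ = server.accept()
        peer.close()
        closed.set()

    threading.Thread(target=accept_and_close, daemon=True).start()

    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    conn.connect()   # succeeds: the connection looks perfectly healthy
    closed.wait(5)   # the server side is now gone, nothing tells the client
    try:
        # The failure only shows up here, when the connection is used.
        conn.request("GET", "/")
        conn.getresponse()
    except ConnectionError as exc:
        return exc
    finally:
        conn.close()
        server.close()
    return None


if __name__ == "__main__":
    print(repr(reuse_after_server_close()))
```

The exact exception is some subclass of `ConnectionError` and can vary with the platform and with timing, which fits with the same root cause showing up under different error messages.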

And recently, for the first time, AWS added the following info to its documentation:

https://aws.amazon.com/fr/premiumsupport/knowledge-center/s3-socket-connection-timeout-error/

When the connection between the client and the Amazon S3 server remains idle for 20 seconds or longer, Amazon S3 closes the connection.

So, now we know that there is indeed a timeout, and that its value is 20 seconds!

In addition, I discovered today that the same issue can surface as two different system/python errors, depending on the kind of request that tries to use the closed connection:

For standard GET connections:

DEBUG: Response:
{}
WARNING: Retrying failed request: /bigsparse/sparse-test.big?part-number-marker=1000&uploadId=HDzIZLD....TE9 (No status line received - the server has closed the connection)
WARNING: Waiting 3 sec...

For send_file / PUT connections:

DEBUG: format_uri(): /bigsparse/sparse-test.big?partNumber=1070&uploadId=HDzIZLD....TE9
   65536 of 5242880     1% in    0s     4.50 MB/s  failed
ERROR: Cannot retrieve any response status before encountering an EPIPE or ECONNRESET exception
WARNING: Upload failed: /bigsparse/sparse-test.big?partNumber=1070&uploadId=HDzIZLD....TE9 ([Errno 32] Broken pipe)
WARNING: Waiting 3 sec...

So, the obvious solution to this issue would be to have our own timeout on idle connections, so that we never reuse a connection that has been idle for too long.
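As a rough idea of what such a timeout could look like, here is a minimal sketch. It is not the actual s3cmd connection pool: the class `IdleAwarePool`, its methods, and the 15 second limit are all hypothetical, and the limit is only an assumption chosen to stay below the 20 seconds quoted from the AWS documentation.

```python
import time

# Assumption: keep a safety margin under the 20 s idle limit quoted above.
MAX_IDLE_SECONDS = 15.0


class IdleAwarePool:
    """Hypothetical pool that refuses to hand out connections idle for too long."""

    def __init__(self, max_idle=MAX_IDLE_SECONDS, clock=time.monotonic):
        self._max_idle = max_idle
        self._clock = clock   # injectable so the timeout can be tested
        self._idle = {}       # key -> list of (connection, time it was returned)

    def put(self, key, conn):
        """Store a connection after use, remembering when it became idle."""
        self._idle.setdefault(key, []).append((conn, self._clock()))

    def get(self, key):
        """Return a recently used connection, or None if a new one is needed."""
        entries = self._idle.get(key, [])
        while entries:
            conn, returned_at = entries.pop()
            if self._clock() - returned_at < self._max_idle:
                return conn
            # Too old: the server has probably dropped it already.
            conn.close()
        return None
```

When `get()` returns `None`, the caller would open a fresh connection instead of reusing a stale one, which avoids both the failed request and the wait before the retry.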
