Handle reconnects while waiting for responses#203
Conversation
If a reconnect happens while we are waiting for a response, we will time out, even though we know exactly what is wrong in that case. Detect the reconnecting message, redeliver it to the mailbox and respond back with an immediate message.
|
This is a great topic!! thanks for helping us with Apns4erl. I am going to take a look carefully to your discussion. We started this apns4erl v2 using |
|
We don't necessarily need to use gun, just take its way of operating into account. Especially its I've deployed the above for now to our infrastructure, and I can get back a |
|
I think we can close this since we moved from |
This is a bit of a contended PR, but I'll describe it anyway
Symptom
apnsagain, you get an error like the following:exception exit: timeout in function wpool_pool:call_available_worker/3Analysis
While
apns_connectionis waiting for anapns_responsemessage,chatterboxrealizes the underlying HTTP/2 is severed. So it tells the apns_connection process. This process tells the rest of the world that we are currently reconnecting, but the receive clause never realizes this.It is not wrong to time out. But the current state of the connection is actually known to the underlying system. It just doesn't tell us right away and we have to go around the worker pool and timeout/reconnect each worker before we are back in operation. This incurs latency which isn't necessary.
Current fix
Detect reconnects, and resend the message into the mailbox. It is not a perfect fix and requires some maintenance in the future, see the following section:
Discussion
We probably need a better solution than this over time. The key is that pushing notifications over HTTP/2 ought to be asynchronous. We want a system which can push many messages over the same HTTP/2 connection, pipeline them aggressively and wait for responses. This suggests the system should be more async than it currently is. The other right thing to do here is to lift the
apns_responsemessage up so it sits in the same event loop as{reconnecting, _}and{connection_up, _}. That way, you don't have to handle it specially.We may get some ideas from Loic's
gunproject here. When he awaits a response, he also handles thegun_downsituation (and adds a monitor on top to boot, which is good style anyway). Pushes are always async and he reifies synchronous messages by going straight into the await loop, much likeapnsis currently doing. But it might make sense to streamline the event typesapns_down,apns_upand so on.We also need to understand the fabric between chatterbox and apns. We currently run a worker pool and keep
Nconnections to APNS. But inside a single HTTP/2 stream, we can run many connections as well. My experience from RabbitMQ is that I can easily push and receive 50k RPC messages over rabbit per second on a single connection. So perhaps having a worker pool ends up being overkill: keep a single HTTP/2 connection and fill it up with streams?