Add 'hash' client allocation mode#88
Merged
Merged
Conversation
Collaborator
|
Hello @bernardd! All of these changes looks really awesome, and the code is top notch! However when I try to run common tests on it, I get that the |
Collaborator
Author
|
Derp. Rookie git mistake. I've added it, plus some more tests and cleanups for existing ones. A little bit more to come, but I think it's mostly there now. |
Collaborator
Author
|
@matehat Should be good to merge now, pending any comments/changes from you. |
Collaborator
|
Thanks @bernardd for the awesome work! I've looked over the change and played with the client and it now performs faster and hopefully with more stability! 👍 x 1000 |
Collaborator
|
FIY: I'm getting a roughly 2x throughput increase with your changes, on a 4-core (8v) 2.5 GHz rMBP |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
NOTE: This isn't intended to be merged yet - I still need to add some docs and further ct tests. The core functionality, however, is done and I figured I'd share it to get any input you (@matehat) or anyone else has before getting too far along.
A note on terminology: When I say "client" I mean a
cqerl_clientprocess. When I say "user" I mean any process in an application that callscqerl:get_clientorcqerl:new_clientto get a client to use to make a query.So - we've been having a few intermittent problems with the existing pooler-based implementation. Fundamentally, it doesn't seem to be an ideal choice for managing a pool of clients where each client can have multiple users, as is the case with cqerl clients. Additionally, as has been noted in #49, it has scalability issues with bottlenecks (that may not be endemic to
poolerper se, but are certainly present in our use of it). I looked overdispcountas an alternative, but as I explained in my comment on #49, that doesn't seem ideal either.What I've ended up doing, then, is to invent my own solution (which steals a couple of dispcount's neat ideas that fit our problem). Basically, for each "key" (see below), we have a set of client processes started on the first attempt to get a client using that key. Those clients, once started, are available to all users, and are allocated from a pool using a dispcount-style hash of the user's PID. Because they're multi-user clients, however, we don't need dispcount's exclusivity guarantees and so can allocate multiple users to each client.
The "key" is essentially the
NodeKeyused by the pooler implementation - a tuple of theNodeandOptsused to originally set up the clients. I think this could probably stand to be simplified a bit, but for now it's what I've got. Every unique key currently startsnum_clients(default 20) client processes - I'd also like to make that configurable on a per-keyspace basis.Each set of clients is monitored by a
one_for_onesupervisor (call it the "set supervisor"). And these set supervisors are, in turn, monitored under asimple_one_for_onesupervisor calledcqerl_client_sup. What this means is that individual failures/crashes amongst a set of clients will be restarted by their set supervisor and only cause transient error conditions for any user trying to access them. If, on the other hand, the whole Cassandra server goes down (or something else equally catastrophic happens) then the clients retrying will quickly hit the set supervisor's restart intensity and cause it to shutdown too. Sincecqerl_client_supis asimple_one_for_one, however, the failure won't cascade any further upwards. Further, the next attempt by a user to access a client for that key will create a new set of clients. If they connect successfully, everything is good and we keep going. If they don't, the user (and any other uses waiting on their creation) will again receive error responses.In addition to the set supervisor, each key has its own ets table which is used to map the hash values to a client PID. These tables are maintained by the
cqerl_hashmodule/process (registered ascqerl) which monitors all the client processes and their immediate supervisors. They are further indexed by a "root" ets table namedcqerl_client_tableswhich maps keys to their respective tables. Thus in the normal course of events, any request for a client requires two ets lookups to read-optimised tables (in much the same manner as dispcount): First to get the client table for a key, then to get the client allocated to the hash of the user process pid.I've tried, as far as possible, to make this change backwards compatible. If you change nothing, the behaviour still uses the existing pooler system. Adding
{mode, hash}to the cqerl app config enables the new system.The other change required is to call
cqerl:get_client/2rather thancqerl:new_client. I made that distinction mostly so that users would not accidentally end up using the wrong mode, but also because it would have required fiddly and otherwise unnecessary checks innew_clientto have it work for both.Calling
cqerl:close_client/1in hash mode is not necessary, but nor is it harmful.I've changed the integration and load tests to run in both modes, and everything seems to work okay. Load tests for hash mode are marginally (maybe ~20%) faster on my system, but it's a VM with only 2 cores - I hope on beefier machines that the improvement will be more pronounced, but I've yet to have a chance to test that.