Add JDBC client throttling — analysis document, implementation and documentation#523
Agent-Logs-Url: https://github.com/Open-J-Proxy/ojp/sessions/a19cc33b-051c-409c-b8cf-9f7e1c9a4e98 Co-authored-by: rrobetti <7221783+rrobetti@users.noreply.github.com>
Agent-Logs-Url: https://github.com/Open-J-Proxy/ojp/sessions/d29299c3-d775-4b03-9366-e1a89b1ecb28 Co-authored-by: rrobetti <7221783+rrobetti@users.noreply.github.com>
rrobetti
left a comment
Evaluate the following idea:
Add new information to the SessionInfo returned to clients in the form of two new fields:
1 - the number of clients connected to that datasource with the same user and password (session hash)
2 - the maximum number of concurrent requests the server admits (max admission).
The OJP JDBC client shall use this info to create a semaphore and restrict how many concurrent requests it sends to the OJP server. Each client shall size its semaphore with the formula (max admission of the OJP server divided by the number of clients of that datasource) multiplied by the number of OJP servers in the cluster.
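A minimal sketch of the proposed formula, read literally with Java integer (floor) division. `ClientShareFormula` and `perClientLimit` are hypothetical names for illustration, not OJP code.

```java
/** Hypothetical helper illustrating the proposed per-client limit formula (not OJP code). */
final class ClientShareFormula {

    private ClientShareFormula() {}

    /**
     * Literal reading of the proposal:
     * (maxAdmission / clientCount) * serverCount, using integer (floor) division.
     */
    static int perClientLimit(int maxAdmission, int clientCount, int serverCount) {
        if (maxAdmission <= 0 || clientCount <= 0 || serverCount <= 0) {
            throw new IllegalArgumentException("all inputs must be positive");
        }
        return (maxAdmission / clientCount) * serverCount;
    }

    public static void main(String[] args) {
        // 120 admission slots, 12 clients, 3 servers: (120 / 12) * 3 = 30
        System.out.println(perClientLimit(120, 12, 3));
        // 100 slots, 12 clients, 1 server: 100 / 12 floors to 8, so 12 * 8 = 96 and 4 slots stay unused
        System.out.println(perClientLimit(100, 12, 1));
    }
}
```

The second case shows why the rounding choice matters: with floor division, any remainder of `maxAdmission / clientCount` is capacity no client is ever allowed to use.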
…IVE_THROTTLING_ANALYSIS.md Agent-Logs-Url: https://github.com/Open-J-Proxy/ojp/sessions/474a33bb-3629-4cce-9651-8df0768af306 Co-authored-by: rrobetti <7221783+rrobetti@users.noreply.github.com>
Evaluated in commit Summary of findings: The formula works, but two things need to be pinned before implementation:
Main risks identified:
Positive: Overall this design is materially better than purely reactive for steady-state behavior. The protocol cost (two
@copilot Use ceiling division plus a 10% safety headroom and document the rationale. Do include a step-limited increase (AIMD-style) and document the rationale. Regarding "Semaphore resizing is non-trivial": is there a cheap (efficient) alternative for controlling concurrency that can be resized easily and does not add latency to the flow? I do not understand what "Cross-node clientCount accuracy" means; explain it in simple language with a simple example. I am not sure "In-transaction bypass" is a problem; explain it in simple language with a simple example. Keep only the analysis for now; do not change the code yet.
…ve, AIMD, plain-language examples Agent-Logs-Url: https://github.com/Open-J-Proxy/ojp/sessions/221f3b7b-572e-4c3a-a6b3-19471bba8684 Co-authored-by: rrobetti <7221783+rrobetti@users.noreply.github.com>
Updated in commit
1.
2.
3. Ceiling division + 10% headroom. S1 now explains the rationale: floor division leaves capacity permanently wasted; ceiling slightly over-allocates, which the 10% headroom compensates for by absorbing one stale
4. AIMD step-limited increase. S4 now has the full algorithm: decrease applies immediately (fast overload response), increase is capped at
5. Semaphore alternative. The new Con #5 shows that a simple
6. Cross-node
7. In-transaction bypass: plain-language example added. C5 now walks through the step-by-step scenario where Thread A holds an open transaction, the client semaphore fills, Thread A blocks waiting for a permit, and the server's transaction timeout fires before Thread A ever sends its next statement. Fix: skip the
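The ceiling division with 10% headroom and the step-limited (AIMD-style) increase described above can be sketched as follows. This rests on assumptions, not OJP code: the headroom is applied after the ceiling division as floor(0.9 × share) with a minimum of 1, the increase cap is a fixed number of permits per update (`maxIncreaseStep`), and `ThrottleLimitCalculator` is a hypothetical name. The 90% factor matches the `proactiveLimit` example later in the thread (90% of 10 = 9).

```java
/** Hypothetical sketch: ceiling share + 10% headroom, immediate decrease, step-limited increase. */
final class ThrottleLimitCalculator {

    private final int maxIncreaseStep; // assumed cap on how much the limit may grow per update
    private int currentLimit;          // limit this client currently enforces

    ThrottleLimitCalculator(int initialLimit, int maxIncreaseStep) {
        if (initialLimit < 1 || maxIncreaseStep < 1) {
            throw new IllegalArgumentException("limits must be >= 1");
        }
        this.currentLimit = initialLimit;
        this.maxIncreaseStep = maxIncreaseStep;
    }

    /** ceil(a / b) for positive ints, without floating point. */
    static int ceilDiv(int a, int b) {
        return (a + b - 1) / b;
    }

    /** share = ceil(maxAdmission * servers / clients); keep 90% of it (10% headroom), minimum 1. */
    static int targetLimit(int maxAdmission, int clientCount, int serverCount) {
        int share = ceilDiv(maxAdmission * serverCount, clientCount);
        int withHeadroom = (share * 9) / 10; // floor(0.9 * share)
        return Math.max(1, withHeadroom);
    }

    /** Decreases take effect at once; increases move up by at most maxIncreaseStep per call. */
    int applyTarget(int target) {
        if (target < currentLimit) {
            currentLimit = target;
        } else {
            currentLimit = Math.min(target, currentLimit + maxIncreaseStep);
        }
        return currentLimit;
    }

    int currentLimit() {
        return currentLimit;
    }

    public static void main(String[] args) {
        System.out.println(targetLimit(10, 1, 1));   // share 10, 90% -> 9
        System.out.println(targetLimit(100, 12, 1)); // ceil(100 / 12) = 9, 90% -> 8

        ThrottleLimitCalculator calc = new ThrottleLimitCalculator(20, 4);
        System.out.println(calc.applyTarget(8));  // decrease applied immediately -> 8
        System.out.println(calc.applyTarget(30)); // increase capped at +4 -> 12
        System.out.println(calc.applyTarget(30)); // -> 16
    }
}
```

In the 100-slot, 12-client case, ceiling division alone would hand out 12 × 9 = 108 permits for 100 slots; after the 10% headroom each client gets 8, for 96 in total.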
rrobetti
left a comment
@copilot we should have two configurable modes, proactive and reactive. In proactive mode we do the client throttling with the atomic integer and fail fast, as suggested. For reactive mode, I want to explore and hear your opinions on keeping a record of how many simultaneous connections the server was serving before one got an admission timeout. The goal is to send a smarter number to the client instead of the fixed total the server was configured to take, because in real scenarios it might break before reaching that total. For example, if the OJP server was able to serve 50 simultaneous requests without errors but got an admission timeout when it tried to serve 51, its limit is 50, and that is the number we should send to the client. Tell me your concerns, opinions, suggestions and questions about this new idea.
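A minimal sketch of the proactive-mode idea in this comment: a resizable, fail-fast limit built on `java.util.concurrent.atomic.AtomicInteger` instead of a `Semaphore`. `FailFastLimiter` and its methods are hypothetical names. Resizing is a single volatile write, and `tryAcquire()` never blocks, so it adds no waiting to the request path.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical fail-fast concurrency limiter whose limit can be changed at any time. */
final class FailFastLimiter {

    private final AtomicInteger inFlight = new AtomicInteger();
    private volatile int limit; // resized with a plain volatile write, no permits to drain

    FailFastLimiter(int initialLimit) {
        setLimit(initialLimit);
    }

    void setLimit(int newLimit) {
        if (newLimit < 1) {
            throw new IllegalArgumentException("limit must be >= 1");
        }
        this.limit = newLimit;
    }

    /** Reserves a slot and returns true if below the limit; otherwise returns false immediately. */
    boolean tryAcquire() {
        while (true) {
            int current = inFlight.get();
            if (current >= limit) {
                return false; // fail fast: the caller rejects the request instead of waiting
            }
            if (inFlight.compareAndSet(current, current + 1)) {
                return true;
            }
            // another thread changed inFlight between get() and CAS; retry
        }
    }

    void release() {
        inFlight.decrementAndGet();
    }

    int inFlight() {
        return inFlight.get();
    }

    public static void main(String[] args) {
        FailFastLimiter limiter = new FailFastLimiter(2);
        System.out.println(limiter.tryAcquire()); // true
        System.out.println(limiter.tryAcquire()); // true
        System.out.println(limiter.tryAcquire()); // false: limit of 2 reached
        limiter.setLimit(1);                      // shrinking never blocks
        limiter.release();
        System.out.println(limiter.tryAcquire()); // false: 1 in flight, limit 1
        limiter.release();
        System.out.println(limiter.tryAcquire()); // true
    }
}
```

Callers would pair `tryAcquire()` with `release()` in a `finally` block. Shrinking the limit does not revoke slots already held; in-flight work simply drains below the new value. A `tryAcquire()` that read the old limit just before a resize may still succeed once, which is usually acceptable for throttling.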
Agent-Logs-Url: https://github.com/Open-J-Proxy/ojp/sessions/6a205c32-5798-4524-9ebf-267a654a8ced Co-authored-by: rrobetti <7221783+rrobetti@users.noreply.github.com>
Evaluated and documented in commit Two-Mode Framework — a short overview table comparing proactive, reactive, and combined modes, showing what each requires and recommends. Reactive Mode with How it would work in
Main concerns raised:
Opinion (confidence: 85%): The idea is sound — it's the TCP CWND model applied to OJP. The "false floor" and recovery rate are the only significant risks and both have clear mitigations. The recommended v1 is combined mode: send both
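The reactive-mode idea (remember the concurrency the server sustained before an admission timeout, and advertise that instead of the configured total) can be sketched as a small server-side tracker. Everything here is an assumption for illustration: the class name, the hook methods, and the recovery policy, which grows the learned limit by 1 after every `successesPerStep` successful completions until it reaches the configured maximum. That policy is one possible way to keep a transient spike from leaving a permanently low "false floor".

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical server-side tracker of the concurrency actually sustained before admission timeouts. */
final class ObservedCapacityTracker {

    private final int configuredMax;    // the static max admission the server was configured with
    private final int successesPerStep; // assumed: successes required before growing the limit by 1
    private final AtomicInteger inFlight = new AtomicInteger();
    private int learnedLimit;           // value that would be advertised to clients
    private int successesSinceStep;

    ObservedCapacityTracker(int configuredMax, int successesPerStep) {
        if (configuredMax < 1 || successesPerStep < 1) {
            throw new IllegalArgumentException("arguments must be >= 1");
        }
        this.configuredMax = configuredMax;
        this.successesPerStep = successesPerStep;
        this.learnedLimit = configuredMax;
    }

    /** Called when a request is admitted; returns the concurrency including this request. */
    int onAdmitted() {
        return inFlight.incrementAndGet();
    }

    /** Called when an admitted request completes successfully; recovers the limit slowly. */
    synchronized void onCompleted() {
        inFlight.decrementAndGet();
        if (learnedLimit < configuredMax && ++successesSinceStep >= successesPerStep) {
            learnedLimit++; // additive recovery toward configuredMax
            successesSinceStep = 0;
        }
    }

    /** Called when a request times out waiting for admission (it was never counted in inFlight). */
    synchronized void onAdmissionTimeout() {
        int sustained = inFlight.get(); // requests being served without error at that moment
        learnedLimit = Math.max(1, Math.min(learnedLimit, sustained));
        successesSinceStep = 0;
    }

    synchronized int learnedLimit() {
        return learnedLimit;
    }

    public static void main(String[] args) {
        ObservedCapacityTracker tracker = new ObservedCapacityTracker(100, 10);
        for (int i = 0; i < 50; i++) {
            tracker.onAdmitted();                  // 50 requests running without errors
        }
        tracker.onAdmissionTimeout();              // request 51 times out waiting for admission
        System.out.println(tracker.learnedLimit()); // 50
        for (int i = 0; i < 20; i++) {
            tracker.onCompleted();                 // 20 successes -> +2 with successesPerStep = 10
        }
        System.out.println(tracker.learnedLimit()); // 52
    }
}
```

This mirrors the 50/51 example from the comment: the request that timed out never incremented `inFlight`, so the learned limit is exactly 50.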
Agent-Logs-Url: https://github.com/Open-J-Proxy/ojp/sessions/83cf8215-d593-44ae-bfde-a072345d157f Co-authored-by: rrobetti <7221783+rrobetti@users.noreply.github.com>
Done in commit
…uent requests client-side Agent-Logs-Url: https://github.com/Open-J-Proxy/ojp/sessions/5b54bf63-7220-422d-b3c4-eeb145e5330d Co-authored-by: rrobetti <7221783+rrobetti@users.noreply.github.com>
... Fixed in commit Root cause: When the server returned What changed:
Warning: Firewall rules blocked me from connecting to one or more addresses. I tried to connect to the following addresses, but was blocked by firewall rules:
If you need me to access, download, or install something from one of these locations, you can either:
@copilot the errors remain; I noticed they only occur when slow query segregation (SQS) is enabled.
…ion=0; use fastSlots for maxAdmission

Root cause 1 (driver): ClientThrottleManager.updateFromSessionInfo() was resetting reactiveLimit = MAX_VALUE whenever it received a SessionInfo with maxAdmission=0. Since executeUpdate/executeQuery responses always carry maxAdmission=0, this silently undid every notifyServerOverload() adjustment on the first successful SQL response. Fix: return early when maxAdmission <= 0.

Root cause 2 (server): ConnectAction sent maxAdmission = totalSlots regardless of the SQS split. With default 20% slow slots, fastSlots=8 out of 10 total. The client's proactiveLimit (90% of 10 = 9) exceeded the 8 fast-lane slots, triggering immediate admission timeouts before the reactive throttle had a chance to activate. Fix: SlotManager.getEffectiveMaxAdmission() returns fastSlots when slowSlots>0, totalSlots otherwise; ConnectAction now calls this method.

Analysis doc updated to reflect both fixes.

Agent-Logs-Url: https://github.com/Open-J-Proxy/ojp/sessions/895e1b8d-3550-47aa-b460-84b5d8511181 Co-authored-by: rrobetti <7221783+rrobetti@users.noreply.github.com>
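The two fixes in the commit message above can be sketched as follows. This is a simplified stand-in, not the real OJP source: only the names `ClientThrottleManager`, `updateFromSessionInfo`, `reactiveLimit`, `proactiveLimit`, `notifyServerOverload`, `SlotManager`, `getEffectiveMaxAdmission`, `fastSlots`, `slowSlots` and `totalSlots` come from the thread. The wrapper class `ThrottleFixSketch`, the method signatures, the halving inside `notifyServerOverload`, and computing `proactiveLimit` with ceiling division plus 10% headroom are all assumptions.

```java
/** Hypothetical, simplified stand-ins illustrating the two fixes (not the real OJP classes). */
final class ThrottleFixSketch {

    static final class SlotManager {
        private final int totalSlots;
        private final int slowSlots;

        SlotManager(int totalSlots, int slowSlots) {
            this.totalSlots = totalSlots;
            this.slowSlots = slowSlots;
        }

        int fastSlots() {
            return totalSlots - slowSlots;
        }

        /** Fix 2: with slow query segregation on, advertise only the fast-lane slots. */
        int getEffectiveMaxAdmission() {
            return slowSlots > 0 ? fastSlots() : totalSlots;
        }
    }

    static final class ClientThrottleManager {
        int proactiveLimit = Integer.MAX_VALUE;
        int reactiveLimit = Integer.MAX_VALUE;

        /** Fix 1: a SessionInfo with maxAdmission <= 0 carries no limit data, so ignore it. */
        void updateFromSessionInfo(int maxAdmission, int clientCount, int serverCount) {
            if (maxAdmission <= 0) {
                return; // before the fix, this path reset reactiveLimit to MAX_VALUE
            }
            int clients = Math.max(1, clientCount);
            int servers = Math.max(1, serverCount);
            int share = (maxAdmission * servers + clients - 1) / clients; // ceiling division
            proactiveLimit = Math.max(1, share * 9 / 10);                 // 10% headroom
        }

        /** Assumed overload reaction for the sketch: halve the currently enforced limit. */
        void notifyServerOverload() {
            reactiveLimit = Math.max(1, effectiveLimit() / 2);
        }

        int effectiveLimit() {
            return Math.min(proactiveLimit, reactiveLimit);
        }
    }

    public static void main(String[] args) {
        SlotManager slots = new SlotManager(10, 2);             // 20% slow slots -> 8 fast
        System.out.println(slots.getEffectiveMaxAdmission());   // 8

        ClientThrottleManager client = new ClientThrottleManager();
        client.updateFromSessionInfo(slots.getEffectiveMaxAdmission(), 1, 1);
        System.out.println(client.effectiveLimit());            // ceil(8 / 1) = 8, 90% -> 7
        client.notifyServerOverload();
        System.out.println(client.effectiveLimit());            // 3
        client.updateFromSessionInfo(0, 0, 0);                  // e.g. an executeQuery response
        System.out.println(client.effectiveLimit());            // still 3
    }
}
```

The last call stands in for the executeUpdate/executeQuery responses that always carry maxAdmission=0: with the early return, the reduced limit survives instead of being reset.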
Fixed in commit
Root cause 1 (driver):
Root cause 2 (server):
@copilot review your last fix: I doubled the load and the errors now appear in both SQS and non-SQS runs.
Non-SQS Error Breakdown
SQS Error Breakdown
…path Agent-Logs-Url: https://github.com/Open-J-Proxy/ojp/sessions/35d7894b-9a8b-4dd3-84ba-b8df179b8130 Co-authored-by: rrobetti <7221783+rrobetti@users.noreply.github.com>
@copilot create a simple, easy-to-follow Mermaid flow diagram to explain how the throttling works. Use simple examples to explain it. Avoid being too verbose.
Agent-Logs-Url: https://github.com/Open-J-Proxy/ojp/sessions/9fa1b964-37b0-495b-a1a3-a73a648b4dad Co-authored-by: rrobetti <7221783+rrobetti@users.noreply.github.com>
Added a simple Mermaid flow plus two short examples in
rrobetti
left a comment
In documents/analysis/CLIENT_REACTIVE_THROTTLING_ANALYSIS.md
Unable to render rich display
Parse error on line 3:
...local limit
ceil(effectiveAdmission/
-----------------------^
Expecting 'SQE', 'DOUBLECIRCLEEND', 'PE', '-)', 'STADIUMEND', 'SUBROUTINEEND', 'PIPE', 'CYLINDEREND', 'DIAMOND_STOP', 'TAGEND', 'TRAPEND', 'INVTRAPEND', 'UNICODE_TEXT', 'TEXT', 'TAGSTART', got 'PS'
For more information, see https://docs.github.com/get-started/writing-on-github/working-with-advanced-formatting/creating-diagrams#creating-mermaid-diagrams
Agent-Logs-Url: https://github.com/Open-J-Proxy/ojp/sessions/cc3d3706-b38e-4900-85a2-a5a92ed78369 Co-authored-by: rrobetti <7221783+rrobetti@users.noreply.github.com>
Fixed in
| spikes when load increases suddenly. | ||
| **Without client throttling (problem):** 50 threads across 5 app servers hit one OJP node | ||
| at once → the node's 10-slot pool is overwhelmed → 40 requests queue or timeout → the |
@copilot this phrase makes it sound like OJP can only handle 10 connections. Rephrase it.
Updated in commit 4748fa6. I rephrased that line to make clear the 120-slot value is just an example configuration, not an OJP hard limit.
| I --> F | ||
| ``` | ||
| **Example 1 (normal):** |
@copilot use higher numbers for the examples. An example with 1 concurrent connection does not show much scalability.
Updated in commit 4748fa6. I replaced the low-scale examples with higher-concurrency numbers (for example 240 threads/12 app servers and limits like 27) to better reflect scalability.
| --- | ||
| ## Current Design: Two Configurable Modes |
@copilot it should be three configurable modes once the combined mode is included.
Updated in commit 4748fa6. I changed the section wording to “Three Configurable Modes” and aligned the follow-up sentence accordingly.
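A minimal sketch of how a client might choose the enforced limit in each of the three modes. The class, enum and method names are hypothetical, and COMBINED is assumed to enforce the tighter (minimum) of the proactive and reactive limits. Before any overload has been seen, the reactive limit could sit at `Integer.MAX_VALUE`, so COMBINED falls back to the proactive limit.

```java
/** Hypothetical illustration of the three throttling modes (names are not OJP's). */
final class ThrottleModeDemo {

    enum ThrottleMode {
        PROACTIVE, REACTIVE, COMBINED;

        /** Chooses the limit to enforce; COMBINED is assumed to take the tighter of the two. */
        int effectiveLimit(int proactiveLimit, int reactiveLimit) {
            switch (this) {
                case PROACTIVE:
                    return proactiveLimit;
                case REACTIVE:
                    return reactiveLimit;
                default:
                    return Math.min(proactiveLimit, reactiveLimit);
            }
        }
    }

    public static void main(String[] args) {
        for (ThrottleMode mode : ThrottleMode.values()) {
            // proactive share of 27, reactive limit learned from overload signals of 18
            System.out.println(mode + " -> " + mode.effectiveLimit(27, 18));
        }
        // no overload seen yet: COMBINED behaves like PROACTIVE
        System.out.println(ThrottleMode.COMBINED.effectiveLimit(27, Integer.MAX_VALUE)); // 27
    }
}
```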
Agent-Logs-Url: https://github.com/Open-J-Proxy/ojp/sessions/964f72ce-6ed0-44a6-9854-141e720bbff7 Co-authored-by: rrobetti <7221783+rrobetti@users.noreply.github.com>
Agent-Logs-Url: https://github.com/Open-J-Proxy/ojp/sessions/10ee3dad-30bb-445d-95b8-8d87d17eecd8 Co-authored-by: rrobetti <7221783+rrobetti@users.noreply.github.com>
Updated in