Add throttling for incoming search requests, add additional prometheus metrics covering search responses #1608
Merged
Pull request overview
This PR adds throttling capabilities for incoming search requests to prevent memory exhaustion under high load, along with comprehensive Prometheus metrics for monitoring search performance. The changes introduce configurable concurrency limits, circuit breaker thresholds, and response file limits.
Changes:
- Added throttling options with semaphore-based concurrency control and circuit breaker for incoming search requests
- Introduced detailed Prometheus metrics tracking search request rates, response times, queue depth, and drop rates
- Extended search interfaces to support optional row limits for database queries
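The optional row limit mentioned in the last item can be illustrated with a minimal sketch. This is not the repository code from the PR: the `files` table, the `filename` column, and the `search` function are hypothetical names chosen for illustration, and Python's `sqlite3` module stands in for the real data access layer. It only shows how an optional `limit` parameter might map onto a SQL `LIMIT` clause.

```python
import sqlite3
from typing import Optional


def search(conn: sqlite3.Connection, term: str, limit: Optional[int] = None) -> list[str]:
    """Return matching filenames, capped at `limit` rows when a limit is given.

    The table and column names here are hypothetical, not taken from the PR.
    """
    sql = "SELECT filename FROM files WHERE filename LIKE ? ORDER BY filename"
    params: list = [f"%{term}%"]
    if limit is not None:
        # Only append the LIMIT clause when the caller asked for one,
        # so existing callers keep their unlimited behavior.
        sql += " LIMIT ?"
        params.append(limit)
    return [row[0] for row in conn.execute(sql, params)]
```

Keeping the parameter optional means callers that never pass a limit are unaffected, which matches the "optional" wording in the summary above.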
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| src/slskd/Telemetry/Metrics.cs | Restructured metrics into nested Incoming, Filter, and Query classes with new counters, gauges, and histograms for comprehensive search monitoring |
| src/slskd/Shares/SqliteShareRepository.cs | Added optional limit parameter to Search method for SQL LIMIT clause support |
| src/slskd/Shares/ShareService.cs | Added optional limit parameter to SearchAsync and passed through to repositories |
| src/slskd/Shares/IShareService.cs | Updated interface to include optional limit parameter |
| src/slskd/Shares/IShareRepository.cs | Updated interface to include optional limit parameter |
| src/slskd/Core/State.cs | Added HealthState with nested structure for tracking search health metrics (latency, queue depth, drop rate) |
| src/slskd/Core/Options.cs | Added ThrottlingOptions with nested classes for configuring search concurrency, circuit breaker, and response file limits |
| src/slskd/Core/Clock.cs | Added EveryThirtySeconds timer for health state updates |
| src/slskd/Application.cs | Implemented semaphore-based throttling, circuit breaker logic, and detailed metric tracking in SearchResponseResolver |
| docs/config.md | Added documentation for new throttling options with configuration table and YAML examples |
| config/slskd.example.yml | Added commented example configuration for throttling options |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Throttling
Warning
Application Stability Risk: The options in this section control the behavior of the application at the limits of the hosting environment's performance. Increasing values can result in unintended behavior and crashes. Support for users that increase these values will be limited.
slskd is a multi-threaded application and the logic is highly concurrent, but is mostly bound by disk and network I/O rather than CPU.
The `concurrency` option controls how many threads (.NET `Task`s, to be specific) can perform work at the same time, limiting I/O contention. Raising the number can improve throughput on capable machines, but defaults are generally set in such a way that improvements will be negligible. Lowering the number will reduce throughput but alleviate I/O contention and improve stability on lower-spec hardware.
Lower values of `concurrency` or slow processing of individual messages may cause the application to process messages at a slower rate than they are received, causing messages to be 'backed up' in internal queues. The `circuit_breaker` option controls the number of messages that can be enqueued. Once the circuit breaker is hit, the application will drop or discard messages without processing them.
The latency, queue depth and rate at which messages are being dropped can be monitored by reviewing the application state or Prometheus metrics.
| Command-Line | Environment Variable |
|---|---|
| `--throttling-search-incoming-concurrency` | `SLSKD_THROTTING_SEARCH_INCOMING_CONCURRENCY` |
| `--throttling-search-incoming-circuit-breaker` | `SLSKD_THROTTING_SEARCH_INCOMING_CIRCUIT_BREAKER` |
| `--throttling-search-incoming-response-file-limit` | `SLSKD_THROTTING_SEARCH_INCOMING_RESPONSE_FILE_LIMIT` |
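The interaction between `concurrency` and `circuit_breaker` described above can be sketched roughly as follows. This is an illustrative Python `asyncio` model, not the C# implementation in `Application.cs`. The class name `IncomingSearchThrottle`, its attributes, and the choice to count only waiting requests against the circuit breaker are assumptions made for the example.

```python
import asyncio


class IncomingSearchThrottle:
    """Hypothetical model: a semaphore bounds concurrent work, and requests
    are dropped once too many are already waiting for a slot."""

    def __init__(self, concurrency: int, circuit_breaker: int) -> None:
        self._semaphore = asyncio.Semaphore(concurrency)
        self._circuit_breaker = circuit_breaker
        self.queue_depth = 0  # requests currently waiting for a slot
        self.dropped = 0      # requests discarded without processing

    async def run(self, handler):
        """Run handler() under the limits; return None if the request is dropped."""
        if self.queue_depth >= self._circuit_breaker:
            self.dropped += 1
            return None
        self.queue_depth += 1
        try:
            await self._semaphore.acquire()
        finally:
            # The request stops counting as queued whether or not it got a slot.
            self.queue_depth -= 1
        try:
            return await handler()
        finally:
            self._semaphore.release()


async def demo():
    throttle = IncomingSearchThrottle(concurrency=2, circuit_breaker=3)

    async def handler():
        await asyncio.sleep(0.01)  # stand-in for a share database query
        return "response"

    results = await asyncio.gather(*(throttle.run(handler) for _ in range(10)))
    return results, throttle.dropped
```

In this model, ten requests arriving at once with a concurrency of 2 and a circuit breaker of 3 leave two running, three queued, and five dropped. Raising either value trades memory and I/O pressure for fewer drops.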
Additional Metrics
Search Requests
- `slskd_search_incoming_requests_received` - Total number of search requests received
- `slskd_search_incoming_request_receive_rate_current` - Number of search requests received in the last minute
- `slskd_search_incoming_requests_dropped` - Total number of search requests dropped due to processing pressure
- `slskd_search_incoming_request_drop_rate_current` - Number of search requests dropped in the last minute
- `slskd_search_incoming_request_queue_depth_current` - The number of incoming search requests waiting to be processed
Search Responses
- `slskd_search_incoming_responses_sent` - Total number of search responses sent
- `slskd_search_incoming_response_send_rate_current` - Number of search responses sent in the last minute
Search Response Latency
The time taken to respond to incoming search requests. Ideally this stays under ~30ms; above that, dropped requests become likely.
The `response_latency` metric is a measure of the total time, from receipt of the request until the response is returned. The remaining metrics measure different components of the overall latency.
- `slskd_search_incoming_response_latency` - The time taken to resolve and return a response to an incoming search request, in milliseconds
- `slskd_search_incoming_response_latency_current` - The average time taken to resolve and return a response to an incoming search request, in milliseconds
- `slskd_search_incoming_filter_latency` - The time taken to apply filters to an incoming search request, in milliseconds
- `slskd_search_incoming_filter_latency_current` - The average time taken to apply filters to an incoming search request, in milliseconds
- `slskd_search_incoming_query_latency` - The time taken to query share database(s) for results, in milliseconds
- `slskd_search_incoming_query_latency_current` - The average time taken to query share database(s) for results, in milliseconds
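As a rough illustration of how the total latency relates to its filter and query components, the sketch below times each stage of a hypothetical request handler. `LatencyTracker`, `resolve_response`, the stage names, and the fixed-size sample window used for the averages are all assumptions made for the example. The PR does not state how the `_current` averages are windowed.

```python
import time
from collections import deque


class LatencyTracker:
    """Hypothetical helper: keeps recent per-stage samples (ms) and averages them."""

    def __init__(self, window: int = 100) -> None:
        self._window = window
        self._samples = {}  # stage name -> deque of recent samples, in ms

    def observe(self, stage: str, elapsed_ms: float) -> None:
        self._samples.setdefault(stage, deque(maxlen=self._window)).append(elapsed_ms)

    def current(self, stage: str) -> float:
        """Average of the retained samples for a stage, or 0.0 if there are none."""
        samples = self._samples.get(stage)
        return sum(samples) / len(samples) if samples else 0.0


def resolve_response(tracker, apply_filters, query_shares, request):
    """Time the filter and query stages separately, plus the overall total."""
    started = time.perf_counter()
    accepted = apply_filters(request)
    filtered = time.perf_counter()
    tracker.observe("filter", (filtered - started) * 1000)

    results = None
    if accepted:
        results = query_shares(request)
        tracker.observe("query", (time.perf_counter() - filtered) * 1000)

    # The total spans receipt of the request through the end of processing.
    tracker.observe("response", (time.perf_counter() - started) * 1000)
    return results
```

Comparing the averages for the filter and query stages against the total gives a quick indication of which component is responsible when the total drifts above the ~30ms target.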