Tags: pingcap/tiflash
Created hot fix tag on behalf of huangjunshen@pingcap.com
disagg: Limit the compute node cache download IOPS and bandwidth (#10558)

close #10557

* Prior to this PR, the number of threads handling foreground reads and the number of background threads downloading files for FileCache were both controlled by the same parameter, profiles.default.io_thread_count_scale, which made it difficult to tune them independently. This PR introduces two separate parameters that control the concurrency and queue length of FileCache downloads performed by Compute Node background threads, so background thread settings can be adjusted without affecting the foreground read thread pool configuration:
  - Added profiles.default.dt_filecache_downloading_count_scale, with a default value of 2.0. The concurrency for FileCache background downloads is set to logical_cores * profiles.default.dt_filecache_downloading_count_scale.
  - Changed the default value of profiles.default.dt_filecache_max_downloading_count_scale from 1.0 to 10.0. This parameter now defines the maximum queue length for FileCache background downloads as logical_cores * profiles.default.dt_filecache_max_downloading_count_scale. Before this PR, the queue length was logical_cores * profiles.default.io_thread_count_scale * 2, so the new default simply decouples the queue length from io_thread_count_scale.
* FileCache downloads files from S3 to local storage, and this process is rate-limited by the rate_limiter parameter.
* After persisting data received from Write Nodes into the LocalPageCache, Compute Nodes no longer invoke fsync, reducing their IOPS requirements.
* Optimized the lock contention scope in `FileCache::get`.

Signed-off-by: JaySon-Huang <tshent@qq.com>

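The sizing rules described in this commit can be illustrated with a small calculation. This is only a sketch: the function names are hypothetical, and truncating the product to an integer is an assumption, since the commit message does not say how TiFlash rounds the result.

```python
def filecache_download_limits(
    logical_cores,
    downloading_count_scale=2.0,       # default of dt_filecache_downloading_count_scale
    max_downloading_count_scale=10.0,  # new default of dt_filecache_max_downloading_count_scale
):
    """Return (concurrency, max_queue_length) for FileCache background downloads.

    Hypothetical helper; int() truncation is an assumption, not TiFlash's
    documented rounding behavior.
    """
    concurrency = int(logical_cores * downloading_count_scale)
    max_queue_length = int(logical_cores * max_downloading_count_scale)
    return concurrency, max_queue_length


def legacy_queue_length(logical_cores, io_thread_count_scale):
    """Queue length before this PR: logical_cores * io_thread_count_scale * 2."""
    return int(logical_cores * io_thread_count_scale * 2)


if __name__ == "__main__":
    # Example machine with 16 logical cores and the new defaults.
    print(filecache_download_limits(16))
    # io_thread_count_scale=5 is an illustrative value only.
    print(legacy_queue_length(16, 5))
```

With the defaults, both limits scale only with the core count, so changing io_thread_count_scale no longer changes the download queue length.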
disagg: Optimize S3 connection parameters to reduce error rates (#10549) (#10550)

close #10538

* Increase the default value of storage.s3.connection_timeout_ms to 5000 to reduce the error rate of S3 API calls
* Add retry backoff for `S3RandomAccessFile::initialize`
* Separate the metrics for compute node fetch-pages calls into two cases: calls that only finish the snapshot because the local cache is fully hit, and calls that really need to fetch pages from the write node
* Refine the Grafana panel for better diagnosis
* Add variables `additional_groupby` and `tiflash_role`

Signed-off-by: JaySon-Huang <tshent@qq.com>
Co-authored-by: JaySon <tshent@qq.com>

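For readers unfamiliar with the term, "retry backoff" means waiting progressively longer between failed attempts. The sketch below shows a generic exponential backoff in Python. It is not the TiFlash implementation: the commit message does not state the policy used in `S3RandomAccessFile::initialize`, and the function name, attempt count, delays, and exception type here are all assumptions chosen for illustration.

```python
import time


def retry_with_backoff(operation, max_attempts=3, base_delay_s=0.1,
                       factor=2.0, sleep=time.sleep):
    """Call operation(); on ConnectionError, wait and retry.

    The delay starts at base_delay_s and is multiplied by factor after
    each failed attempt. All parameters are illustrative assumptions.
    """
    delay = base_delay_s
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(delay)
            delay *= factor
```

The `sleep` parameter is injectable so the waiting behavior can be checked without actually pausing.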
Storages: Use new GC API for fetching gc safepoint (#10525) (#10527)

close #10524

* Remove deprecated function getGCSafePoint && getGCSafePointV2
* Use new GC API getGCState for fetching gc safepoint
* Update image name for running integration tests

Signed-off-by: JaySon-Huang <tshent@qq.com>
Co-authored-by: JaySon-Huang <tshent@qq.com>

status_server: Support new probe http api `/tiflash/livez` and `/tiflash/readyz` (#10506)

close #10496

* Support returning a non-200 response body from the FFI function to the status server
* Support new probe http api `/tiflash/livez` and `/tiflash/readyz`
  * `/tiflash/livez` always returns response code 200
  * `/tiflash/readyz` will
    - return response code 200 when it is ready for serving coprocessor requests
    - return response code 500 when it is not ready

Signed-off-by: JaySon-Huang <tshent@qq.com>

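A client of the two probe endpoints might interpret the responses as follows. This is a minimal sketch: the helper names are hypothetical, and the base URL in the usage note is a placeholder that must be replaced with the actual status server address of the deployment.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout_s=2.0):
    """Return the HTTP status code for url.

    urllib raises HTTPError for non-2xx responses, so the code is read
    from the exception in that case. Connection failures (URLError) are
    not caught and propagate to the caller.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def probe(base_url, fetch=fetch_status):
    """Query both probe endpoints and report liveness and readiness."""
    return {
        "live": fetch(base_url + "/tiflash/livez") == 200,
        "ready": fetch(base_url + "/tiflash/readyz") == 200,
    }
```

For example, `probe("http://127.0.0.1:20292")` would query a status server assumed to listen on that address. A node that is up but not yet able to serve coprocessor requests would yield `live` true and `ready` false.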
Raft: Increase max retry times to avoid too large remote requests (#10301) (#10307)

close #10300

* Increase the max retry number between LearnerRead and acquiring snapshot from the storage layer by the number of query regions

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
Signed-off-by: JaySon-Huang <tshent@qq.com>
Co-authored-by: JaySon <tshent@qq.com>
Co-authored-by: JaySon-Huang <tshent@qq.com>

Raft: Increase max retry times to avoid too large remote requests (#10302)

close #10300

* Increase the max retry number between LearnerRead and acquiring snapshot from the storage layer by the number of query regions

Signed-off-by: JaySon-Huang <tshent@qq.com>