
Conversation

@vadlakondaswetha (Collaborator) commented May 20, 2025

Description

This PR fixes the high read tail latencies for Rapid random reads. The issue: after a certain number of reads (depending on the I/O size), a random read pattern was being reclassified as sequential. For sequential reads we fetch more data, which increased per-read latency and the overall latencies from p90 onwards.

Seeks were not being updated when data was served by the MRD (Multi-Range Downloader). The fix stores the expected next offset before serving a read and compares it with the offset of the next read request to update the seek count.

New behavior: earlier, seeks were not updated once the reader had been read completely. Now the seek count is updated based on the previous and current offsets, so for random reads it will be higher than in the current flow. A higher seek count means less data is fetched from GCS, since the amount of data to fetch is calculated from the seek count.
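A minimal Go sketch of the idea (the type, field names, and constants below are illustrative, not gcsfuse's actual implementation): record the offset the next sequential read would start at, count a seek whenever a read starts anywhere else, even when the previous read was served by the MRD, and shrink the fetch size once the pattern looks random.

```go
package main

import "fmt"

// seekTracker is a hypothetical sketch of the fix: remember the offset
// the next sequential read would begin at, and count a seek whenever
// the next read starts somewhere else.
type seekTracker struct {
	expectedOffset int64 // where the next sequential read would begin
	seeks          int64 // number of non-sequential jumps observed
}

// recordRead is called before serving a read of size bytes at offset.
func (t *seekTracker) recordRead(offset, size int64) {
	if t.expectedOffset != 0 && offset != t.expectedOffset {
		t.seeks++
	}
	t.expectedOffset = offset + size
}

// fetchSize mimics the idea that more observed seeks shrink the amount
// of data fetched from GCS (threshold and read-ahead size are made up).
func (t *seekTracker) fetchSize(requested int64) int64 {
	const maxReadAhead = 8 << 20 // 8 MiB, illustrative
	if t.seeks >= 2 {
		return requested // looks random: fetch only what was asked
	}
	return maxReadAhead // looks sequential: read ahead
}

func main() {
	t := &seekTracker{}
	t.recordRead(0, 1<<20)      // sequential start
	t.recordRead(1<<20, 1<<20)  // contiguous: no seek
	t.recordRead(10<<20, 1<<20) // jump: seek++
	t.recordRead(3<<20, 1<<20)  // jump: seek++
	fmt.Println(t.seeks, t.fetchSize(1<<20)) // prints: 2 1048576
}
```

The key point of the PR is that `recordRead` runs on every read regardless of which component serves the bytes, so MRD-served reads no longer leave the classification stale.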

Even though the seek count increases, we are not seeing any regression in terms of performance.
ZB perf results: https://paste.googleplex.com/6143202492678144
Regional perf results: https://paste.googleplex.com/5189908362428416
Serving test on Regional:
Requests: https://screenshot.googleplex.com/BEKuU54vzrusrmr
Latency: https://screenshot.googleplex.com/63TSXTRuR52T4Yx

Testing details

  1. Manual - NA
  2. Unit tests - NA
  3. Integration tests - NA

Perf results:
+--------+------------+--------------+------------+--------------+--------------+
| Branch | File Size  | Read BW      | Write BW   | RandRead BW  | RandWrite BW |
+--------+------------+--------------+------------+--------------+--------------+
| Master | 0.25MiB    | 532.83MiB/s  | 1.15MiB/s  | 73.7MiB/s    | 1.22MiB/s    |
| PR     | 0.25MiB    | 543.23MiB/s  | 1.1MiB/s   | 77.27MiB/s   | 1.26MiB/s    |
+--------+------------+--------------+------------+--------------+--------------+
| Master | 48.828MiB  | 3673.59MiB/s | 84.67MiB/s | 1422.38MiB/s | 83.83MiB/s   |
| PR     | 48.828MiB  | 3626.33MiB/s | 85.21MiB/s | 1494.6MiB/s  | 85.5MiB/s    |
+--------+------------+--------------+------------+--------------+--------------+
| Master | 976.562MiB | 3646.39MiB/s | 77.88MiB/s | 668.82MiB/s  | 30.33MiB/s   |
| PR     | 976.562MiB | 3625.82MiB/s | 77.3MiB/s  | 1083.05MiB/s | 30.26MiB/s   |
+--------+------------+--------------+------------+--------------+--------------+

@vadlakondaswetha requested a review from a team as a code owner May 20, 2025 08:15
@vadlakondaswetha added the execute-perf-test (Execute performance test in PR) label May 20, 2025
@kislaykishore requested a review from a team May 20, 2025 08:16
codecov bot commented May 20, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 78.69%. Comparing base (8fdbfff) to head (63ac150).
Report is 40 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3327      +/-   ##
==========================================
- Coverage   78.85%   78.69%   -0.16%     
==========================================
  Files         131      131              
  Lines       17889    17999     +110     
==========================================
+ Hits        14106    14164      +58     
- Misses       3308     3356      +48     
- Partials      475      479       +4     
Flag      | Coverage Δ
unittests | 78.69% <100.00%> (-0.16%) ⬇️



@vadlakondaswetha requested review from abhishek10004 and raj-prince and removed request for a team and kislaykishore May 20, 2025 10:06
@vadlakondaswetha changed the title from "[Dont review]Fixoffset" to "Fix for updating seeks when reads are served using MRD" May 20, 2025
@raj-prince (Collaborator) left a comment

Overall, LGTM.

One minor behavior change I see, before and after: when the file is repeatedly read completely from the same file handle, every new read will now increase the seek count (since expectedOffset will equal objectSize and a fresh read starts from offset 0), which was not the case earlier.

This might affect the performance due to extra seek, so good to think a bit here.

[Edited]

  • Also: reading completely through a reader and then reading at a random offset will lead to an increase in the seek count (behavior change).

@raj-prince (Collaborator)

CC @Tulsishah: we need to incorporate this change as part of the random reader refactoring.

@vadlakondaswetha (Collaborator, Author) commented May 22, 2025

> Overall, LGTM.
>
> One minor behavior change I see, before and after: when the file is repeatedly read completely from the same file handle, every new read will now increase the seek count (since expectedOffset will equal objectSize and a fresh read starts from offset 0), which was not the case earlier.
>
> This might affect the performance due to extra seek, so good to think a bit here.
>
> [Edited]
>
> • Also: reading completely through a reader and then reading at a random offset will lead to an increase in the seek count (behavior change).
  1. Replied here: Fix for updating seeks when reads are served using MRD #3327 (comment)
  2. Agreed, it's a behavior change. IMO seeks (sequential vs. random) shouldn't be determined by whether the reader has been read completely or not.

@raj-prince self-requested a review May 27, 2025 05:41
@vadlakondaswetha merged commit 49726fd into master May 27, 2025
13 checks passed

Labels

execute-perf-test Execute performance test in PR
