-
Notifications
You must be signed in to change notification settings - Fork 469
Fix for updating seeks when reads are served using MRD #3327
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #3327 +/- ##
==========================================
- Coverage 78.85% 78.69% -0.16%
==========================================
Files 131 131
Lines 17889 17999 +110
==========================================
+ Hits 14106 14164 +58
- Misses 3308 3356 +48
- Partials 475 479 +4
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, LGTM.
One minor behavior change I see, before and after. When the file is repeatedly read completely from the same file handle. Now, every start read will increase the seek (as expectedOffset will be equal to objectSize and fresh read will start from 0 offset), which was not the case earlier.
This might affect the performance due to extra seek, so good to think a bit here.
[Edited]
- More I can see, when completely read through a reader and then a random offset will lead to a increase in seek (behavior change).
|
CC: @Tulsishah we need to incorporate this change as part of random reader refactoring. |
|
Description
This PR fixes the high read tail latencies for Rapid random reads. Issue was after x number of reads depending on the IO size, random read was getting converted to sequential read. In case of sequential reads, we fetch more data which is increasing the latency and overall latencies from p90 onwards
Seeks are not getting updated when data is served by MRD. Hence storing the nextExpectedOffset before serving the read and comparing with the next read request offset to update seeks.
New Behavior: Earlier the seeks were not updated when the reader is read completely. Now it will get updated based on the previous and current offset. Hence the number of seeks will be more compared to current flow incase of random reads. If the number of seeks increases, the amount of data fetched from GCS will be less since we calculate the amount of data to fetch based on the seek count.
Even though the seek count increased we are not seeing any issue interms of perf.
ZB perf results: https://paste.googleplex.com/6143202492678144
Regional perf results: https://paste.googleplex.com/5189908362428416
Serving test on Regional:
Requests: https://screenshot.googleplex.com/BEKuU54vzrusrmr
Latency: https://screenshot.googleplex.com/63TSXTRuR52T4Yx
Testing details
Perf results:
+--------+------------+--------------+------------+--------------+--------------+
| Branch | File Size | Read BW | Write BW | RandRead BW | RandWrite BW |
+--------+------------+--------------+------------+--------------+--------------+
| Master | 0.25MiB | 532.83MiB/s | 1.15MiB/s | 73.7MiB/s | 1.22MiB/s |
| PR | 0.25MiB | 543.23MiB/s | 1.1MiB/s | 77.27MiB/s | 1.26MiB/s |
| | | | | | |
| | | | | | |
| Master | 48.828MiB | 3673.59MiB/s | 84.67MiB/s | 1422.38MiB/s | 83.83MiB/s |
| PR | 48.828MiB | 3626.33MiB/s | 85.21MiB/s | 1494.6MiB/s | 85.5MiB/s |
| | | | | | |
| | | | | | |
| Master | 976.562MiB | 3646.39MiB/s | 77.88MiB/s | 668.82MiB/s | 30.33MiB/s |
| PR. | 976.562MiB | 3625.82MiB/s | 77.3MiB/s | 1083.05MiB/s | 30.26MiB/s |
| | | | | | |
| | | | | | |
+--------+------------+--------------+------------+--------------+--------------+