go,proto: remotesapi: Add a more efficient RPC, StreamChunkLocations, to use for fetch and pull.#10918

Merged
reltuk merged 11 commits into main from aaron/remotesapi-StreamChunkLocations
Apr 24, 2026

@reltuk reltuk commented Apr 21, 2026

The current streaming RPC, StreamDownloadLocations, was a straight translation of the unary RPC GetDownloadLocations. We added streaming because we found it interacted much better with TCP and HTTP/2 window scaling. At the time, the RPCs were not reworked to take advantage of the stateful nature of the stream. Since then, the pipelined ChunkFetcher machinery has been added, which makes the opportunities for reuse within an individual streaming RPC even better. We also added the RefreshTableFileUrl endpoint, which decouples a client's ability to continue using previously communicated table files from its need to see them in a GetDownloadLocsResponse in particular.

StreamChunkLocations transits the exact same semantic payloads as StreamDownloadLocations. It just does not re-transmit a bunch of stuff it does not need to. In particular:

  1. We transit table file URLs separately from the chunk locations. A table_file_id is assigned to a table file the first time the server tells the client about it. Then that same table_file_id is used to refer to that table file for all communicated chunk locations in all response messages on the same stream.

  2. We do not re-transit chunk hashes. The responses refer to the chunk hashes which were provided in the corresponding request by index. The client already knows them.

  3. We do not need to transit RefreshTableFileUrlRequest messages for the table files. The client can build these with its own knowledge.

Those are three major improvements for bandwidth utilization. There are also some smaller things, like sending chunk_hashes as bytes instead of repeated bytes.
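As a rough illustration of points 1 and 2, here is a minimal sketch of how a server could keep per-stream state so that each table file URL is sent once and each chunk is referenced by its index in the request. All type, field, and function names below (`streamState`, `found`, `encode`, and so on) are hypothetical stand-ins, not the actual protobuf messages or server code from this PR.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the real response messages.
type tableFile struct {
	ID  uint32
	URL string
}

type chunkLocation struct {
	HashIndex   uint32 // index of the hash in the corresponding request
	TableFileID uint32 // id assigned the first time the file was announced
	Offset      uint64
	Length      uint32
}

type response struct {
	NewTableFiles []tableFile // only files not yet announced on this stream
	Locations     []chunkLocation
}

// found describes where one requested chunk lives (illustrative only).
type found struct {
	File   string
	URL    string
	Offset uint64
	Length uint32
}

// streamState lives for the lifetime of one stream.
type streamState struct {
	ids map[string]uint32 // table file name -> assigned table_file_id
}

func newStreamState() *streamState {
	return &streamState{ids: map[string]uint32{}}
}

// encode builds the response for one request. located[i] is the location of
// the i-th requested hash, or nil if the chunk was not found.
func (s *streamState) encode(located []*found) response {
	var resp response
	for i, f := range located {
		if f == nil {
			continue
		}
		id, ok := s.ids[f.File]
		if !ok {
			// First mention on this stream: assign an id and send the URL once.
			id = uint32(len(s.ids)) + 1
			s.ids[f.File] = id
			resp.NewTableFiles = append(resp.NewTableFiles, tableFile{ID: id, URL: f.URL})
		}
		resp.Locations = append(resp.Locations, chunkLocation{
			HashIndex:   uint32(i),
			TableFileID: id,
			Offset:      f.Offset,
			Length:      f.Length,
		})
	}
	return resp
}

func main() {
	s := newStreamState()
	a := &found{File: "tf-a", URL: "https://example.invalid/tf-a", Offset: 0, Length: 100}
	b := &found{File: "tf-a", URL: "https://example.invalid/tf-a", Offset: 100, Length: 50}

	first := s.encode([]*found{a, nil, b})
	second := s.encode([]*found{b})
	fmt.Println(len(first.NewTableFiles), len(first.Locations))
	fmt.Println(len(second.NewTableFiles), len(second.Locations))
}
```

In this sketch the second response carries no table file entries at all, because the only file it refers to was already announced earlier on the same stream.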

This PR adds a features field in GetRepoMetadataResponse. That field lets a client know that it can call the newly available endpoint. Otherwise the client continues calling StreamDownloadLocations.
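The negotiation described here might look roughly like the following on the client. This is a sketch under assumptions: the `feature` type, the `repoMetadata` struct, and `chooseLocationsRPC` are hypothetical, and only the feature name FEATURE_STREAM_CHUNK_LOCATIONS and the two RPC names come from this PR.

```go
package main

import "fmt"

// feature is a hypothetical stand-in for the enum carried in the features
// field of GetRepoMetadataResponse.
type feature int

const (
	featureUnspecified feature = iota
	featureStreamChunkLocations // stands in for FEATURE_STREAM_CHUNK_LOCATIONS
)

// repoMetadata is a simplified stand-in for GetRepoMetadataResponse.
type repoMetadata struct {
	Features []feature
}

// supports reports whether the server advertised the given feature.
func supports(md repoMetadata, f feature) bool {
	for _, have := range md.Features {
		if have == f {
			return true
		}
	}
	return false
}

// chooseLocationsRPC picks the new RPC only when it is advertised and
// otherwise falls back to the legacy streaming RPC.
func chooseLocationsRPC(md repoMetadata) string {
	if supports(md, featureStreamChunkLocations) {
		return "StreamChunkLocations"
	}
	return "StreamDownloadLocations"
}

func main() {
	fmt.Println(chooseLocationsRPC(repoMetadata{Features: []feature{featureStreamChunkLocations}}))
	fmt.Println(chooseLocationsRPC(repoMetadata{}))
}
```

A server that predates the features field sends no features, so a client written this way keeps using StreamDownloadLocations against it.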

This PR adds both server-side and client-side implementations for the new endpoint. The server-side implementation ends up looking a lot like the existing StreamDownloadLocations code. It keeps some local maps so it can include the appropriate reference ids in the outgoing messages. The client-side implementation intentionally remains about as minimal as possible. In particular, it does not touch range coalescing or most aspects of the fetch pipeline. It targets just generating the StreamChunkLocationsRequest and handling the StreamChunkLocationsResponse messages. It translates the responses back into what StreamDownloadLocations would have generated before handing those pieces off to the rest of the fetch pipeline.
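To make the client-side translation step concrete, here is a minimal sketch of turning compact responses back into fully spelled-out per-chunk ranges, which is roughly the shape the rest of a fetch pipeline would expect. The types (`clientState`, `legacyRange`, and the simplified `response`) are hypothetical and hashes are shown as strings for brevity; the real client code in this PR may be organized quite differently.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical, simplified stand-ins for the compact response messages.
type tableFile struct {
	ID  uint32
	URL string
}

type chunkLocation struct {
	HashIndex   uint32
	TableFileID uint32
	Offset      uint64
	Length      uint32
}

type response struct {
	NewTableFiles []tableFile
	Locations     []chunkLocation
}

// legacyRange loosely mirrors the per-chunk information the older RPC
// spelled out in full: the hash, the URL, and the byte range.
type legacyRange struct {
	Hash   string
	URL    string
	Offset uint64
	Length uint32
}

// clientState remembers table file URLs for the lifetime of one stream.
type clientState struct {
	urls map[uint32]string
}

// translate expands one response against the hashes sent in its request.
func (c *clientState) translate(reqHashes []string, resp response) ([]legacyRange, error) {
	for _, tf := range resp.NewTableFiles {
		c.urls[tf.ID] = tf.URL
	}
	out := make([]legacyRange, 0, len(resp.Locations))
	for _, loc := range resp.Locations {
		if int(loc.HashIndex) >= len(reqHashes) {
			return nil, errors.New("hash index out of range for request")
		}
		url, ok := c.urls[loc.TableFileID]
		if !ok {
			return nil, fmt.Errorf("unknown table_file_id %d", loc.TableFileID)
		}
		out = append(out, legacyRange{
			Hash:   reqHashes[loc.HashIndex],
			URL:    url,
			Offset: loc.Offset,
			Length: loc.Length,
		})
	}
	return out, nil
}

func main() {
	c := &clientState{urls: map[uint32]string{}}
	resp := response{
		NewTableFiles: []tableFile{{ID: 1, URL: "https://example.invalid/tf-a"}},
		Locations:     []chunkLocation{{HashIndex: 1, TableFileID: 1, Offset: 64, Length: 32}},
	}
	ranges, err := c.translate([]string{"hash-0", "hash-1"}, resp)
	fmt.Println(ranges, err)
}
```

Keeping the translation at this boundary is what lets the downstream stages, such as range coalescing, stay untouched.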

In addition to unit tests, some machinery in remotesrv is updated so we can optionally disable advertising support for StreamChunkLocations. This allows us to update some integration tests so that they continue to exercise the StreamDownloadLocations code paths on both the client and the server.
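A sketch of how such a switch could work, using the DOLT_REMOTESAPI_DISABLED_FEATURES environment variable named in the commit messages. The comma-separated value format and the helper names `parseDisabled` and `advertised` are assumptions made for illustration; the actual remotesrv implementation may parse the variable differently.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// Name of the environment variable, as given in the commit message.
const disabledFeaturesEnv = "DOLT_REMOTESAPI_DISABLED_FEATURES"

// parseDisabled splits the value into a set of feature names.
// Treating the value as a comma-separated list is an assumption.
func parseDisabled(v string) map[string]bool {
	out := map[string]bool{}
	for _, name := range strings.Split(v, ",") {
		name = strings.TrimSpace(name)
		if name != "" {
			out[name] = true
		}
	}
	return out
}

// advertised returns the supported features minus the disabled ones.
func advertised(supported []string, disabled map[string]bool) []string {
	var out []string
	for _, f := range supported {
		if !disabled[f] {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	supported := []string{"FEATURE_STREAM_CHUNK_LOCATIONS"}
	disabled := parseDisabled(os.Getenv(disabledFeaturesEnv))
	fmt.Println(advertised(supported, disabled))
}
```

With the variable set to FEATURE_STREAM_CHUNK_LOCATIONS, a server built this way would advertise nothing, so clients would stay on the StreamDownloadLocations path.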

reltuk added 11 commits April 21, 2026 14:52
StreamChunkLocations is a more efficient StreamDownloadLocations. It is available on a given remotesapi implementation if GetRepoMetadataResponse advertises FEATURE_STREAM_CHUNK_LOCATIONS in its features.
…instead of repeated bytes.

Saves ~10% on the wire.
…ations support.

This is useful for testing because client and server all need to still support StreamDownloadLocations for now.

Set DOLT_REMOTESAPI_DISABLED_FEATURES=FEATURE_STREAM_CHUNK_LOCATIONS environment variable.
…g sure to test pull/fetch on legacy StreamDownloadLocations path as well.
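For the commit that sends chunk_hashes as bytes instead of repeated bytes, one plausible accounting of the "~10%" figure follows from protobuf's wire format: every element of a repeated bytes field carries its own tag and length prefix, while a single bytes field pays that overhead once. The sketch below assumes 20-byte hashes and a field number below 16 (so the tag fits in one byte); both are assumptions, and the function names are hypothetical.

```go
package main

import "fmt"

// hashLen is an assumed hash size in bytes.
const hashLen = 20

// varintLen returns how many bytes a protobuf varint needs for n.
func varintLen(n uint64) int {
	l := 1
	for n >= 0x80 {
		n >>= 7
		l++
	}
	return l
}

// repeatedSize is the encoded size of n hashes in a repeated bytes field:
// each element has a 1-byte tag, a varint length, and the payload.
func repeatedSize(n int) int {
	return n * (1 + varintLen(hashLen) + hashLen)
}

// packedSize is the encoded size of n hashes concatenated into one bytes
// field: a single tag, a single varint length, and the payload.
func packedSize(n int) int {
	total := n * hashLen
	return 1 + varintLen(uint64(total)) + total
}

func main() {
	n := 1000
	r, p := repeatedSize(n), packedSize(n)
	fmt.Printf("repeated=%dB packed=%dB saved=%.1f%%\n", r, p, 100*(1-float64(p)/float64(r)))
}
```

Under these assumptions the per-hash cost drops from 22 bytes to about 20, a saving of roughly 9%, which is in line with the figure quoted in the commit message.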
@coffeegoddd
Contributor

@reltuk DOLT

| read_tests | from_latency | to_latency | percent_change |
|---|---|---|---|
| covering_index_scan | 0.55 | 0.55 | 0.0 |
| groupby_scan | 9.73 | 9.73 | 0.0 |
| index_join | 1.89 | 1.86 | -1.59 |
| index_join_scan | 1.39 | 1.37 | -1.44 |
| index_scan | 21.89 | 22.28 | 1.78 |
| oltp_point_select | 0.26 | 0.25 | -3.85 |
| oltp_read_only | 5.09 | 5.0 | -1.77 |
| select_random_points | 0.51 | 0.51 | 0.0 |
| select_random_ranges | 0.56 | 0.56 | 0.0 |
| table_scan | 22.28 | 22.69 | 1.84 |
| types_table_scan | 65.65 | 66.84 | 1.81 |

| write_tests | from_latency | to_latency | percent_change |
|---|---|---|---|
| oltp_delete_insert | 6.32 | 6.32 | 0.0 |
| oltp_insert | 3.07 | 3.07 | 0.0 |
| oltp_read_write | 11.04 | 11.04 | 0.0 |
| oltp_update_index | 3.19 | 3.19 | 0.0 |
| oltp_update_non_index | 3.02 | 3.02 | 0.0 |
| oltp_write_only | 5.99 | 5.99 | 0.0 |
| types_delete_insert | 6.91 | 6.79 | -1.74 |

@coffeegoddd
Contributor

@reltuk DOLT

| comparing_percentages |
|---|
| 100.000000 to 100.000000 |

| version | result | total |
|---|---|---|
| 5338543 | ok | 5937471 |

| version | total_tests |
|---|---|
| 5338543 | 5937471 |

| correctness_percentage |
|---|
| 100.0 |

@coffeegoddd
Contributor

@reltuk DOLT

| test_name | from_latency_p95 | to_latency_p95 | percent_change |
|---|---|---|---|
| tpcc-scale-factor-1 | 54.83 | 55.82 | 1.81 |

| test_name | from_server_name | from_server_version | from_tps | to_server_name | to_server_version | to_tps | percent_change |
|---|---|---|---|---|---|---|---|
| tpcc-scale-factor-1 | dolt | 11ae0d2 | 44.16 | dolt | 5338543 | 43.24 | -2.08 |

@reltuk reltuk requested a review from macneale4 April 23, 2026 17:13
@reltuk
Contributor Author

reltuk commented Apr 23, 2026

Comparative statistics for a fetch of a database with 3359049 small chunks:

  /dolt.services.remotesapi.v1alpha1.ChunkStoreService/StreamDownloadLocations
    calls=6  in_msgs=48491  out_msgs=50324
    in:  hdr=133B  payload=328.1MiB (wire 328.4MiB)  trailer=8B
    out: payload=92.2MiB (wire 92.4MiB)
  /dolt.services.remotesapi.v1alpha1.ChunkStoreService/StreamChunkLocations
    calls=2  in_msgs=53665  out_msgs=53665
    in:  hdr=13B  payload=45.0MiB (wire 45.3MiB)  trailer=4B
    out: payload=80.1MiB (wire 80.4MiB)

So in this example we save ~13% egress and ~86% ingress for the download location resolution overhead.
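The percentages can be checked directly from the wire sizes quoted above. A small sketch of the arithmetic, where `savingsPercent` is just a helper name chosen here:

```go
package main

import "fmt"

// savingsPercent returns how much smaller "after" is than "before", in percent.
func savingsPercent(before, after float64) float64 {
	return (1 - after/before) * 100
}

func main() {
	// Wire sizes in MiB, taken from the statistics above.
	fmt.Printf("egress saved:  %.1f%%\n", savingsPercent(92.4, 80.4))
	fmt.Printf("ingress saved: %.1f%%\n", savingsPercent(328.4, 45.3))
}
```

Here "out" (92.4 MiB down to 80.4 MiB) is the egress figure and "in" (328.4 MiB down to 45.3 MiB) is the ingress figure.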

The database itself is about 704MB of transited chunk data.

This is far from the only overhead associated with a fetch, but it's a real win.

Contributor

@macneale4 macneale4 left a comment

LGTM

@reltuk reltuk merged commit 011a27f into main Apr 24, 2026
45 of 50 checks passed
@github-actions

@coffeegoddd DOLT

| name | add_cnt | delete_cnt | update_cnt | latency |
|---|---|---|---|---|
| adds_only | 60000 | 0 | 0 | 0.57 |
| adds_updates_deletes | 60000 | 60000 | 60000 | 2.85 |
| deletes_only | 0 | 60000 | 0 | 1.43 |
| updates_only | 0 | 0 | 60000 | 1.8 |

@github-actions

@coffeegoddd DOLT

| test_name | detail | row_cnt | sorted | mysql_time | sql_mult | cli_mult |
|---|---|---|---|---|---|---|
| batching | LOAD DATA | 10000 | 1 | 0.06 | 1.67 | |
| batching | batch sql | 10000 | 1 | 0.07 | 1.86 | |
| batching | by line sql | 10000 | 1 | 0.07 | 1.71 | |
| blob | 1 blob | 200000 | 1 | 0.92 | 3.98 | 3.98 |
| blob | 2 blobs | 200000 | 1 | 0.9 | 4.53 | 4.44 |
| blob | no blob | 200000 | 1 | 0.92 | 2.7 | 2.46 |
| col type | datetime | 200000 | 1 | 0.86 | 2.6 | 2.35 |
| col type | varchar | 200000 | 1 | 0.72 | 3.49 | 3.17 |
| config width | 2 cols | 200000 | 1 | 0.81 | 2.59 | 2.36 |
| config width | 32 cols | 200000 | 1 | 1.96 | 3 | 3.32 |
| config width | 8 cols | 200000 | 1 | 1.05 | 2.63 | 3.19 |
| pk type | float | 200000 | 1 | 0.88 | 2.49 | 2.24 |
| pk type | int | 200000 | 1 | 0.84 | 2.48 | 2.27 |
| pk type | varchar | 200000 | 1 | 1.71 | 1.51 | 1.36 |
| row count | 1.6mm | 1600000 | 1 | 6 | 2.86 | 2.56 |
| row count | 400k | 400000 | 1 | 1.49 | 2.81 | 2.57 |
| row count | 800k | 800000 | 1 | 2.96 | 2.88 | 2.61 |
| secondary index | four index | 200000 | 1 | 3.75 | 1.25 | 1.05 |
| secondary index | no secondary | 200000 | 1 | 0.91 | 2.77 | 2.51 |
| secondary index | one index | 200000 | 1 | 1.2 | 2.41 | 2.17 |
| secondary index | two index | 200000 | 1 | 2.08 | 1.68 | 1.46 |
| sorting | shuffled 1mm | 1000000 | 0 | 5.39 | 2.59 | 2.36 |
| sorting | sorted 1mm | 1000000 | 1 | 5.3 | 2.64 | 2.4 |

@github-actions

@coffeegoddd DOLT

| name | detail | mean_mult |
|---|---|---|
| dolt_blame_basic | system table | 1.19 |
| dolt_blame_commit_filter | system table | 1.18 |
| dolt_commit_ancestors_commit_filter | system table | 0.63 |
| dolt_commits_commit_filter | system table | 1.11 |
| dolt_diff_log_join_from_commit | system table | 2.96 |
| dolt_diff_log_join_to_commit | system table | 2.98 |
| dolt_diff_table_from_commit_filter | system table | 1.24 |
| dolt_diff_table_to_commit_filter | system table | 1.21 |
| dolt_diffs_commit_filter | system table | 1.06 |
| dolt_history_commit_filter | system table | 1.24 |
| dolt_log_commit_filter | system table | 1.17 |
