Enable Inter-Pipeline Parallelism #770
Commits:
- …tor to allow executor to work exclusively on its own query
- …ptional interface ParallelScanInfo that allows them to emit OperatorTaskInfo objects. For every object they emit, a task is created. The OperatorTaskInfo is then accessible within the GetChunk method through the TaskContext object. This allows all different scan types to be partitioned, including e.g. scanning an aggregate HT or reading a Parquet/CSV file.
- …be finished twice
- … querygraph of a specific tree in the log file
- …RING_AGG or LIST)
- …o be launched for every vector, and run this on every query with EnableQueryVerification enabled
- …spans past the vector size
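The partitioned-scan mechanism described in the commits above can be illustrated with a small sketch. The names below (`OperatorTaskInfo` as a plain range struct, a `PartitionScan` helper, a 1024-tuple vector size) are simplified assumptions for illustration, not DuckDB's actual interfaces: the idea is simply that a data source emits one task-info object per slice of the data, and one task is created per emitted object.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical, simplified sketch: split a table of `row_count` rows into
// scan tasks of `vectors_per_task` vectors, each vector holding
// `vector_size` tuples. One task is created per emitted range; each task
// later scans only its own [start_row, end_row) slice.
struct OperatorTaskInfo {
    uint64_t start_row; // first row this task scans
    uint64_t end_row;   // one past the last row this task scans
};

std::vector<OperatorTaskInfo> PartitionScan(uint64_t row_count,
                                            uint64_t vector_size = 1024,
                                            uint64_t vectors_per_task = 100) {
    const uint64_t rows_per_task = vector_size * vectors_per_task; // 102,400
    std::vector<OperatorTaskInfo> tasks;
    for (uint64_t start = 0; start < row_count; start += rows_per_task) {
        tasks.push_back({start, std::min(start + rows_per_task, row_count)});
    }
    return tasks;
}
```

With the defaults, a 250,000-row table would yield three tasks: two full chunks of 102,400 rows and one remainder.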
I looked through the commit history of the 0.2.0 release, and this is not included, correct?

Not yet no :) We wanted to give it some time to cool down before putting it in a release, as it is a pretty big change to the execution engine.

We appreciate your focus on reliability and your caution here! Still very excited about this. I assume you're thinking of having a benchmarking bonanza after you implement it? :-)

We want to do a few more rounds of optimizations before starting the benchmarking bonanza :) But this is indeed on the agenda.
This PR enables inter-pipeline parallelism, allowing for parallel execution within one pipeline by splitting up the scans and creating individual tasks. Data sources can now implement an optional method `ParallelScanInfo` that allows them to specify how the scan should be split up into separate sources. The data sources can inherit from `OperatorTaskInfo` to provide per-task information on how to scan the data source. For an example of this, see the code in `physical_table_scan.cpp`. Currently this splitting is only done for regular table scans, and the table is split into chunks of 100 vectors (102,400 tuples).

Sinks also need to explicitly implement parallelism for this to function. For now, parallelism has only been implemented in `PhysicalHashJoin`, `PhysicalHashAggregate` and `PhysicalSimpleAggregate`. Note that the parallelism in `PhysicalHashAggregate` is currently very poor, involving a simple global lock over the table. This will be fixed in the future.
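The coarse-grained sink strategy described above (a global lock over the aggregate hash table) can be sketched roughly as follows. The types and names here (`GlobalAggState`, `SinkTask`, a `COUNT(*)` over integer keys) are illustrative assumptions, not DuckDB's actual code: every task serializes on one mutex while inserting into the shared table, which is correct but scales poorly.

```cpp
#include <mutex>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical sketch of a sink with a single global lock over the hash
// table. Each parallel task calls SinkTask with its chunk of group keys;
// all inserts serialize on the mutex, so parallel speed-up is limited.
// Finer-grained schemes (e.g. per-task local tables merged at the end)
// would avoid this bottleneck.
struct GlobalAggState {
    std::mutex lock;
    std::unordered_map<int, long> counts; // group key -> COUNT(*)
};

void SinkTask(GlobalAggState &state, const std::vector<int> &keys) {
    std::lock_guard<std::mutex> guard(state.lock); // global lock over the HT
    for (int k : keys) {
        state.counts[k]++;
    }
}
```

The lock makes concurrent sinking safe with minimal code, which is presumably why it is an acceptable stopgap before a more scalable aggregate HT lands.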
Other sinks also need to be extended to support parallel execution. I think the following sinks are the most interesting for now: ORDER BY, WINDOW, INSERT, DELETE, UPDATE, TOP N and the other joins (blockwise nested-loop join, piecewise merge join).
Current TPC-H Results
To continue the trend, this PR results in the following parallel speed-up for TPC-H:
Progress has been made, but there is more work to do :) Particularly in the aggregate HTs.