[RFC] Advanced Core Type Selection #1917

dnmokhov · 2025-11-24T15:48:13Z

Add an RFC describing setting multiple core types in task arena constraints.

Reference implementation: dev/dnmokhov/core-types

dnmokhov · 2025-11-24T16:19:43Z

@wangleis @sunxiaoxia2022, this is an RFC for adding multiple core type selection to the master branch. Feel free to provide feedback. Thanks!

rfcs/proposed/core_types/README.md

vossmjp

In general, this RFC should not read like guidance on how to select which core type(s) to use based on application characteristics. The relative capabilities of core types may differ across HW platforms, and the benefits will be highly application dependent. Instead, it should describe use cases with appropriate caveats that it's generally better to let the OS decide and that constraints should be applied carefully.

Co-authored-by: Mike Voss <michaelj.voss@intel.com>

…nd clarifications about not relying on core type constraints

dnmokhov · 2025-12-09T06:39:40Z

In general, this RFC should not read like guidance on how to select which core type(s) to use based on application characteristics. The relative capabilities of core types may differ across HW platforms, and the benefits will be highly application dependent. Instead, it should describe use cases with appropriate caveats that it's generally better to let the OS decide and that constraints should be applied carefully.

Added several clarifications w.r.t. this topic.

isaevil · 2025-12-09T09:30:21Z

rfcs/proposed/core_types/README.md

+1. **API Naming**: Is `set_core_types` (plural) sufficiently distinct from `set_core_type` (singular)?
+   - Alternative: overload the existing `set_core_type` to accept `vector<core_type_id>`
+   - Alternative: `set_acceptable_core_types` or `allow_core_types`


Is it necessary to differentiate it by name? I think the overload alternative is quite good: the user won't have to keep choosing between the "singular" and "plural" functions. Even if a vector with a single element is passed, we can forward it to the old set_core_type(core_type_id) function.
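To illustrate the forwarding idea, here is a minimal standalone sketch. It does not use oneTBB: `constraints_sketch`, the `core_type_mask` field, and the one-bit-per-id encoding are all hypothetical stand-ins chosen for illustration, and the real `constraints` class may store things differently.

```cpp
#include <cstdint>
#include <vector>

using core_type_id = int;  // assumption: ids are small non-negative integers

// Hypothetical stand-in for task_arena::constraints; NOT the oneTBB class.
struct constraints_sketch {
    static constexpr core_type_id automatic = -1;

    core_type_id core_type = automatic;  // existing single-type field
    std::uint32_t core_type_mask = 0;    // hypothetical multi-type storage, one bit per id

    // Existing single-type setter.
    constraints_sketch& set_core_type(core_type_id id) {
        core_type = id;
        core_type_mask = 0;
        return *this;
    }

    // Proposed overload: a one-element vector is forwarded to the old setter.
    constraints_sketch& set_core_type(const std::vector<core_type_id>& ids) {
        if (ids.size() == 1) {
            return set_core_type(ids.front());
        }
        core_type = automatic;
        core_type_mask = 0;
        for (core_type_id id : ids) {
            if (id >= 0 && id < 32) {  // ignore ids that do not fit the sketch's mask
                core_type_mask |= (1u << id);
            }
        }
        return *this;
    }
};
```

With this shape, a call such as `c.set_core_type(std::vector<core_type_id>{2})` behaves exactly like `c.set_core_type(2)`, so existing single-type code paths stay untouched.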

aleksei-fedotov · 2025-12-11T15:13:45Z

rfcs/proposed/core_types/README.md

+**Cons:**
+- Requires creating multiple `constraints` objects for simple core type combinations
+- Vector of `constraints` instances vs. single integer field with bit-packing creates memory overhead
+- Unclear how to handle conflicting `max_concurrency` or `max_threads_per_core` across instances


One possible way to handle conflicting max_concurrency or max_threads_per_core values is to interpret each constraints instance as an independent constraint with its own max_concurrency and max_threads_per_core. Once the affinity mask is determined for each constraint, their union is taken, and the sum of the per-constraint max_concurrency values can be used as the overall max_concurrency of the resulting task_arena instance.

max_threads_per_core does not need any special handling, since it is directly reflected in the mask of each constraint.
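The combination rule described above can be sketched as follows. This is only an illustration under stated assumptions: `resolved_constraint`, `arena_limits`, and `combine` are hypothetical names, a 64-bit integer stands in for the affinity mask, and the per-constraint masks are assumed to be disjoint so that summing max_concurrency makes sense.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical result of resolving one constraints instance; not a oneTBB type.
struct resolved_constraint {
    std::uint64_t affinity_mask;  // one bit per logical CPU; already reflects max_threads_per_core
    int max_concurrency;
};

// Hypothetical combined limits for the resulting arena.
struct arena_limits {
    std::uint64_t affinity_mask;
    int max_concurrency;
};

// Union of the masks, sum of the concurrency limits.
arena_limits combine(const std::vector<resolved_constraint>& parts) {
    arena_limits out{0, 0};
    for (const resolved_constraint& p : parts) {
        out.affinity_mask |= p.affinity_mask;
        out.max_concurrency += p.max_concurrency;
    }
    return out;
}
```

If two instances could name overlapping cores, the sum would overcount, so a real implementation would presumably need an extra rule for that case.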

I asked a similar question in PR1926, but it applies here too. Assume there is a system with three core types, indexes 0, 1 and 2, and there are 4 cores of each kind. What if the user wants 6 slots and wants to use the most performant cores? From what you describe, I think they would create a constraint for index 2 with a max_concurrency of 4 slots, and then another for index 1 with a max_concurrency of 2. Is that right? The result would be a max_concurrency of six and a mask that includes both index 2 and index 1 core types? So if all core types are idle, the arena may populate with 4 index 1 types and 2 index 2 types, or 2 index 1 types and 4 index 2 types, right? Would one be expected over the other?

I have removed this from the cons. If we adopt this alternative, the new behavior will be part of the proposal.

@vossmjp I think in this case the only possible way to populate the arena would be with 4 index 2 and 2 index 1 threads, since the user has specified constraints that result in a mask mapping threads that join the arena to 4 P-cores and 2 E-cores.

But you would need to pick which E-cores are in the mask.

Yeah, I understand your question now. It seems we can try applying the same approach we are using right now, which is "Let the OS decide". Currently, if the user specifies 2 threads as max_concurrency for an arena constrained to E-cores on a platform that has, let's say, 10 of them, the mask will be created for all 10 E-cores, right? But only two threads will join the arena and have their affinity set to the mask.
I guess with multiple constraints, the arena should apply the mask per constraint. For the example you described above, the first four threads joining the arena will have the affinity of the P-cores, while the last two threads will be assigned the E-core mask.

Or simply extend the current logic: do not give any preference to which cores are populated first. That is, use the united mask that includes all P-cores and all E-cores, and let the system decide on thread migration.
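The two options discussed here can be compared with a small standalone sketch. Everything in it is hypothetical: `slot_constraint`, `masks_per_constraint`, and `masks_united` are illustrative names, a 64-bit integer stands in for the affinity mask, and each returned vector entry represents the mask a thread would receive when it takes that arena slot.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical resolved constraint: a mask covering ALL cores of the requested
// type, plus the number of slots requested for it. Not a oneTBB type.
struct slot_constraint {
    std::uint64_t mask;
    int max_concurrency;
};

// Option A: apply the mask per constraint. Slots are filled in constraint
// order, and each slot gets the mask of the constraint it belongs to.
std::vector<std::uint64_t> masks_per_constraint(const std::vector<slot_constraint>& cs) {
    std::vector<std::uint64_t> slots;
    for (const slot_constraint& c : cs) {
        for (int i = 0; i < c.max_concurrency; ++i) {
            slots.push_back(c.mask);
        }
    }
    return slots;
}

// Option B: extend the current logic. Every slot gets the united mask and the
// OS decides where each thread actually runs.
std::vector<std::uint64_t> masks_united(const std::vector<slot_constraint>& cs) {
    std::uint64_t all = 0;
    int total = 0;
    for (const slot_constraint& c : cs) {
        all |= c.mask;
        total += c.max_concurrency;
    }
    return std::vector<std::uint64_t>(static_cast<std::size_t>(total), all);
}
```

For a P-core constraint with 4 slots and an E-core constraint with 2 slots, option A yields four slots pinned to the P-core mask followed by two pinned to the E-core mask, while option B yields six slots that may all run on any P-core or E-core.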

rfcs/proposed/core_types/README.md

…1926)

[RFC] Advanced Core Type Selection

d9af5f4

dnmokhov added this to the 2023.0.0 milestone Nov 24, 2025

dnmokhov requested review from akukanov, aleksei-fedotov, isaevil, kboyarinov and vossmjp November 24, 2025 15:48

dnmokhov added the RFC label Nov 24, 2025

wangleis reviewed Nov 25, 2025

aleksei-fedotov reviewed Dec 5, 2025
