Contributor

@dnmokhov dnmokhov commented Nov 24, 2025

Add an RFC describing setting multiple core types in task arena constraints.

Reference implementation: dev/dnmokhov/core-types

@dnmokhov
Contributor Author

@wangleis @sunxiaoxia2022, this is an RFC for adding multiple core type selection to the master branch. Feel free to provide feedback. Thanks!

Contributor

@vossmjp vossmjp left a comment

In general, this RFC should not read like guidance on how to select which core type(s) to use based on application characteristics. The relative capabilities of core types may differ based on the HW platform, and benefits will be highly application dependent. Instead, it should describe use cases with appropriate caveats that it's generally better to let the OS decide and that constraints should be applied carefully.

dnmokhov and others added 2 commits December 8, 2025 19:13
Co-authored-by: Mike Voss <michaelj.voss@intel.com>
…nd clarifications about not relying on core type constraints
@dnmokhov
Contributor Author

dnmokhov commented Dec 9, 2025

> In general, this RFC should not read like guidance on how to select which core type(s) to use based on application characteristics. The relative capabilities of core types may differ based on the HW platform, and benefits will be highly application dependent. Instead, it should describe use cases with appropriate caveats that it's generally better to let the OS decide and that constraints should be applied carefully.

Added several clarifications w.r.t. this topic.

Comment on lines 323 to 325
1. **API Naming**: Is `set_core_types` (plural) sufficiently distinct from `set_core_type` (singular)?
- Alternative: overload the existing `set_core_type` to accept `vector<core_type_id>`
- Alternative: `set_acceptable_core_types` or `allow_core_types`
Contributor

Is it necessary to differentiate it by name? I think the overload alternative is quite good, and users won't have to constantly choose between the "singular" and "plural" functions. Even if a vector with a single element is passed, we can forward it to the old set_core_type(core_type_id) function.
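
To make the forwarding idea concrete, here is a minimal sketch. It assumes stand-in types: `constraints_sketch` and the `int` alias for `core_type_id` are hypothetical and are not the real oneTBB definitions or the RFC's reference implementation.

```cpp
#include <vector>

// Hypothetical stand-in; the real tbb::core_type_id is defined by oneTBB.
using core_type_id = int;

// Hypothetical stand-in for task_arena::constraints (illustration only).
struct constraints_sketch {
    std::vector<core_type_id> core_types;  // accepted core types

    // Existing-style setter taking a single core type.
    constraints_sketch& set_core_type(core_type_id id) {
        core_types.assign(1, id);
        return *this;
    }

    // Overload under the same name taking several core types.
    constraints_sketch& set_core_type(const std::vector<core_type_id>& ids) {
        if (ids.size() == 1) {
            // A one-element vector is forwarded to the single-value setter.
            return set_core_type(ids.front());
        }
        core_types = ids;
        return *this;
    }
};
```

One thing to watch with this shape: a braced call such as `set_core_type({2})` resolves to the single-value overload, because list-initializing an `int` is a better match than constructing a `std::vector`. The observable result is the same in this sketch, but it may be worth stating in the RFC.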

**Cons:**
- Requires creating multiple `constraints` objects for simple core type combinations
- Vector of `constraints` instances vs. single integer field with bit-packing creates memory overhead
- Unclear how to handle conflicting `max_concurrency` or `max_threads_per_core` across instances
Contributor

One possible way to handle conflicting max_concurrency or max_threads_per_core values is to interpret each constraints instance as an independent constraint with its own max_concurrency and max_threads_per_core. Once the affinity mask is determined for each constraint, their union is taken, and the sum of the individual max_concurrency values can be used as the overall max_concurrency for the resulting task_arena instance.

max_threads_per_core does not need any special handling, as it is directly reflected in the mask of the constraint.
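
A rough sketch of that combination step, assuming each constraint has already been resolved to an affinity mask. The names `cpu_mask`, `resolved_constraint`, `arena_settings` and `combine` are hypothetical, and a 64-bit `std::bitset` stands in for a real platform affinity mask.

```cpp
#include <bitset>
#include <vector>

// Hypothetical stand-in for a platform affinity mask (one bit per logical CPU).
using cpu_mask = std::bitset<64>;

// One constraint after its mask has been determined (hypothetical type).
struct resolved_constraint {
    cpu_mask mask;
    int max_concurrency;
};

// Settings for the resulting arena (hypothetical type).
struct arena_settings {
    cpu_mask mask;
    int max_concurrency = 0;
};

// Union of the per-constraint masks, sum of the per-constraint concurrency.
arena_settings combine(const std::vector<resolved_constraint>& parts) {
    arena_settings result;
    for (const auto& p : parts) {
        result.mask |= p.mask;
        result.max_concurrency += p.max_concurrency;
    }
    return result;
}
```

Summing only makes sense if the masks do not overlap; what to do when two constraints name the same cores is left open in this sketch.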

Contributor

@vossmjp vossmjp Dec 11, 2025

I asked a similar question in PR1926, but it applies here too. Assume there is a system with three core types, indexes 0, 1 and 2, and there are 4 of each kind. What if the user wants 6 slots and wants to use the most performant cores? From what you describe, I think they would create a constraint for index 2 with a max_concurrency of 4 slots, and then another for index 1 with a max_concurrency of 2. Is that right? The result would be a max_concurrency of six and a mask that includes both index 2 and index 1 core types? So if all core types are idle, the arena may populate with 4 index 1 types and 2 index 2 types, or 2 index 1 types and 4 index 2 types, right? Would one be expected over the other?

Contributor Author

I have removed this from the cons. If we adopt this alternative, the new behavior will be part of the proposal.

Contributor

@aleksei-fedotov aleksei-fedotov Dec 12, 2025

@vossmjp I think in this case the only possible way to populate the arena would be with 4 index 2 and 2 index 1 threads, since the user has specified constraints that result in a mask mapping threads joining the arena to 4 P-cores and 2 E-cores.

Contributor

But you would need to pick which E-cores are in the mask.

Contributor

@aleksei-fedotov aleksei-fedotov Dec 23, 2025

Yeah, I understand your question now. It seems we can try applying the same approach we are using right now, which is "Let the OS decide". Currently, if the user specifies 2 threads as max_concurrency for an arena constrained to E-cores on a platform that has, let's say, 10 of them, the mask will be created for all 10 E-cores, right? But only two threads will join the arena and have their affinity set to the mask.
I guess with multiple constraints, the arena should apply the mask per constraint. For the example you described above, the first four threads joining the arena will have the P-core affinity mask, while the last two threads will be assigned the E-core mask.

Or simply extend the current logic: do not express any preference for which cores to populate first. That is, use the united mask that includes all P-cores and all E-cores, and let the system decide on thread migration.
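
The two options can be compared with a small sketch. Everything here is hypothetical (`cpu_mask`, `resolved_constraint`, both function names, and the idea of numbering arena slots in join order); it illustrates the two policies only and is not how oneTBB assigns affinity today.

```cpp
#include <bitset>
#include <vector>

// Hypothetical stand-in for a platform affinity mask (one bit per logical CPU).
using cpu_mask = std::bitset<64>;

// One constraint after its mask has been determined (hypothetical type).
struct resolved_constraint {
    cpu_mask mask;
    int max_concurrency;
};

// Option A: per-constraint masks. Slots are filled in constraint order, so
// the first max_concurrency joiners get the first constraint's mask, the
// next ones get the second constraint's mask, and so on.
cpu_mask per_constraint_mask(const std::vector<resolved_constraint>& parts,
                             int slot) {
    for (const auto& p : parts) {
        if (slot < p.max_concurrency) return p.mask;
        slot -= p.max_concurrency;
    }
    return cpu_mask{};  // slot beyond the total concurrency: empty mask here
}

// Option B: one united mask for every joining thread; the OS decides
// where threads run and migrate within it.
cpu_mask united_mask(const std::vector<resolved_constraint>& parts) {
    cpu_mask m;
    for (const auto& p : parts) m |= p.mask;
    return m;
}
```

With a P-core constraint of 4 followed by an E-core constraint of 2, option A hands the P-core mask to slots 0 through 3 and the E-core mask to slots 4 and 5, while option B hands every slot the same combined mask. In both cases each mask covers all cores of its type, so the choice of specific cores is still left to the OS.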
