
Features/refactor2#1

Open
liutongxuan wants to merge 28 commits into main from features/refactor2

@liutongxuan
Owner

No description provided.

Introduce LLM/VLM/DiT/Rec model-input param group types plus a ModelInput wrapper and ModelInputFactory to build model-specific payload partitions from legacy ModelInputParams without modifying the legacy struct.
Add default create_model_input hooks to CausalLM, CausalVLM, and DiTModel so model-specific input partitions are created by model type, and add tests that verify these class-level creation paths.
Keep CausalLM::create_model_input focused on LLM input only, and let RecCausalLM attach the Rec payload explicitly to match the model abstraction split.
Switch ModelInput and ModelInputParamBundle partitions to optional value semantics to avoid shared_ptr allocations, and standardize conversion helpers around make_*_from_legacy APIs while keeping model-specific input assembly behavior unchanged.

Made-with: Cursor
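
A minimal sketch of the partitioned layout described above, assuming simplified field sets. The fields `positions`, `num_sequences`, and `image_sizes` are hypothetical, as are the `VLMModelInputParams` name and the `is_vlm` flag (a stand-in for the per-model `create_model_input` hook). The real partitions carry tensors rather than vectors. The sketch shows `std::optional` value partitions and a `make_*_from_legacy` conversion that only reads the legacy struct.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Legacy flat struct shared by every model type (hypothetical fields).
struct ModelInputParams {
  std::vector<int32_t> positions;    // read by LLM-family models
  int32_t num_sequences = 0;         // read by LLM-family models
  std::vector<int64_t> image_sizes;  // read by VLMs only
};

// Typed partitions: each model family reads only its own group.
struct LLMModelInputParams {
  std::vector<int32_t> positions;
  int32_t num_sequences = 0;
};
struct VLMModelInputParams {  // hypothetical name
  std::vector<int64_t> image_sizes;
};

// Optional value members: an absent partition needs no heap allocation,
// unlike a std::shared_ptr member.
struct ModelInput {
  std::optional<LLMModelInputParams> llm;
  std::optional<VLMModelInputParams> vlm;
};

LLMModelInputParams make_llm_model_input_params_from_legacy(
    const ModelInputParams& p) {
  return LLMModelInputParams{p.positions, p.num_sequences};
}

// Builds only the partitions the model type needs. `is_vlm` is a stand-in
// for the per-model create_model_input hook; the legacy struct is only read.
ModelInput make_model_input_from_legacy(const ModelInputParams& p,
                                        bool is_vlm) {
  ModelInput in;
  in.llm = make_llm_model_input_params_from_legacy(p);
  if (is_vlm) {
    in.vlm = VLMModelInputParams{p.image_sizes};
  }
  return in;
}
```
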
Introduce rvalue overloads for model-input and param-bundle legacy conversion helpers so callers can transfer large tensors/vectors with std::move and reduce copy overhead in hot conversion paths.
Unify executor and graph fallback flows on typed ModelInput while preserving legacy compatibility adapters. Add ACL/MLU/CUDA regression coverage to assert typed run output matches legacy run behavior.

Made-with: Cursor
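
The const&/&& overload pair for these conversion helpers can be sketched as follows; the struct fields are hypothetical. The lvalue overload copies every buffer. The rvalue overload steals the vector storage, and the test checks this by comparing `data()` pointers.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical minimal legacy and typed structs.
struct ModelInputParams {
  std::vector<int32_t> positions;
  int32_t num_sequences = 0;
};
struct LLMModelInputParams {
  std::vector<int32_t> positions;
  int32_t num_sequences = 0;
};

// Lvalue overload: the caller keeps its params, so buffers are copied.
LLMModelInputParams make_llm_model_input_params_from_legacy(
    const ModelInputParams& p) {
  return LLMModelInputParams{p.positions, p.num_sequences};
}

// Rvalue overload: the caller gives the params up, so buffers are moved.
LLMModelInputParams make_llm_model_input_params_from_legacy(
    ModelInputParams&& p) {
  return LLMModelInputParams{std::move(p.positions), p.num_sequences};
}
```

Callers that still need their params after conversion get the copying overload automatically. Hot paths that are finished with them pass `std::move(params)`.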
Add rvalue create_model_input overloads on model base classes and route the VLM executor's post-processing path through model-level typed input creation to reduce redundant legacy conversions. Extend model_input_factory tests to keep moved-params behavior aligned with existing semantics.

Made-with: Cursor
Add rvalue overloads for apply_*_to_legacy and ModelInput run/forward entries across model and runtime layers, and route BaseExecutorImpl/VlmExecutorImpl to move-based typed-to-legacy adapters. Extend factory tests to cover the new move-based apply path.

Made-with: Cursor
Hoist the typed-forward helper duplicated in acl/mlu/cuda graph executors into a single shared inline definition under runtime/executor_impl.h, and add an rvalue overload to enable move-based typed forward at fallback paths.

Made-with: Cursor
Add has_typed_forward traits and route CausalLMImpl/CausalVLMImpl/RecCausalLMImpl typed forward overloads to the underlying model when it implements them, so models can opt into typed ModelInput forward without changing the runtime adapters. Cover the trait detection with compile-time assertions.

Made-with: Cursor
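
One plausible shape for a `has_typed_forward` trait combines the `std::void_t` detection idiom with `if constexpr` dispatch. This is a sketch under assumptions. Only the trait name, `CausalLMImpl`, and `apply_model_input_to_legacy` come from the commit message. The toy models, the string return values, and the stub adapter body are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

struct ModelInputParams {};
struct ModelInput {};

// Primary template: assume the model has no typed forward.
template <typename Model, typename = void>
struct has_typed_forward : std::false_type {};

// Chosen only if model.forward(const ModelInput&) is well-formed.
template <typename Model>
struct has_typed_forward<
    Model, std::void_t<decltype(std::declval<Model&>().forward(
               std::declval<const ModelInput&>()))>> : std::true_type {};

// Stub for the typed-to-legacy adapter (placeholder body).
ModelInputParams apply_model_input_to_legacy(const ModelInput&) { return {}; }

// Wrapper in the role of CausalLMImpl.
template <typename Model>
struct CausalLMImpl {
  Model model;
  std::string forward(const ModelInput& input) {
    if constexpr (has_typed_forward<Model>::value) {
      return model.forward(input);  // model opted into typed forward
    } else {
      return model.forward(apply_model_input_to_legacy(input));  // fallback
    }
  }
};

struct LegacyOnlyModel {
  std::string forward(const ModelInputParams&) { return "legacy"; }
};
struct TypedModel {
  std::string forward(const ModelInputParams&) { return "legacy"; }
  std::string forward(const ModelInput&) { return "typed"; }
};

// Compile-time checks of the trait detection.
static_assert(!has_typed_forward<LegacyOnlyModel>::value);
static_assert(has_typed_forward<TypedModel>::value);
```
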
Add typed ModelInput forward overloads on LlmForCausalLMImplBase (common and NPU variants). The body currently unwraps the LLM partition into a legacy ModelInputParams as a Step 3 entry point, which lets CausalLMImpl skip its base typed-to-legacy adapter and dispatch directly into the LLM base for future deeper migration.

Made-with: Cursor
Mirror the LlmForCausalLMImplBase typed forward entry on RecForCausalLMImplBase, Qwen3HybridForCausalLMImplBase, and the NPU MtpForCausalLMImplBase so that Rec/Hybrid/MTP families all opt into typed forward dispatch and let CausalLMImpl/RecCausalLMImpl skip the base typed-to-legacy adapter.

Made-with: Cursor
Replace the ModelInputParamBundle relay inside apply_model_input_to_legacy and make_model_input_from_legacy with direct per-partition apply/move calls. Saves one optional copy of all four partitions per typed-to-legacy or legacy-to-typed conversion on the hot path.

Made-with: Cursor
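
A sketch of direct per-partition lowering, assuming only two partitions. The helper names `apply_llm_to_legacy` and `apply_vlm_to_legacy` are hypothetical; the commits refer to these helpers only as `apply_*_to_legacy`. Each present partition is moved into the legacy struct exactly once, and absent ones are skipped, so no intermediate bundle copy is made.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical minimal legacy struct and partitions.
struct ModelInputParams {
  std::vector<int32_t> positions;
  std::vector<int64_t> image_sizes;
};
struct LLMModelInputParams { std::vector<int32_t> positions; };
struct VLMModelInputParams { std::vector<int64_t> image_sizes; };
struct ModelInput {
  std::optional<LLMModelInputParams> llm;
  std::optional<VLMModelInputParams> vlm;
};

// Each partition writes its own fields straight into the legacy struct.
void apply_llm_to_legacy(LLMModelInputParams&& llm, ModelInputParams& out) {
  out.positions = std::move(llm.positions);
}
void apply_vlm_to_legacy(VLMModelInputParams&& vlm, ModelInputParams& out) {
  out.image_sizes = std::move(vlm.image_sizes);
}

// No bundle relay: present partitions are moved once, absent ones skipped.
ModelInputParams apply_model_input_to_legacy(ModelInput&& input) {
  ModelInputParams out;
  if (input.llm) apply_llm_to_legacy(std::move(*input.llm), out);
  if (input.vlm) apply_vlm_to_legacy(std::move(*input.vlm), out);
  return out;
}
```
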
Remove the forward_with_typed_input helper used by graph eager fallbacks and stop pre-converting params to typed input inside BaseExecutorImpl::run(legacy). The model chain still consumes ModelInputParams natively, so the round trip only added one allocation, one apply_to_legacy and one extra virtual dispatch per call. Typed run/forward entries remain intact for callers that explicitly construct ModelInput.

Made-with: Cursor
Acl/Mlu/Cuda graph executors override run(const ModelInput&) and run(ModelInput&&) with the same body as the ExecutorImpl base default (apply_to_legacy + virtual dispatch into run(legacy)). Remove those duplicate overrides and rely on the base default; the typed entry behavior stays identical and the per-executor surface shrinks.

Made-with: Cursor
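
The base-default pattern this commit relies on can be sketched as follows. `GraphExecutorImpl`, the `num_tokens` field, and the arithmetic inside `run` are hypothetical placeholders, not the real ACL/MLU/CUDA graph executors. Note the `using ExecutorImpl::run;` line. Without it, declaring `run(const ModelInputParams&)` in the derived class would hide the inherited typed overload for calls made through the derived type.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical minimal structs.
struct ModelInputParams { int32_t num_tokens = 0; };
struct LLMModelInputParams { int32_t num_tokens = 0; };
struct ModelInput { std::optional<LLMModelInputParams> llm; };

// Stand-in for the typed-to-legacy adapter.
ModelInputParams apply_model_input_to_legacy(const ModelInput& in) {
  ModelInputParams p;
  if (in.llm) p.num_tokens = in.llm->num_tokens;
  return p;
}

class ExecutorImpl {
 public:
  virtual ~ExecutorImpl() = default;
  virtual int32_t run(const ModelInputParams& params) = 0;
  // Base default typed entry: lower to legacy, then dispatch virtually.
  virtual int32_t run(const ModelInput& input) {
    return run(apply_model_input_to_legacy(input));
  }
};

// Overrides only the legacy entry; the typed entry is the base default.
class GraphExecutorImpl : public ExecutorImpl {
 public:
  using ExecutorImpl::run;  // keep the inherited typed overload visible
  int32_t run(const ModelInputParams& params) override {
    return params.num_tokens * 2;  // placeholder for graph/eager execution
  }
};
```
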
ModelInputFactory was a pass-through wrapper around make_model_input_from_legacy / apply_model_input_to_legacy and the per-model create_model_input methods. Production code has migrated to the direct APIs, leaving the wrapper used only by its own tests. Remove the header/source, fold the still-relevant assertions into the renamed model_input_test.cpp, and drop the redundant Create-For-* coverage already exercised by the model-class tests.

Made-with: Cursor
Add CHECK(input.llm.has_value()) to the typed forward overrides on LlmForCausalLMImplBase (common and NPU), RecForCausalLMImplBase, Qwen3HybridForCausalLMImplBase, and the NPU MtpForCausalLMImplBase. This is the first real read of a typed partition inside the model bases; misconstructed ModelInput values now fail fast at the LLM-family entry instead of producing default-valued legacy params downstream.

Made-with: Cursor
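
A sketch of the fail-fast entry, assuming a single LLM partition. `CHECK_OR_THROW` is a hypothetical substitute for the `CHECK` macro used in the commit; it throws `std::logic_error` instead of aborting so that the failure path can be tested. `llm_base_forward` is a hypothetical stand-in for the typed forward on `LlmForCausalLMImplBase`.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <stdexcept>

// Hypothetical minimal structs.
struct ModelInputParams { int32_t num_sequences = 0; };
struct LLMModelInputParams { int32_t num_sequences = 0; };
struct ModelInput { std::optional<LLMModelInputParams> llm; };

// Hypothetical CHECK substitute that throws so tests can observe failures.
#define CHECK_OR_THROW(cond)                                        \
  do {                                                              \
    if (!(cond)) throw std::logic_error("Check failed: " #cond);    \
  } while (0)

// LLM-family typed entry: a missing LLM partition fails immediately instead
// of silently producing default-valued legacy params downstream.
int32_t llm_base_forward(const ModelInput& input) {
  CHECK_OR_THROW(input.llm.has_value());
  ModelInputParams legacy;
  legacy.num_sequences = input.llm->num_sequences;  // apply only the LLM partition
  return legacy.num_sequences;  // placeholder for the legacy forward body
}
```
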
Replace apply_model_input_to_legacy with explicit per-partition apply calls inside the typed forward bodies of LlmForCausalLMImplBase (common and NPU), Qwen3HybridForCausalLMImplBase, the NPU MtpForCausalLMImplBase, and RecForCausalLMImplBase. Each family now applies only the partitions it actually consumes (LLM always, VLM/Rec when relevant) and skips the DiT branch entirely.

Made-with: Cursor
Add a header comment on ModelInput that documents the typed/legacy coexistence, how trait-based dispatch picks up model typed forward overrides, and the minimal opt-in step for new models.

Made-with: Cursor
Promote the CausalLMImpl typed forward dispatch coverage from compile-time static_asserts to runtime tests by giving the helper holders real backing impls and verifying that const& and && typed forward overloads on the wrapper hit the model's typed forward when supported, and fall back to the legacy path otherwise.

Made-with: Cursor
Add typed ModelInput forward overloads to every VL conditional generation impl
(Qwen2/2.5/3 VL, Qwen3 VL Moe, Oxygen VLM, GLM-4V/4V-Moe, MiniCPMV across
common and NPU paths). Pure delegate models pass typed input straight to the
opted-in language model, while Qwen3 VL variants lower the typed input into a
legacy ModelInputParams to preserve their get_deep_stacks pre-processing. MM
embedding specializations inherit the typed forward through the
CausalVLM/CausalLM chain and need no change.

Made-with: Cursor
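
The two delegation styles can be contrasted in a small sketch. `DelegateVLM`, `DeepStackVLM`, `LanguageModel`, and the `deep_stack_layers` field are hypothetical. The hard-coded assignment only marks where legacy-only pre-processing such as `get_deep_stacks` would run.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical minimal structs.
struct ModelInputParams {
  int32_t num_images = 0;
  std::vector<int32_t> deep_stack_layers;
};
struct VLMModelInputParams { int32_t num_images = 0; };
struct ModelInput { std::optional<VLMModelInputParams> vlm; };

ModelInputParams apply_model_input_to_legacy(const ModelInput& in) {
  ModelInputParams p;
  if (in.vlm) p.num_images = in.vlm->num_images;
  return p;
}

// Language model that opted into typed forward but still accepts legacy params.
struct LanguageModel {
  std::string forward(const ModelInput&) { return "typed"; }
  std::string forward(const ModelInputParams& p) {
    return "legacy:" + std::to_string(p.deep_stack_layers.size());
  }
};

// Pure delegate: no pre-processing, so typed input passes straight through.
struct DelegateVLM {
  LanguageModel language_model;
  std::string forward(const ModelInput& in) { return language_model.forward(in); }
};

// Needs legacy-only pre-processing, so it lowers to legacy params first.
struct DeepStackVLM {
  LanguageModel language_model;
  std::string forward(const ModelInput& in) {
    ModelInputParams p = apply_model_input_to_legacy(in);
    p.deep_stack_layers = {0, 1, 2};  // placeholder for get_deep_stacks-style work
    return language_model.forward(p);
  }
};
```
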
Construct ModelInput from the legacy ModelInputParams at the worker boundary
and call model_executor_->forward with the typed entry. Existing pre-forward
mutations (e.g. layer_synchronizer in PUSH mode for LLMWorkerImpl) still run
before the typed conversion, so behavior is preserved while production now
exercises the typed dispatch chain end-to-end for these worker types.

Made-with: Cursor
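
A minimal sketch of the ordering at the worker boundary. It assumes a hypothetical `layer_sync_enabled` flag in place of the real `layer_synchronizer` setup, and a toy `Executor` that records what it received. Legacy mutations happen first, and the params are then moved into the typed input once.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical minimal structs; `layer_sync_enabled` stands in for the
// real layer_synchronizer setup.
struct ModelInputParams {
  std::vector<int32_t> positions;
  bool layer_sync_enabled = false;
};
struct LLMModelInputParams {
  std::vector<int32_t> positions;
  bool layer_sync_enabled = false;
};
struct ModelInput { std::optional<LLMModelInputParams> llm; };

ModelInput make_model_input_from_legacy(ModelInputParams&& p) {
  ModelInput in;
  in.llm = LLMModelInputParams{std::move(p.positions), p.layer_sync_enabled};
  return in;
}

// Toy executor recording what the typed forward received.
struct Executor {
  bool saw_sync = false;
  std::size_t num_positions = 0;
  void forward(const ModelInput& input) {
    saw_sync = input.llm->layer_sync_enabled;
    num_positions = input.llm->positions.size();
  }
};

// Worker step: legacy mutations first, then one typed conversion at the boundary.
void worker_step(ModelInputParams params, Executor& executor) {
  params.layer_sync_enabled = true;  // pre-forward mutation on legacy params
  executor.forward(make_model_input_from_legacy(std::move(params)));
}
```
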
Migrate the six runtime_.executor->forward call sites in RecWorkerImpl (single
forward, encoder/decoder prefill split, decode-only path, and the multi-round
loop) to construct ModelInput at the boundary and call the typed forward
overload. The multi-round loop builds its ModelInput by copying from a const
reference because prepare_round_input_* mutates mutable_input.input_params for the next
iteration; other sites move from local ModelInputParams since they are
single-use.

Made-with: Cursor
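
The copy-versus-move split can be sketched as follows. `prepare_round_input` is a hypothetical simplification of `prepare_round_input_*` that appends one position per round. `Executor` is a toy that records the input size of each call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical minimal structs.
struct ModelInputParams { std::vector<int32_t> positions; };
struct LLMModelInputParams { std::vector<int32_t> positions; };
struct ModelInput { std::optional<LLMModelInputParams> llm; };

ModelInput make_model_input_from_legacy(const ModelInputParams& p) {  // copies
  ModelInput in;
  in.llm = LLMModelInputParams{p.positions};
  return in;
}
ModelInput make_model_input_from_legacy(ModelInputParams&& p) {  // moves
  ModelInput in;
  in.llm = LLMModelInputParams{std::move(p.positions)};
  return in;
}

// Toy executor recording the number of positions seen per call.
struct Executor {
  std::vector<std::size_t> seen;
  void forward(ModelInput&& input) { seen.push_back(input.llm->positions.size()); }
};

// Hypothetical simplification of prepare_round_input_*: grows params in place.
void prepare_round_input(ModelInputParams& p) {
  p.positions.push_back(static_cast<int32_t>(p.positions.size()));
}

void multi_round(ModelInputParams& params, Executor& exec, int rounds) {
  for (int r = 0; r < rounds; ++r) {
    exec.forward(make_model_input_from_legacy(params));  // copy: params is reused
    prepare_round_input(params);
  }
}

void single_forward(ModelInputParams params, Executor& exec) {
  exec.forward(make_model_input_from_legacy(std::move(params)));  // move: single use
}
```
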
Make typed forward(ModelInput const&) and forward(ModelInput&&) the canonical pure-virtual entries on CausalLM/RecCausalLM/CausalVLM, while demoting the legacy ModelInputParams overload to a non-virtual compatibility wrapper.

Made-with: Cursor
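
This is the non-virtual interface pattern. The sketch below assumes a single hypothetical `num_tokens` field and a hypothetical `ToyModel`. Both typed overloads are pure virtual. The legacy overload lowers its argument into a `ModelInput` prvalue, which binds to the `&&` entry.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>

// Hypothetical minimal structs.
struct ModelInputParams { int32_t num_tokens = 0; };
struct LLMModelInputParams { int32_t num_tokens = 0; };
struct ModelInput { std::optional<LLMModelInputParams> llm; };

ModelInput make_model_input_from_legacy(const ModelInputParams& p) {
  ModelInput in;
  in.llm = LLMModelInputParams{p.num_tokens};
  return in;
}

class CausalLM {
 public:
  virtual ~CausalLM() = default;
  // Canonical entries: every model implements these.
  virtual int32_t forward(const ModelInput& input) = 0;
  virtual int32_t forward(ModelInput&& input) = 0;
  // Non-virtual compatibility wrapper: lowers legacy params, then dispatches
  // virtually into the typed && entry.
  int32_t forward(const ModelInputParams& params) {
    return forward(make_model_input_from_legacy(params));
  }
};

class ToyModel final : public CausalLM {
 public:
  using CausalLM::forward;  // keep the legacy wrapper visible
  int32_t forward(const ModelInput& input) override { return input.llm->num_tokens; }
  int32_t forward(ModelInput&& input) override { return input.llm->num_tokens; }
};
```
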
Promote run(const ModelInput&) and run(ModelInput&&) to the only virtual entry points on ExecutorImpl and downgrade ModelInputParams overloads to non-virtual compatibility wrappers in ExecutorImpl/Executor.

Made-with: Cursor
Make AttentionMetadataBuilder and dp_utils consume LLMModelInputParams as their primary interface, while keeping legacy ModelInputParams overloads as compatibility wrappers that lower through make_llm_model_input_params_from_legacy.

Made-with: Cursor
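
A sketch of a typed primary interface with a legacy wrapper. Only `AttentionMetadataBuilder` and `make_llm_model_input_params_from_legacy` are named in the commit. The `build` method, the `AttentionMetadata` struct, and the `seq_lens` and `other_field` fields are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical minimal structs.
struct ModelInputParams {
  std::vector<int32_t> seq_lens;
  int32_t other_field = 0;  // unrelated legacy field the builder never reads
};
struct LLMModelInputParams { std::vector<int32_t> seq_lens; };

LLMModelInputParams make_llm_model_input_params_from_legacy(const ModelInputParams& p) {
  return LLMModelInputParams{p.seq_lens};
}

struct AttentionMetadata {
  int32_t num_sequences = 0;
  int32_t max_seq_len = 0;
  int32_t total_tokens = 0;
};

struct AttentionMetadataBuilder {
  // Primary overload: reads only the LLM partition.
  static AttentionMetadata build(const LLMModelInputParams& p) {
    AttentionMetadata m;
    m.num_sequences = static_cast<int32_t>(p.seq_lens.size());
    for (int32_t len : p.seq_lens) {
      m.max_seq_len = std::max(m.max_seq_len, len);
      m.total_tokens += len;
    }
    return m;
  }
  // Compatibility wrapper for callers still holding legacy params.
  static AttentionMetadata build(const ModelInputParams& p) {
    return build(make_llm_model_input_params_from_legacy(p));
  }
};
```
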
Switch Qwen2DecoderLayer, Qwen3MoeDecoderLayer, and FusedMoE (common/MLU/ILU) to consume LLMModelInputParams as the primary forward-path input while keeping ModelInputParams overloads as compatibility adapters.

Made-with: Cursor
… paths.

Continue replacing legacy ModelInputParams access in workers, graph executors, and model forward adapters so typed ModelInput becomes the primary runtime contract while keeping compatibility boundaries localized.

Made-with: Cursor