feat: modernize dependencies; add Qwen3; refactor attention/rope; add alignment tests by mikecovlee · Pull Request #36 · TUDB-Labs/MoE-PEFT

mikecovlee · 2026-01-06T02:28:27Z

Summary

This PR modernizes MoE-PEFT by updating core dependencies, improving attention/rope/cache compatibility, expanding model support (incl. Qwen3), and enhancing MixLoRA routing with entropy-regularized loss. It also includes a broader set of alignment/regression tests to validate FlashAttention vs eager behavior across multiple model families.

Note: This PR contains both feature additions and refactors/bugfixes (attention, RoPE, cache, naming, and CI/compatibility tweaks).

Key changes

Dependencies
- Bump project version to 2.0.3
- Update and pin key deps (e.g. torch/transformers/peft)
- Raise Python requirement to >= 3.11
New / improved model support
- Add Qwen2/Qwen3 integration (new modeling_qwen.py and registry updates)
Attention refactor
- Unify attention entry points via ATTENTION_FUNCTIONS (eager / flash_attn)
- Improve mask handling, kv-repeat logic, dtype handling, and sliding-window behavior
RoPE improvements
- Extend RoPE scaling/init support (e.g. linear/dynamic/yarn/longrope) and dynamic updates
MixLoRA routing loss enhancement
- Add entropy-regularized router loss (Tsallis / Rényi) and expose new config knobs
- Keep load-balance loss and combine with entropy loss for more stable routing
Cache / generation robustness
- Harden cache reorder/batch selection logic and improve dtype/device consistency
Quality / compatibility
- Remove unnecessary trust_remote_code=True usages
- Fix widespread typo: casual → causal
Tests
- Add alignment tests for multiple model families (FlashAttention vs eager)
- Add ChatGLM test scaffolding
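
To illustrate the attention refactor and the alignment tests listed above, here is a minimal pure-Python sketch, not the code from this PR. It assumes a name-to-function registry and checks that two implementations agree within a tolerance. `ATTENTION_IMPLS`, `eager_attention`, `streaming_attention`, and `check_alignment` are hypothetical names. The streaming path uses an online softmax as a stand-in for a fused kernel such as flash_attn.

```python
import math
from typing import Callable, Dict, List

Vector = List[float]


def eager_attention(q: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Reference path: materialize all scores, softmax them, then mix the values."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]


def streaming_attention(q: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Single pass over keys with a running max (online softmax)."""
    scale = 1.0 / math.sqrt(len(q))
    running_max = -math.inf
    denom = 0.0
    acc = [0.0] * len(values[0])
    for k, v in zip(keys, values):
        s = scale * sum(a * b for a, b in zip(q, k))
        new_max = max(running_max, s)
        correction = math.exp(running_max - new_max)  # rescale earlier partial sums
        w = math.exp(s - new_max)
        denom = denom * correction + w
        acc = [a * correction + w * x for a, x in zip(acc, v)]
        running_max = new_max
    return [a / denom for a in acc]


# Hypothetical registry: callers pick an implementation by name.
ATTENTION_IMPLS: Dict[str, Callable[[Vector, List[Vector], List[Vector]], Vector]] = {
    "eager": eager_attention,
    "streaming": streaming_attention,
}


def check_alignment(q: Vector, keys: List[Vector], values: List[Vector],
                    atol: float = 1e-9) -> bool:
    """Return True if every registered implementation matches the eager reference."""
    reference = ATTENTION_IMPLS["eager"](q, keys, values)
    for impl in ATTENTION_IMPLS.values():
        out = impl(q, keys, values)
        if max(abs(x - y) for x, y in zip(out, reference)) > atol:
            return False
    return True
```

A real alignment test would compare model outputs on tensors and would likely need a looser tolerance for half-precision kernels; the tolerance here is only a placeholder.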

Why

Keep MoE-PEFT compatible with newer upstream ecosystems (torch/transformers/peft)
Reduce attention and RoPE implementation divergence across model backends
Improve MixLoRA routing stability through entropy regularization
Increase confidence with broader regression/alignment coverage

Checklist

CI passed
New tests added/updated where appropriate
Verified inference/training for representative models (e.g., Llama/Gemma2/Qwen2/Qwen3)
Backward-compat considerations documented (Python>=3.11, pinned deps)

Notes / Potential breaking changes

Python requirement is now >=3.11
Some dependencies are now pinned to specific versions (torch/transformers/peft)


* update_0121
* Add tests for alignment of various models with flash attention and eager implementations
  - Introduced new test files for alignment testing of models including Gemma2, Phi, Phi3, Llama, Qwen2, and Mistral.
  - Implemented tests for flash attention forward pass in `test_alignment_flash_attn.py`.
  - Added eager path tests for Gemma2 and Phi models in `test_alignment_gemma2_eager.py` and `test_alignment_gemma_phi.py`.
  - Created alignment tests for Llama and Qwen2 models in `test_alignment_llama_qwen2.py`.
  - Included tests for Mistral and Phi3 models in `test_alignment_mistral_phi3.py`.
  - Each test verifies model configuration, initialization, and output shapes, ensuring proper integration with the PEFT framework.
* Fix code quality issues: typos, documentation, unused imports, and code organization (#5)
  * Initial plan
  * Fix review comments: typos, comments, and code quality issues
    Co-authored-by: mikecovlee <16332179+mikecovlee@users.noreply.github.com>
  * Fix CI: Make qwen3 import conditional to support older transformers versions
    Co-authored-by: mikecovlee <16332179+mikecovlee@users.noreply.github.com>
  * Remove Qwen code from modeling_mistral.py and fix all casual→causal typos
    Co-authored-by: mikecovlee <16332179+mikecovlee@users.noreply.github.com>
  * Fix bare except clause in launch.py
    Co-authored-by: mikecovlee <16332179+mikecovlee@users.noreply.github.com>
  ---------
  Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
  Co-authored-by: mikecovlee <16332179+mikecovlee@users.noreply.github.com>
* [fix] update torch and transformers dependencies to specific versions
* [refactor] update Qwen model imports and improve softcap parameter formatting

---------

Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mikecovlee <16332179+mikecovlee@users.noreply.github.com>
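
The "make qwen3 import conditional" and "fix bare except clause" items above both concern guarded imports. The following is a sketch of one way to do that with only the standard library; it is not the code from this PR. `optional_attr` and `register_if_available` are hypothetical helpers, and the `transformers` / `Qwen3Config` pairing in the comment is an assumption about what the guarded import targets.

```python
import importlib
from typing import Any, Dict, Optional


def optional_attr(module_name: str, attr: str) -> Optional[Any]:
    """Return module.attr, or None if the module or the attribute is missing.

    Only ImportError is caught (no bare `except:`), so unrelated failures
    inside the imported module still surface.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, attr, None)


def register_if_available(registry: Dict[str, Any], key: str,
                          module_name: str, attr: str) -> bool:
    """Add an entry to the registry only when the optional symbol can be imported."""
    symbol = optional_attr(module_name, attr)
    if symbol is None:
        return False
    registry[key] = symbol
    return True


# Hypothetical usage (assumes an older transformers release may lack the symbol):
#   registry = {}
#   register_if_available(registry, "qwen3", "transformers", "Qwen3Config")
```

With this pattern, an installation that lacks the optional symbol simply ends up with no entry for that model family instead of failing at import time.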

* mixlora: add entropy-regularized router loss (Tsallis/Rényi) from v2.1.0 without destructive changes
  - Config: add router_dyn_loss_coef, entropy_* params
  - Router loss: combine entropy and load-balance losses
  - Common: add tsallis/renyi/shannon entropy utilities and export
* Address review feedback: fix device/dtype mismatches, input mutation, and add tests (#9)
* refactor: remove unused entropy_eps parameter and related assertions in MixLoraConfig

---------

Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
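
For readers unfamiliar with the entropy measures named above, here is a small pure-Python sketch using the standard definitions of Shannon, Tsallis, and Rényi entropy, plus a Switch-Transformer-style load-balance term. It is not this PR's implementation. The function names, the coefficients `aux_coef` and `entropy_coef`, and in particular the choice to add the entropy term with a positive sign are assumptions made for illustration only.

```python
import math
from typing import Sequence


def shannon_entropy(p: Sequence[float]) -> float:
    """H(p) = -sum_i p_i log p_i, skipping zero entries."""
    return -sum(x * math.log(x) for x in p if x > 0.0)


def tsallis_entropy(p: Sequence[float], q: float) -> float:
    """S_q(p) = (1 - sum_i p_i^q) / (q - 1) for q > 0; Shannon in the limit q -> 1."""
    if abs(q - 1.0) < 1e-8:
        return shannon_entropy(p)
    return (1.0 - sum(x ** q for x in p)) / (q - 1.0)


def renyi_entropy(p: Sequence[float], alpha: float) -> float:
    """H_alpha(p) = log(sum_i p_i^alpha) / (1 - alpha) for alpha > 0; Shannon at alpha -> 1."""
    if abs(alpha - 1.0) < 1e-8:
        return shannon_entropy(p)
    return math.log(sum(x ** alpha for x in p)) / (1.0 - alpha)


def load_balance_loss(router_probs: Sequence[Sequence[float]]) -> float:
    """Switch-style term: num_experts * sum_e (token fraction_e * mean prob_e), top-1 routing."""
    num_tokens = len(router_probs)
    num_experts = len(router_probs[0])
    counts = [0] * num_experts
    for p in router_probs:
        counts[max(range(num_experts), key=lambda e: p[e])] += 1
    fraction = [c / num_tokens for c in counts]
    mean_prob = [sum(p[e] for p in router_probs) / num_tokens for e in range(num_experts)]
    return num_experts * sum(f * m for f, m in zip(fraction, mean_prob))


def router_loss(router_probs: Sequence[Sequence[float]],
                aux_coef: float = 0.01, entropy_coef: float = 0.01,
                q: float = 2.0) -> float:
    """Hypothetical combination: weighted load-balance term plus mean per-token Tsallis entropy."""
    mean_entropy = sum(tsallis_entropy(p, q) for p in router_probs) / len(router_probs)
    return aux_coef * load_balance_loss(router_probs) + entropy_coef * mean_entropy
```

With perfectly balanced top-1 routing the load-balance term equals 1.0, and it grows as tokens collapse onto one expert. How the PR actually weights and signs the entropy term is governed by its own config parameters (`router_dyn_loss_coef`, `entropy_*`), which this sketch does not model.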

* Added support for cache locations, optimized generation logic, and fixed the calculation of cumulative steps in the training configuration.
* Fixed the label dimension issue when calculating metrics
* Apply suggestion from @Copilot
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update flash-attn to 2.8.3
* Update bitsandbytes to version 0.49.1 and fix the update logic for kv_seq_len to avoid invalid resizing.
* Now flash attention can work with llama
* Fix Qwen3 support
* Replace entry point to `python -m moe_peft`
* Fix tests
* Update README

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

mikecovlee and others added 2 commits January 6, 2026 10:26

[refactor] remove trust_remote_code flags and update model configurat…

513174f

…ions

mikecovlee marked this pull request as ready for review January 7, 2026 07:02

mikecovlee changed the title ~~[WIP] Update with latest dependencies, support new models~~ Update with latest dependencies, support new models Jan 7, 2026

mikecovlee changed the title ~~Update with latest dependencies, support new models~~ [features] Update with latest dependencies, support new models Jan 7, 2026

mikecovlee and others added 2 commits January 7, 2026 15:54

Add tests (#7)

479debd

mikecovlee changed the title ~~[features] Update with latest dependencies, support new models~~ feat: modernize dependencies; add Qwen3; refactor attention/rope; add alignment tests Jan 8, 2026

mikecovlee and others added 2 commits January 13, 2026 11:15

Update version to 2.1.0

50c774a

mikecovlee wants to merge 6 commits into TUDB-Labs:main from scu-covariant:main
