[ICLR 2025] TC-MoE: Augmenting Mixture of Experts with Ternary Expert Choice

This repository provides the code for the paper "TC-MoE: Augmenting Mixture of Experts with Ternary Expert Choice".

Key Features

✅ Ternary Expert Space
Expands expert capacity with {-1, 0, 1} multipliers at minimal computational cost

🚀 Efficiency Gains
Reduces activated experts by 9% while improving average performance by 1.1%

⚖️ Dynamic Load Balancing
Novel load balance loss ensures equitable expert utilization

🔧 Flexible Trade-offs
Reward loss mechanism for efficiency-effectiveness optimization
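The ternary expert space listed under Key Features can be illustrated with a small, dependency-free sketch. This is not the repository's implementation. It assumes that every expert is paired with each multiplier in {-1, 0, 1}, that a softmax router scores all of the resulting choices, and that the top-k choices are combined with their router probabilities as gates. The names `ternary_choices` and `ternary_moe_forward` are hypothetical and chosen only for this illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Sequence[float]], Vector]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of router logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ternary_choices(num_experts: int) -> List[Tuple[int, int]]:
    """Enumerate (expert_index, multiplier) pairs: all +1, then all -1, then all 0.

    Assumption: each expert gets its own 0-multiplier choice.
    """
    return [(i, m) for m in (1, -1, 0) for i in range(num_experts)]


def ternary_moe_forward(
    x: Sequence[float],
    experts: Sequence[Expert],
    router_logits: Sequence[float],
    top_k: int,
) -> Tuple[Vector, int]:
    """Combine the top-k ternary choices for one token.

    Returns the output vector and the number of experts actually evaluated.
    """
    choices = ternary_choices(len(experts))
    if len(router_logits) != len(choices):
        raise ValueError("need one router logit per (expert, multiplier) choice")
    if not 0 < top_k <= len(choices):
        raise ValueError("top_k must be between 1 and the number of choices")

    probs = softmax(router_logits)
    top = sorted(range(len(choices)), key=lambda j: probs[j], reverse=True)[:top_k]

    out = [0.0] * len(x)
    activated = 0
    for j in top:
        expert_idx, multiplier = choices[j]
        if multiplier == 0:
            # A zero multiplier contributes nothing, so the expert is never run.
            continue
        y = experts[expert_idx](x)
        activated += 1
        for d in range(len(out)):
            out[d] += probs[j] * multiplier * y[d]
    return out, activated


if __name__ == "__main__":
    # Two toy "experts" that keep the input dimension.
    toy_experts = [lambda v: [2.0 * a for a in v], lambda v: [a + 1.0 for a in v]]
    # Logit order follows ternary_choices(2): (0,+1), (1,+1), (0,-1), (1,-1), (0,0), (1,0).
    logits = [0.0, 0.0, 0.0, 2.0, 1.0, 0.0]
    output, n_active = ternary_moe_forward([1.0, 2.0], toy_experts, logits, top_k=2)
    print(output, n_active)
```

In this sketch, a token whose top-k selection includes a 0-multiplier choice evaluates fewer experts than k, which is one way to read the reduction in activated experts claimed above. The -1 multiplier lets an expert's output be subtracted rather than added without any extra parameters.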

Main Results

Usage

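# Load the pretrained TC-MoE checkpoint and its tokenizer from the Hugging Face Hub
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("stiger1000/TC-MoE")
tokenizer = AutoTokenizer.from_pretrained("stiger1000/TC-MoE")

# Tokenize a prompt and generate until the sequence (prompt included) reaches 50 tokens
inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_length=50)
# Decode the generated token IDs back to text, dropping special tokens
print(tokenizer.decode(outputs[0], skip_special_tokens=True))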

Citation

@inproceedings{yan2025tcmoe,
  title={TC-MoE: Augmenting Mixture of Experts with Ternary Expert Choice},
  author={Yan, Shen and Bin, Xingyan and Zhang, Sijun and Wang, Yisen and Lin, Zhouchen},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025}
}
