
stiger1000/TC-MoE

[ICLR 2025] TC-MoE: Augmenting Mixture of Experts with Ternary Expert Choice


This repository provides the code for the paper "TC-MoE: Augmenting Mixture of Experts with Ternary Expert Choice".

Key Features

Ternary Expert Space
Expands expert capacity with {-1, 0, 1} multipliers at minimal computational cost

🚀 Efficiency Gains
Reduces activated experts by 9% while improving average performance by 1.1%

⚖️ Dynamic Load Balancing
Novel load balance loss ensures equitable expert utilization

🔧 Flexible Trade-offs
Reward loss mechanism for efficiency-effectiveness optimization
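To make the ternary expert space more concrete, here is a minimal toy sketch in plain Python. It is not the code from this repository or the paper's exact routing rule. It assumes that each expert is paired with every multiplier in {-1, 0, 1}, that a router scores the resulting (expert, multiplier) pairs, and that a pair with multiplier 0 contributes nothing and therefore lets the expert's computation be skipped. The names `ternary_moe`, `router_scores`, and `top_k` are hypothetical and chosen only for illustration.

```python
import math

MULTIPLIERS = (-1, 0, 1)  # the ternary multipliers named in the feature list


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def ternary_moe(x, experts, router_scores, top_k=2):
    """Toy scalar MoE layer over (expert, multiplier) pairs (illustrative only).

    router_scores holds one score per pair in expert-major order:
    [(e0, -1), (e0, 0), (e0, +1), (e1, -1), ...].
    Returns (output, number of experts actually evaluated).
    """
    candidates = [(e, m) for e in range(len(experts)) for m in MULTIPLIERS]
    assert len(router_scores) == len(candidates)
    gates = softmax(router_scores)
    # Keep the top_k highest-gated (expert, multiplier) pairs.
    chosen = sorted(range(len(candidates)), key=lambda i: gates[i], reverse=True)[:top_k]
    output, evaluated = 0.0, set()
    for i in chosen:
        e, m = candidates[i]
        if m == 0:
            continue  # assumption: a zero multiplier means the expert is never run
        evaluated.add(e)
        output += gates[i] * m * experts[e](x)
    return output, len(evaluated)


# Two toy experts; the scores favour the pairs (e0, 0) and (e1, +1).
experts = [lambda v: 2.0 * v, lambda v: v + 1.0]
scores = [0.0, 5.0, 0.0, 0.0, 0.0, 5.0]
out, n_active = ternary_moe(3.0, experts, scores, top_k=2)
print(out, n_active)  # n_active is 1: the zero-multiplier slot costs no expert call
```

In this toy setting, routing a slot to a zero multiplier is what reduces the number of activated experts, while the -1 multiplier lets an expert's output be subtracted rather than added. How the actual model scores and selects pairs, and how the load balance and reward losses shape those choices, is described in the paper.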

Main Results

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the released checkpoint and its tokenizer from the Hugging Face Hub
model = AutoModelForCausalLM.from_pretrained("stiger1000/TC-MoE")
tokenizer = AutoTokenizer.from_pretrained("stiger1000/TC-MoE")

# Tokenize a prompt and generate; max_length counts the prompt tokens as well
inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_length=50)
print(tokenizer.decode(outputs[0]))

Citation

@inproceedings{yan2025tcmoe,
  title={TC-MoE: Augmenting Mixture of Experts with Ternary Expert Choice},
  author={Yan, Shen and Bin, Xingyan and Zhang, Sijun and Wang, Yisen and Lin, Zhouchen},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025}
}
