How Many Experts Are Enough? Towards Optimal Semantic Specialization for Mixture-of-Experts (AAAI 2026)
This repository contains the implementation of MASS (Mixture-of-Experts for Adaptive Semantic Specialization), which combines Mixture-of-Experts (MoE) layers with adaptive expert expansion for both language and vision tasks. MASS adjusts the number of experts during training based on gradient dynamics, enabling efficient domain generalization and specialization.
- Adaptive Expert Expansion: Dynamically expands experts during training based on gradient-driven semantic drift signals
- MinTau Routing: Novel adaptive gate mechanism that selects experts based on cumulative routing mass
- Experiments: Covers both language (GLUE tasks) and vision (domain generalization) tasks
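The idea of selecting experts by cumulative routing mass can be illustrated with a small, framework-free sketch. It assumes the gate picks the smallest set of experts whose softmax probabilities sum to at least a threshold; the function names and the `tau` parameter are hypothetical and are not taken from `tutel/gates/mintau.py`, whose actual behavior may differ.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of router logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_by_cumulative_mass(logits: Sequence[float], tau: float) -> List[int]:
    """Return the smallest set of expert indices whose routing
    probabilities sum to at least `tau` (hypothetical threshold name)."""
    if not 0.0 < tau <= 1.0:
        raise ValueError("tau must be in (0, 1]")
    probs = softmax(logits)
    # Visit experts from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen: List[int] = []
    mass = 0.0
    for idx in order:
        chosen.append(idx)
        mass += probs[idx]
        if mass >= tau:
            break
    return chosen


if __name__ == "__main__":
    # A confidently routed token needs one expert; an ambiguous one needs more.
    print(select_by_cumulative_mass([4.0, 0.0, 0.0, 0.0], tau=0.8))
    print(select_by_cumulative_mass([1.0, 0.9, 0.8, -2.0], tau=0.8))
```

Under this assumption the number of active experts varies per token, unlike fixed top-k routing, which always activates exactly k experts.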
├── Language/                           # Language tasks (GLUE benchmarks)
│   ├── search_glue_no_trainer_mass.py  # Main training script for language
│   ├── moe_utils_mass.py               # MoE utilities for language models
│   ├── scripts_mass/                   # Experiment scripts for GLUE tasks
│   └── requirements.txt                # Language dependencies
├── Vision/                             # Vision tasks (Domain generalization)
│   ├── domainbed/                      # Domain generalization framework
│   │   ├── vision_transformers.py
│   │   ├── algorithms_mass.py          # MASS algorithm implementation
│   │   ├── scripts/train_mass.py       # Main training script for vision
│   │   └── moe_utils.py                # Vision Transformer MoE conversion
│   ├── scripts_mass/                   # Experiment scripts for vision datasets
│   └── requirements.txt                # Vision dependencies
└── tutel/                              # Tutel MoE library with MASS extensions
    ├── tutel/gates/mintau.py           # MinTau adaptive gating mechanism
    └── tutel/impls/moe_layer_mass.py   # MASS-enabled MoE layer with expert expansion
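Expert expansion driven by a gradient signal can be pictured with the toy controller below. This is only a sketch under stated assumptions: it uses the cosine similarity between the current gradient and its exponential moving average as a stand-in drift signal, and every name and default (`ExpansionMonitor`, `threshold`, `patience`, `momentum`, `max_experts`) is hypothetical. It is not the criterion implemented in `moe_layer_mass.py`.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


class ExpansionMonitor:
    """Toy controller: request a new expert when the gradient direction
    keeps disagreeing with its running average (all names hypothetical)."""

    def __init__(self, threshold: float = 0.0, patience: int = 3,
                 momentum: float = 0.9, max_experts: int = 8) -> None:
        self.threshold = threshold      # similarity below this counts as drift
        self.patience = patience        # consecutive drift steps required
        self.momentum = momentum        # EMA coefficient for the average
        self.max_experts = max_experts  # hard cap on the expert count
        self.num_experts = 1
        self._avg: List[float] = []
        self._strikes = 0

    def step(self, grad: Sequence[float]) -> bool:
        """Feed one gradient vector; return True if an expert was added."""
        if not self._avg:
            self._avg = list(grad)
            return False
        drifted = cosine(grad, self._avg) < self.threshold
        self._strikes = self._strikes + 1 if drifted else 0
        m = self.momentum
        self._avg = [m * a + (1 - m) * g for a, g in zip(self._avg, grad)]
        if self._strikes >= self.patience and self.num_experts < self.max_experts:
            self.num_experts += 1
            self._strikes = 0
            self._avg = list(grad)  # restart tracking for the new regime
            return True
        return False


if __name__ == "__main__":
    monitor = ExpansionMonitor(patience=3)
    stream = [[1.0, 0.0]] * 5 + [[-1.0, 0.0]] * 4
    for step_idx, grad in enumerate(stream):
        if monitor.step(grad):
            print(f"step {step_idx}: expanded to {monitor.num_experts} experts")
```

The patience counter keeps a single noisy gradient from triggering growth, and the cap bounds the model size; both are design choices of this sketch rather than documented features of MASS.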
- Python 3.8+
- CUDA-compatible GPU
- PyTorch 1.12+
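Before installing, the requirements above can be checked with a short script. This is a convenience sketch rather than part of the repository: the helper names `parse_version` and `check_environment` are made up for illustration, and the check only inspects PyTorch if it is already installed.

```python
import re
import sys
from typing import Dict, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '1.12.1+cu113' into (1, 12, 1), ignoring local/build suffixes."""
    match = re.match(r"(\d+(?:\.\d+)*)", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def check_environment() -> Dict[str, bool]:
    """Report whether each listed requirement appears to be satisfied."""
    report = {"python>=3.8": sys.version_info >= (3, 8)}
    try:
        import torch  # only inspected if it is already installed
    except ImportError:
        report["torch>=1.12"] = False
        report["cuda_gpu"] = False
    else:
        report["torch>=1.12"] = parse_version(torch.__version__) >= (1, 12)
        report["cuda_gpu"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```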
# Clone the repository
git clone <repository-url>
cd mass
# Create conda environment from environment.yml
conda env create -f environment.yml
conda activate mass
# Install Tutel library with MASS extensions
cd tutel
python setup.py clean --all
pip install -e .
For Language Tasks (from the repository root):
cd Language
pip install -r requirements.txt
For Vision Tasks (from the repository root):
cd Vision
pip install -r requirements.txt
GLUE tasks with provided scripts:
- MNLI (Multi-Genre Natural Language Inference)
- CoLA (Corpus of Linguistic Acceptability)
- RTE (Recognizing Textual Entailment)
- QNLI (Question-answering Natural Language Inference)
- MRPC (Microsoft Research Paraphrase Corpus)
# Run CoLA with MASS
bash Language/scripts_mass/cola.sh
# Run RTE with MASS
bash Language/scripts_mass/rte.sh
# Run MNLI with MASS
bash Language/scripts_mass/mnli.sh
# Run QNLI with MASS
bash Language/scripts_mass/qnli.sh
# Run MRPC with MASS
bash Language/scripts_mass/mrpc.sh
Domain generalization datasets:
- PACS (Photo, Art, Cartoon, Sketch)
- VLCS (VOC2007, LabelMe, Caltech101, SUN09)
- OfficeHome (Art, Clipart, Product, Real)
- TerraIncognita (camera-trap wildlife classification; each domain is a camera location)
python3 -m domainbed.scripts.download \
    --data_dir=./domainbed/data
cd Vision
# Run PACS with MASS
bash scripts_mass/run_pacs.sh
# Run VLCS with MASS
bash scripts_mass/run_vlcs.sh
# Run OfficeHome with MASS
bash scripts_mass/run_office.sh
# Run TerraIncognita with MASS
bash scripts_mass/run_terra.sh
# Collect results
python3 -m domainbed.scripts.collect_results --input_dir=${output_dir}
Citation:
@misc{park2025expertsenoughoptimalsemantic,
title={How Many Experts Are Enough? Towards Optimal Semantic Specialization for Mixture-of-Experts},
author={Sumin Park and Noseong Park},
year={2025},
eprint={2512.19765},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2512.19765},
}