Skip to content

psmiz/MASS

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

14 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

How Many Experts Are Enough? Towards Optimal Semantic Specialization for Mixture-of-Experts (AAAI 2026)

This repository contains the implementation of MASS (Mixture-of-Experts for Adaptive Semantic Specialization), a novel approach that combines Mixture of Experts (MoE) with adaptive expert expansion for both Language and Vision tasks. MASS dynamically adjusts the number of experts based on gradient dynamics, enabling efficient domain generalization and specialization.

Key Features

  • Adaptive Expert Expansion: Dynamically expands experts during training based on gradient-driven semantic drift signals
  • MinTau Routing: Novel adaptive gate mechanism that selects experts based on cumulative routing mass
  • Experiments: Specialized for both language (GLUE tasks) and vision (domain generalization) tasks

πŸ“ Repository Structure

β”œβ”€β”€ Language/                    # Language tasks (GLUE benchmarks)
β”‚   β”œβ”€β”€ search_glue_no_trainer_mass.py   # Main training script for language
β”‚   β”œβ”€β”€ moe_utils_mass.py               # MoE utilities for language models
β”‚   β”œβ”€β”€ scripts_mass/                   # Experiment scripts for GLUE tasks
β”‚   └── requirements.txt                # Language dependencies
β”œβ”€β”€ Vision/                      # Vision tasks (Domain generalization)
β”‚   β”œβ”€β”€ domainbed/                      # Domain generalization framework
β”‚   β”œβ”€β”€ β”œβ”€β”€ vision_transformers.py      
β”‚   β”‚   β”œβ”€β”€ algorithms_mass.py          # MASS algorithm implementation
β”‚   β”‚   β”œβ”€β”€ scripts/train_mass.py       # Main training script for vision
β”‚   β”‚   └── moe_utils.py                # Vision Transformer MoE conversion
β”‚   β”œβ”€β”€ scripts_mass/                   # Experiment scripts for vision datasets
β”‚   └── requirements.txt                # Vision dependencies
└── tutel/                       # Tutel MoE library with MASS extensions
    β”œβ”€β”€ tutel/gates/mintau.py              # MinTau adaptive gating mechanism
    └── tutel/impls/moe_layer_mass.py      # MASS-enabled MoE layer with expert expansion


Installation

Prerequisites

  • Python 3.8+
  • CUDA-compatible GPU
  • PyTorch 1.12+

Setup Environment

# Clone the repository
git clone <repository-url>
cd mass

# Create conda environment from environment.yml
conda env create -f environment.yml
conda activate mass

# Install Tutel library with MASS extensions
cd tutel
python setup.py clean --all
pip install -e .

Install Dependencies

For Language Tasks:

cd Language
pip install -r requirements.txt

For Vision Tasks:

cd Vision
pip install -r requirements.txt

Language Tasks (GLUE Benchmarks)

Supported Tasks

  • MNLI (Multi-Genre Natural Language Inference)
  • CoLA (Corpus of Linguistic Acceptability)
  • RTE (Recognizing Textual Entailment)
  • QNLI (Question Natural Language Inference)
  • MRPC (Microsoft Research Paraphrase Corpus)

Quick Start

# Run CoLA with MASS
bash Language/scripts_mass/cola.sh

# Run RTE with MASS
bash Language/scripts_mass/rte.sh

# Run MNLI with MASS
bash Language/scripts_mass/mnli.sh

# Run QNLI with MASS
bash Language/scripts_mass/qnli.sh

# Run MRPC with MASS
bash Language/scripts_mass/mrpc.sh

Vision Tasks (Domain Generalization)

Supported Datasets

  • PACS (Photo, Art, Cartoon, Sketch)
  • VLCS (VOC2007, LabelMe, Caltech101, SUN09)
  • OfficeHome (Art, Clipart, Product, Real)
  • TerraIncognita (Location-based terrain classification)

Datasets

python3 -m domainbed.scripts.download \
       --data_dir=./domainbed/data

Quick Start for Training

cd Vision

# Run PACS with MASS
bash scripts_mass/run_pacs.sh

# Run VLCS with MASS
bash scripts_mass/run_vlcs.sh

# Run OfficeHome with MASS
bash scripts_mass/run_office.sh

# Run TerraIncognita with MASS
bash scripts_mass/run_terra.sh

Evaluation

python3 -m domainbed.scripts.collect_results --input_dir=${output_dir}

Citation

@misc{park2025expertsenoughoptimalsemantic,
      title={How Many Experts Are Enough? Towards Optimal Semantic Specialization for Mixture-of-Experts}, 
      author={Sumin Park and Noseong Park},
      year={2025},
      eprint={2512.19765},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2512.19765}, 
}

About

Official Implementation of MASS (AAAI 2026)

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors